J Comput Neurosci (2011) 30:45–67

DOI 10.1007/s10827-010-0262-3

Transfer entropy—a model-free measure of effective

connectivity for the neurosciences

Raul Vicente · Michael Wibral · Michael Lindner ·

Gordon Pipa

Received: 5 January 2010 / Revised: 17 June 2010 / Accepted: 20 July 2010 / Published online: 13 August 2010

© The Author(s) 2010. This article is published with open access at Springerlink.com

Abstract Understanding causal relationships, or

effective connectivity, between parts of the brain is of

utmost importance because a large part of the brain’s

activity is thought to be internally generated and,

hence, quantifying stimulus response relationships

alone does not fully describe brain dynamics. Past

efforts to determine effective connectivity mostly relied

on model based approaches such as Granger causality

or dynamic causal modeling. Transfer entropy (TE) is

an alternative measure of effective connectivity based

on information theory. TE does not require a model

of the interaction and is inherently non-linear. We

investigated the applicability of TE as a metric in a test

for effective connectivity to electrophysiological data

Action Editor: Aurel A. Lazar

R. Vicente, M. Wibral, and M. Lindner contributed equally.

ML was funded by the Hessian initiative for the

development of scientific and economic excellence

(LOEWE). RV and GP were in part supported

by the Hertie Foundation and the EU (EU

project GABA—FP6-2005-NEST-Path-043309).

R. Vicente · G. Pipa

Max Planck Institute for Brain Research, Frankfurt,

Germany

G. Pipa

e-mail: pipa@mpih-frankfurt.mpg.de

R. Vicente · G. Pipa

Frankfurt Institute for Advanced Studies (FIAS),

Frankfurt, Germany

R. Vicente

e-mail: vicente@fias.uni-frankfurt.de

based on simulations and magnetoencephalography

(MEG) recordings in a simple motor task. In particular,

we demonstrate that TE improved the detectability of

effective connectivity for non-linear interactions, and

for sensor level MEG signals where linear methods

are hampered by signal-cross-talk due to volume

conduction.

Keywords Information theory · Effective

connectivity · Causality · Information transfer ·

Electroencephalography · Magnetoencephalography

1 Introduction

Science is about making predictions. To this aim sci-

entists construct a theory of causal relationships be-

tween two observations. In neuroscience, one of the

observations can often be manipulated at will, i.e. a

stimulus in an experiment, and the second observation

is measured, i.e. neuronal activity. If we can correctly

predict the behavior of the second observation we have

identified a causal relationship between stimulus and

M. Wibral (✉)

MEG Unit, Brain Imaging Center,

Goethe University, Frankfurt, Germany

e-mail: michael.wibral@web.de

M. Lindner

Department of Educational Psychology, Goethe University,

Frankfurt, Germany

M. Lindner

Center for Individual Development and Adaptive Education

of Children at Risk (IDeA), Frankfurt, Germany

e-mail: m.lindner@idea-frankfurt.eu


response. However, identifying causal relationships be-

tween stimuli and responses covers only part of neu-

ronal dynamics—a large part of the brain’s activity is

internally generated and contributes to the response

variability that is observed despite constant stimuli

(Arieli et al. 1996). For the case of internally generated

dynamics it is rather difficult to infer a physical causal-

ity because a deliberate manipulation of this aspect

of the system is extremely difficult. Nevertheless, we

can try to make predictions based on the concept of

causality as it was introduced by Wiener (1956). In

Wiener’s definition an improvement of the prediction

of the future of a time series X by the incorporation

of information from the past of a second time series Y

is seen as an indication of a causal interaction from Y

to X. Such causal interactions across brain structures

are also called ‘effective connectivity’ (Friston 1994)

and they are thought to reveal the information flow

associated with neuronal processing much more precisely

than functional connectivity, which only reflects the

statistical covariation of signals as typically revealed by

cross-correlograms or coherency measures. Therefore,

we must identify causal relationships between parts of

the brain, be they single cells, cortical columns, or brain

areas.

Various measures of causal relationships, or effective

connectivity, exist. They can be divided into two large

classes: those that quantify effective connectivity based

on the abstract concept of information of random vari-

ables (e.g. Schreiber 2000), and those based on specific

models of the processes generating the data. Meth-

ods in the latter class are most widely used to study

effective connectivity in neuroscience, with Granger

causality (GC, Granger 1969) and dynamic causal mod-

eling (DCM, Friston et al. 2003) arguably being most

popular. In the next two paragraphs we give a short

overview of the data generation models in GC and

DCM and their specific consequences so that the reader

can appreciate the fundamental differences between

these model based approaches and the information

theoretic approach presented below:

Standard implementations of GC use a linear sto-

chastic model for the intrinsic dynamics of the signal

and a linear interaction.¹ Therefore, GC is only well

applicable when three prerequisites are met: (a) The

interaction between the two units under observation

has to be well approximated by a linear description, (b)

the data have to have relatively low noise levels (see

¹ Historically, however, GC was formulated without explicit as-

sumptions about the linearity of the system (Granger 1969) and

was therefore closely related to Wiener’s formal definition of

causality (Wiener 1956).

e.g. Nalatore et al. 2007), and (c) cross-talk between

the measurements of the two signals of interest has to

be low (Nolte et al. 2008). Frequency domain variants

of GC such as the partial directed coherence or the

directed transfer function fall in the same category

(Pereda et al. 2005).

DCM assumes a bilinear state space model (BSSM).

Thus, DCM covers non-linear interactions—at least

partially. DCM requires knowledge about the input to

the system, because this input is modeled as modu-

lating the interactions between the parts of the sys-

tem (Friston et al. 2003). DCM also requires a certain

amount of a priori knowledge about the network of

connectivities under investigation, because ultimately

DCM compares the evidence for several competing a

priori models with respect to the observed data. This a

priori knowledge on the input to the system and on the

potential connectivity may not always be available, e.g.

in studies of the resting-state. Therefore, DCM may not

be optimal for exploratory analyses.

Based on the merits and problems of the methods

described in the last paragraph we may formulate four

requirements that a new measure of effective connec-

tivity must meet to be a useful addition to already

established methods:

1. It should not require the a priori definition of the type of interaction, so that it is useful as a tool for exploratory investigations.

2. It should be able to detect frequently observed types of purely non-linear interactions. This is because strong non-linearities are observed across all levels of brain function, from the all-or-none mechanism of action potential generation in neurons to non-linear psychometric functions, such as the power-law relationship in Weber’s law or the inverted-U relationship between arousal levels and response speeds described in the Yerkes-Dodson law (Yerkes and Dodson 1908).

3. It should detect effective connectivity even if there is a wide distribution of interaction delays between the two signals, because signaling between brain areas may involve multiple pathways or transmission over various axons that connect two areas and that vary in their conduction delays (Swadlow and Waxman 1975; Swadlow et al. 1978).

4. It should be robust against linear cross-talk between signals. This is important for the analysis of data recorded with electro- or magnetoencephalography, which provide a large part of the available electrophysiological data today.

The fact that a potential new method should be as

model free as possible naturally leads to the applica-


tion of information theoretic techniques. Information

theory (IT) sets a powerful framework for the quan-

tification of information and communication (Shannon

1948). It is not surprising then that information the-

ory also provides an ideal basis to precisely formulate

causal hypotheses. In the next paragraph, we present

the connection between the quantification of informa-

tion and communication and Wiener’s definition of

causal interactions (Wiener 1956) in more detail be-

cause of its importance for the justification of using IT

methods in this work.

In the context of information theory, the key mea-

sure of information of a discrete² random variable is

its Shannon entropy (Shannon 1948; Reza 1994). This

entropy quantifies the reduction of uncertainty obtained

when one actually measures the value of the variable.

On the other hand, Wiener’s definition of causal de-

pendencies rests on an increase of prediction power. In

particular, a signal X is said to cause a signal Y when

the future of signal Y is better predicted by adding

knowledge from the past and present of signal X than

by using the present and past of Y alone (Wiener 1956).

Therefore, if prediction enhancement can be associated

to uncertainty reduction, it is expected that a causality

measure would be naturally expressible in terms of

information theoretic concepts.

First attempts to obtain model-free measures of the

relationship between two random variables were based

on mutual information (MI). MI quantifies the amount

of information that can be obtained about a random

variable by observing another. MI is based on prob-

ability distributions and is sensitive to second and all

higher order correlations. Therefore, it does not rely

on any specific model of the data. However, MI says

little about causal relationships, because of its lack of

directional and dynamical information: First, MI is sym-

metric under the exchange of signals. Thus, it cannot

distinguish driver and response systems. And second,

standard MI captures the amount of information that is

shared by two signals. In contrast, a causal dependence

is related to the information being exchanged rather

than shared (for instance, due to a common drive of

both signals by an external, third source). To obtain

² For a continuous random variable the natural generalization of

Shannon entropy is its differential entropy. Although differential

entropy does not inherit the properties of Shannon entropy as

an information measure, the derived measures of mutual infor-

mation and transfer entropy retain the properties and meaning

they have in the discrete variable case. We refer the reader to

Kaiser and Schreiber (2002) for a more detailed discussion of TE

for continuous variables. In addition, measurements of physical

systems typically come as discrete random variables because of

the binning inherent in the digital processing of the data.

an asymmetric measure, delayed mutual information,

i.e. MI between one of the signals and a lagged version

of another has been proposed. Delayed MI results in

an asymmetric measure and contains certain dynamical

structure due to the time lag incorporated. Neverthe-

less, delayed mutual information has been shown

to have certain flaws, such as problems due to a

common history or shared information from a common

input (Schreiber 2000).

A rigorous derivation of a Wiener causal measure

within the information theoretic framework was pub-

lished by Schreiber under the name of transfer entropy

(Schreiber 2000). Assuming that the two time series

of interest $X = x_t$ and $Y = y_t$ can be approximated by

Markov processes, Schreiber proposed as a measure of

causality to compute the deviation from the following

generalized Markov condition

$$p\left(y_{t+1} \mid y_t^n, x_t^m\right) = p\left(y_{t+1} \mid y_t^n\right), \tag{1}$$

where $x_t^m = (x_t, \ldots, x_{t-m+1})$, $y_t^n = (y_t, \ldots, y_{t-n+1})$, while m and n are the orders (memory) of the Markov processes X and Y, respectively. Notice that Eq. (1) is fully satisfied when the transition probabilities or dynamics of Y are independent of the past of X, that is, in the absence of causality from X to Y. To measure the departure from this condition (i.e. the presence of causality), Schreiber uses the expected Kullback-Leibler divergence between the two probability distributions at each side of Eq. (1) to define the transfer entropy from X to Y as

$$TE(X \to Y) = \sum_{y_{t+1},\, y_t^n,\, x_t^m} p\left(y_{t+1}, y_t^n, x_t^m\right) \log\left(\frac{p\left(y_{t+1} \mid y_t^n, x_t^m\right)}{p\left(y_{t+1} \mid y_t^n\right)}\right). \tag{2}$$

Transfer entropy naturally incorporates directional

and dynamical information, because it is inherently

asymmetric and based on transition probabilities. In-

terestingly, Paluš has shown that transfer entropy can

be rewritten as a conditional mutual information (Paluš

2001; Hlavackova-Schindler et al. 2007).
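To make Eq. (2) concrete, the following is a minimal sketch of a plug-in (frequency counting) estimate for symbol sequences. It assumes Markov orders m = n = 1 and discrete-valued data; the function name `transfer_entropy_discrete` is hypothetical and this is not the estimator used in this work, which targets continuous signals.

```python
from collections import Counter
from math import log


def transfer_entropy_discrete(x, y):
    """Plug-in estimate of TE(X -> Y) in nats for symbol sequences, with m = n = 1."""
    # Each sample is a triple (y_{t+1}, y_t, x_t).
    triples = list(zip(y[1:], y[:-1], x[:-1]))
    n_total = len(triples)

    # Occurrence counts of the joint and marginal events appearing in Eq. (2).
    c_yf_yp_xp = Counter(triples)
    c_yp_xp = Counter((yp, xp) for _, yp, xp in triples)
    c_yf_yp = Counter((yf, yp) for yf, yp, _ in triples)
    c_yp = Counter(yp for _, yp, _ in triples)

    te = 0.0
    for (yf, yp, xp), count in c_yf_yp_xp.items():
        p_full = count / c_yp_xp[(yp, xp)]       # p(y_{t+1} | y_t, x_t)
        p_self = c_yf_yp[(yf, yp)] / c_yp[yp]    # p(y_{t+1} | y_t)
        te += (count / n_total) * log(p_full / p_self)
    return te
```

If Y is a one-sample delayed copy of a random binary sequence X, the estimate from X to Y should approach log 2 nats, while the estimate in the reverse direction should stay close to zero, which illustrates the asymmetry of the measure.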

The main convenience of such an information the-

oretic functional designed to detect causality is that,

in principle, it does not assume any particular model

for the interaction between the two systems of interest,

as requested above. Thus, the sensitivity of transfer

entropy to all order correlations becomes an advan-

tage for exploratory analyses over GC or other model

based approaches. This is particularly relevant when

the detection of some unknown non-linear interactions

is required.


Here, we demonstrate that transfer entropy does in-

deed fulfill the above requirements 1–4 and is therefore

a useful addition to the available methods for the quan-

tification of effective connectivity, when used as a met-

ric in a suitable permutation test for independence. We

demonstrate its ability to detect purely non-linear in-

teractions, its ability to deal with a range of interaction

delays, and its robustness against linear cross-talk on

simulated data. This latter point is of particular interest

for non-invasive human electrophysiology using EEG

or MEG. The robustness of TE against linear cross-

talk in the presence of noise, has to our knowledge not

been investigated before. We test transfer entropy on a

variety of simulated signals with different signal gener-

ation dynamics, including biologically plausible signals

with spectra close to 1/f. We also investigate a range of

linear and purely non-linear coupling mechanisms. In

addition, we demonstrate that transfer entropy works

without specifying a signal model, i.e. that requirement

1 is fulfilled. We extend earlier work (Hinrichs et al.

2008; Chávez et al. 2003; Gourvitch and Eggermont

2007) by explicitly demonstrating the applicability of

transfer entropy for the case of linearly mixed signals.

2 Methods

The method section is organized in four main parts. In

the first part we describe how to compute TE numeri-

cally. As several estimation techniques could be applied

for this purpose we quickly review these possibilities

and give the rationale for our particular choice of es-

timator. In the second part, we describe two particu-

lar problems that arise in neuroscience applications—

delayed interactions, and observation of the signals of

interest by measurements that only represent linear

mixtures of these signals. The third part provides details

on the simulation of test cases for the detection of

effective connectivity via TE. The last part contains

details of the MEG recordings in a self-paced finger-

lifting task that we chose as a proof-of-concept for the

analysis of neuroscience data.

2.1 Computation of transfer entropy

Transfer entropy for two observed time series $x_t$ and $y_t$ can be written as

$$TE(X \to Y) = \sum_{y_{t+u},\, y_t^{d_y},\, x_t^{d_x}} p\left(y_{t+u}, y_t^{d_y}, x_t^{d_x}\right) \log \frac{p\left(y_{t+u} \mid y_t^{d_y}, x_t^{d_x}\right)}{p\left(y_{t+u} \mid y_t^{d_y}\right)}, \tag{3}$$

where t is a discrete valued time-index and u denotes the prediction time, a discrete valued time-interval. $y_t^{d_y}$ and $x_t^{d_x}$ are $d_y$- and $d_x$-dimensional delay vectors, respectively, as

detailed below. An estimator of the transfer entropy

can be obtained via different approaches (Hlavackova-

Schindler et al. 2007). As with other information-

theoretic functionals, any estimate shows biases and

statistical errors which depend on the method used and

the characteristics of the data (Hlavackova-Schindler

et al. 2007; Kraskov et al. 2004). In some applications

the magnitude of such errors is so large that it prevents

any meaningful interpretation of the measure. For our

purposes, it is crucial then to use a proper estimator that

is as accurate as possible under the specific and severe

constraints that most neuronal data-sets present and to

complement it with an appropriate statistical test. In

particular, a quantifier of transfer entropy apt for neu-

roscience applications should cope with at least three

difficulties. First, the estimator should be robust to

moderate levels of noise. Second, the estimator should

rely only on a very limited number of data samples. This

point is particularly restrictive since relevant neuronal

dynamics typically unfolds over just a few hundred

milliseconds. And third, due to the need to reconstruct

the state space from the observed signals, the estimator

should be reliable when dealing with high-dimensional

spaces. Under such restrictive conditions, obtaining a

highly accurate estimator of TE is probably impossible

without strong modelling assumptions. Unfortunately,

strong modelling assumptions require specific informa-

tion which is typically not available for neuroscience

data. Nevertheless, some very general and biophysi-

cally motivated assumptions are available that enable

the use of particular kernel-based estimators (Victor

2002). Here, we build on this framework to derive

a data-efficient estimator, detailed below. Even with

this improved estimator, inaccuracies in estimation are

unavoidable, especially under the restrictive conditions

mentioned above, and it is necessary to evaluate the

statistical significance of the TE measures, i.e. we use

TE as a statistic measuring dependency of two time se-

ries and test against the null hypothesis of independent

time series. Since no parametric distribution of errors

is known for TE, one needs suitable surrogate data

to test the null hypothesis of independent time series

(‘absence of causality’). Suitable in this context means

that the surrogate data should be prepared such that

the causal dependency of interest is destroyed by con-

structing the surrogates but trivial dependencies of no

interest are preserved. It is the particular combination

of a data-efficient estimator and a suitable statistical test

that forms the core part of this study and its contribu-

tion to the field of effective connectivity analysis.


In the next subsection we detail both how to obtain

a data-efficient estimation of Eq. (3) from the raw

signals, and a statistical significance analysis based on

surrogate data.

2.1.1 Reconstructing the state space

Experimental recordings can only access a limited num-

ber of variables which are more or less related to the

full state of the system of interest. However, sensible

causality hypotheses are formulated in terms of the

underlying systems rather than on the signals being

actually measured. To partially overcome this prob-

lem several techniques are available to approximately

reconstruct the full state space of a dynamical sys-

tem from a single series of observations (Kantz and

Schreiber 1997).

In this work, we use a Takens delay embedding

(Takens 1981) to map our scalar time series into tra-

jectories in a state space of possibly high dimension.

The mapping uses delay-coordinates to create a set

of vectors or points in a higher dimensional space ac-

cording to

$$x_t^d = \left(x(t),\, x(t-\tau),\, x(t-2\tau),\, \ldots,\, x(t-(d-1)\tau)\right). \tag{4}$$

This procedure depends on two parameters, the di-

mension d and the delay τ of the embedding. While

there is an extensive literature on how to choose

such parameters, the different methods proposed are

far from reaching any consensus (Kantz and

Schreiber 1997). A popular option is to take the embedding

delay τ as the auto-correlation decay time (act)

of the signal or the first minimum (if any) of the auto-

information. To determine the embedding dimension,

the Cao criterion offers an algorithm based on false

neighbors computation (Cao 1997). However, alter-

natives for non-deterministic time-series are available

(Ragwitz and Kantz 2002).
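The delay embedding of Eq. (4) and one possible reading of the auto-correlation decay time can be sketched as follows. The function names are hypothetical, and the 1/e threshold used to define the decay time is an assumption made for illustration; the text does not specify a threshold.

```python
import numpy as np


def delay_embed(x, d, tau):
    """Rows are x_t^d = (x(t), x(t - tau), ..., x(t - (d - 1) tau)) for every valid t."""
    x = np.asarray(x, dtype=float)
    first = (d - 1) * tau  # earliest time index with a complete history
    if first >= len(x):
        raise ValueError("time series too short for this d and tau")
    t = np.arange(first, len(x))
    return np.column_stack([x[t - j * tau] for j in range(d)])


def autocorrelation_decay_time(x, threshold=1 / np.e):
    """First lag at which the normalized autocorrelation falls below `threshold`.

    The 1/e default is an illustrative assumption, not a value taken from the text.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0, 1, 2, ...
    acf = acf / acf[0]
    below = np.nonzero(acf < threshold)[0]
    return int(below[0]) if below.size else len(x)
```

For a series of length N, `delay_embed` returns N - (d - 1)τ state vectors, so large values of d and τ shorten the usable data, which is one reason the choice of both parameters matters for short recordings.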

The parameters d and τ considerably affect the out-

come of the TE estimates. For instance, a low value

of d can be insufficient to unfold the state space of

a system and consequently degrade the meaning of

any TE measure, as will be demonstrated below. On

the other hand, too large a dimensionality makes the

estimators less accurate for a given data length and sig-

nificantly enlarges the computing time. Consequently,

while we have used the recipes described above to

orient our search for good embedding parameters, we

have systematically scanned d and τ to optimize the

performance of TE measures.

2.1.2 Estimating the transfer entropy

After having reconstructed the state spaces of any pair

of time series, we are now in a position to estimate

the transfer entropy between their underlying systems.

We proceed by first rewriting Eq. (3) as a sum of four

Shannon entropies according to

$$TE(X \to Y) = S\left(y_t^{d_y}, x_t^{d_x}\right) - S\left(y_{t+u}, y_t^{d_y}, x_t^{d_x}\right) + S\left(y_{t+u}, y_t^{d_y}\right) - S\left(y_t^{d_y}\right). \tag{5}$$

Thus, the problem amounts to computing the

different joint and marginal probability distributions

implicated in Eq. (5). In principle, there are many ways

to estimate such probabilities and their performance

strongly depends on the characteristics of the data to

be analyzed. See Hlavackova-Schindler et al. (2007) for

a detailed review of techniques. For discrete processes,

the probabilities involved can be easily determined by

the frequencies of visitation of different states. For

continuous processes, the case of main interest in this

study, a reliable estimation of the probability densities

is much more delicate since a continuous density has

to be approximated from a finite number of samples.

Moreover, the solution of coarse-graining a continuous

signal into discrete states is hard to interpret unless

the measure converges when reducing the coarsening

scale. In the following, we motivate our choice of

estimator and describe how it works.
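For readers who want the intermediate step, the decomposition in Eq. (5) can be checked directly from Eq. (3). The sketch below assumes the usual definition $S(z) = -\sum_z p(z) \log p(z)$ and uses only the symbols already introduced.

```latex
\begin{align*}
\log \frac{p\left(y_{t+u} \mid y_t^{d_y}, x_t^{d_x}\right)}{p\left(y_{t+u} \mid y_t^{d_y}\right)}
  &= \log p\left(y_{t+u}, y_t^{d_y}, x_t^{d_x}\right)
   - \log p\left(y_t^{d_y}, x_t^{d_x}\right) \\
  &\quad - \log p\left(y_{t+u}, y_t^{d_y}\right)
   + \log p\left(y_t^{d_y}\right).
\end{align*}
% Averaging each term over p(y_{t+u}, y_t^{d_y}, x_t^{d_x}) and marginalizing
% over the variables that do not appear inside the logarithm gives
\begin{align*}
TE(X \to Y)
  &= -S\left(y_{t+u}, y_t^{d_y}, x_t^{d_x}\right)
   + S\left(y_t^{d_y}, x_t^{d_x}\right)
   + S\left(y_{t+u}, y_t^{d_y}\right)
   - S\left(y_t^{d_y}\right).
\end{align*}
```

The two positive and two negative entropy terms have different dimensionalities, which is the origin of the bias problem discussed for naive term-by-term estimation.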

A possible strategy for the design of an estimator

relies on finding the parameters that best fit the sam-

ple probability densities into some known distribution.

While computationally straightforward, such an approach

amounts to assuming a certain model for the proba-

bility distribution, which without further constraints is

difficult to justify. Among the nonparametric approaches,

fixed and adaptive histogram or partition methods

are very popular and widely used. However, other

nonparametric techniques such as kernel or nearest-

neighbor estimators have been shown to be more data

efficient and accurate while avoiding certain arbitrari-

ness stemming from binning (Victor 2002; Kaiser and

Schreiber 2002). In this work we shall use an estimator

of the nearest-neighbor class.

Nearest-neighbor techniques estimate smooth prob-

ability densities from the distribution of distances of

each sample point to its k-th nearest neighbor. Conse-

quently, this procedure results in an adaptive resolution

since the distance scale used changes according to the

underlying density. Kozachenko-Leonenko (KL) is an

example of such a class of estimators and a standard

algorithm to compute Shannon entropy (Kozachenko


and Leonenko 1987). Nevertheless, a naive approach

of estimating TE via computing each term of Eq. (5)

from a KL estimator is inadequate. To see why, it is

important to notice that the probability densities in-

volved in computing TE or MI can be of very different

dimensionality (from $1 + d_x$ up to $1 + d_x + d_y$ for the

case of TE). For a fixed k, this means that different dis-

tance scales are effectively used for spaces of different

dimension. Consequently, the biases of each Shannon

entropy arising from the non-uniformity of the distrib-

ution will depend on the dimensionality of the space,

and therefore, will not cancel each other.

To overcome such problems in mutual information

estimates, Kraskov, Stögbauer, and Grassberger have

proposed a new approach (Kraskov et al. 2004). The

key idea is to use a fixed mass (k) only in the higher

dimensional space and project the distance scale set

by this mass into the lower dimensional spaces. Thus,

the procedure designed for mutual information sug-

gests to first determine the distances to k-th nearest

neighbors in the joint space. Then, an estimator of MI

can be obtained by counting the number of neighbors

that fall within such distances for each point in the

marginal space. The estimator of MI based on this

method displays many good statistical properties: it

greatly reduces the bias obtained with individual KL

estimates, and it seems to become an exact estimator

in the case of independent variables. For these reasons,

in this work we have followed a similar scheme to

provide a data-efficient sample estimate for transfer

entropy (Gomez-Herrero et al. 2010). Thus, we have

obtained an estimator that permits us, at least partially,

to tackle some of the main difficulties faced in neuronal

data sets mentioned in the beginning of the Methods

section. In summary, since the estimator is more data

efficient and accurate than other techniques (especially

those based on binning), it allows the analysis of shorter

data sets possibly contaminated by small levels of noise.

At the same time, the method is especially geared to

handle the biases of high dimensional spaces naturally

occurring after the embedding of raw signals.
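The fixed-mass idea described above can be sketched for transfer entropy as follows: the distance to the k-th neighbor is determined once in the joint space and then reused for counting neighbors in the lower dimensional spaces. This is a simplified illustration and not the toolbox implementation. It assumes $d_x = d_y = 1$, the maximum norm, no Theiler correction, and SciPy's `cKDTree`; the function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def _count_within(points, radii):
    """Samples strictly closer than radii[i] to points[i] (max-norm), the point itself included."""
    tree = cKDTree(points)
    # nextafter shrinks each radius by one float step so that the inequality is strict.
    return tree.query_ball_point(points, np.nextafter(radii, 0), p=np.inf, return_length=True)


def transfer_entropy_knn(x, y, k=4, u=1):
    """Nearest-neighbor estimate of TE(X -> Y) in nats for d_x = d_y = 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_future, y_past, x_past = y[u:, None], y[:-u, None], x[:-u, None]

    # Distance to the k-th neighbor in the full joint space (column 0 is the point itself).
    joint = np.hstack([y_future, y_past, x_past])
    dist, _ = cKDTree(joint).query(joint, k=k + 1, p=np.inf)
    eps = dist[:, k]

    # Project this distance scale into the three lower dimensional spaces.
    n_yp = _count_within(y_past, eps)
    n_yf_yp = _count_within(np.hstack([y_future, y_past]), eps)
    n_yp_xp = _count_within(np.hstack([y_past, x_past]), eps)

    # Counts include the reference point, so they already equal "neighbors + 1".
    return digamma(k) + np.mean(digamma(n_yp) - digamma(n_yf_yp) - digamma(n_yp_xp))
```

Because all four entropy terms share one distance scale per sample, their dimension-dependent biases tend to cancel, which is the motivation given above for preferring this scheme over four independent KL estimates.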

As to computing time, this class of methods spends

most of its resources on finding neighbors. It is then highly

advisable to implement an efficient search algorithm

which is optimal for the length and dimensionality of

the data to be analyzed (Cormen et al. 2001). For the

current investigation, the algorithm was implemented

with the help of OpenTSTool (Version 1.2 on Linux

64 bit; Merkwirth et al. 2009). The full set of methods

applied here is available as an open source MATLAB

toolbox (Lindner et al. 2009).

In practice, it is important to consider that this kernel

estimation method carries two parameters. One is the

mass of the nearest-neighbors search (k) which controls

the level of bias and statistical error of the estimate.

For the remainder of this manuscript this parameter

was set to k = 4, as suggested in Kraskov et al. (2004),

unless stated otherwise. The second parameter refers

to the Theiler correction which aims to exclude au-

tocorrelation effects from the density estimation. It

consists of discarding for the nearest-neighbor search

those samples which are closer in time to a reference

point than a given lapse (T). Here, we chose T = 1 act,

unless stated otherwise. In general, it means that even

though TE does not assume any particular model, its

numerical estimation relies on at least five different pa-

rameters: the embedding delay (τ) and dimension (d),

the mass of the nearest neighbor search (k), the Theiler

correction window (T), and the prediction time (u).

The latter accounts for non-instantaneous interactions.

Specifically, it reflects that for such interactions an increase in the

predictability of one signal through the incorporation

of the past of another should only occur at a certain

latency or prediction time. Since axonal conduction

delays among remote areas can amount to tens of

milliseconds (Swadlow and Waxman 1975; Swadlow

1994), its incorporation for a sensible causality analysis

of neuronal data sets is important for the results as we

shall see below.

2.1.3 Significance analysis

To test the statistical significance of a value for TE

obtained we used surrogate data. In general, generating

surrogate data with the same statistical properties as

the original data but selectively destroying any causal

interaction is difficult. However, when the data set has

a trial structure it is possible to reason that shuffling

trials generates suitable surrogate data sets for the

absence of causality hypothesis if stationarity and trial

independence are assured. On these data we have then

used a permutation test (∼19,000 permutations) on

the unshuffled and shuffled trials to obtain a p-value.

P-values below 0.05 were considered significant. Where

necessary a correction of this threshold for multiple

comparisons was applied using the false discovery rate

(FDR, q < 0.05; Genovese et al. 2002).
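One way to read the trial-shuffling procedure is sketched below. It is an assumption-laden illustration: the statistic is the mean over trials of any pairwise dependency measure (TE in this work, passed in as the hypothetical `statistic` callable), surrogates pair each X trial with a randomly chosen Y trial, and the p-value uses the common (1 + exceedances) / (1 + permutations) convention. The exact scheme used in the study may differ in these details.

```python
import numpy as np


def trial_shuffle_test(x_trials, y_trials, statistic, n_perm=1000, seed=0):
    """One-sided permutation p-value for dependency between matched trials."""
    rng = np.random.default_rng(seed)
    n_trials = len(x_trials)

    # Evaluate the statistic once for every (X trial, Y trial) pairing.
    pairwise = np.array([[statistic(x, y) for y in y_trials] for x in x_trials])
    observed = np.trace(pairwise) / n_trials  # matched (unshuffled) trials

    rows = np.arange(n_trials)
    exceed = 0
    for _ in range(n_perm):
        order = rng.permutation(n_trials)        # shuffled trial assignment
        surrogate = pairwise[rows, order].mean()
        exceed += surrogate >= observed
    return observed, (1 + exceed) / (1 + n_perm)
```

Precomputing the pairwise matrix keeps the number of statistic evaluations at the square of the trial count regardless of the number of permutations, which matters when each evaluation involves a neighbor search.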

2.2 Particular problems in neuroscience data:

instantaneous mixing and delayed interactions

Neuroscience data have specific characteristics that

challenge a simple analysis of effective connectivity.

First, the interaction may involve large time delays of

unknown duration and, second, the data generated by

the original processes may not be available but only


measurements that represent linear mixtures of the

original data—as is the case in EEG and MEG. In this

section we describe a number of additional tests that

may help to interpret the results obtained by computing

TE values from these types of neuroscience data.

Tests for instantaneous linear mixing and for multiple

noisy observations of a single source Instantaneous,

linear mixing of the original signals by the measure-

ment process is always present in MEG and EEG

data. This may result in two problems: First, linear

mixing may reduce signal asymmetry and, thus, make

it more difficult to detect effective connectivity of the

underlying sources. This problem is mainly one of

reduced sensitivity of the method and may be dealt

with, e.g. by increasing the amount of data. A second

problem arises when a single source signal with an

internal memory structure is observed multiple times

on different channels with individual channel noise. As

demonstrated before (Nolte et al. 2008) this latter case

can result in false positive detection of effective con-

nectivity for methods based on Wiener’s definition of

causality (Wiener 1956). This problem is more severe,

because it reduces the specificity of the method. As

an example of this problem think of an AR process of

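order m, s(t),

$$s(t) = \sum_{i=1}^{m} \alpha_i\, s(t-i) + \eta_s(t) \tag{6}$$

that is mixed with a mixing parameter $\varepsilon$ onto two sensor signals $X_\varepsilon$, $Y_\varepsilon$ in the following way

$$X_\varepsilon(t) = s(t), \tag{7}$$

$$Y_\varepsilon(t) = (1-\varepsilon)\, s(t) + \varepsilon\, \eta_Y, \tag{8}$$

where the dynamics for $Y_\varepsilon$ can be rewritten as

$$Y_\varepsilon(t) = (1-\varepsilon) \sum_{i=1}^{m} \alpha_i\, X_\varepsilon(t-i) + (1-\varepsilon)\, \eta_s + \varepsilon\, \eta_Y. \tag{9}$$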

In this case TE will identify a causal relationship between Xε and Yε as it detects the relationship between the past of Xε and the present of Xε that is contained in Yε as (1 − ε)ηs. Therefore, we implemented the following additional test ('time-shift test') to avoid false positive reports for the case of instantaneous, linear mixing: We shifted the time series for Xε by one sample into the past, X′ε(t) ← Xε(t + 1), such that a potential instantaneous mixing becomes lagged and thereby causal in Wiener's sense. For instantaneous mixing processes TE values increase for the interaction from the shifted time series X′ε(t) to Yε compared to the interaction from the original time series Xε(t) to Yε. Therefore, an increase of this kind may indicate the presence of instantaneous mixing. The actual shift test implements the null hypothesis of instantaneous mixing and the alternative hypothesis of no instantaneous mixing in the following way:

$$\begin{aligned} H_0 &: \mathrm{TE}(X'_\epsilon(t) \to Y_\epsilon) \geq \mathrm{TE}(X_\epsilon(t) \to Y_\epsilon) \\ H_1 &: \mathrm{TE}(X'_\epsilon(t) \to Y_\epsilon) < \mathrm{TE}(X_\epsilon(t) \to Y_\epsilon) \end{aligned} \tag{10}$$

If the null hypothesis of instantaneous mixing is not discarded by this test, i.e. if TE values for the original data are not significantly larger than those for the shifted data, then we have to discard the hypothesis of a causal interaction from Xε to Yε. Therefore, when data potentially contained instantaneous mixing, we tested for the presence of instantaneous mixing before proceeding to test the hypothesis of effective connectivity. More specifically, this test was applied for the instantaneously mixed simulation data (Figs. 4, 5, 6) and the MEG data (Fig. 8). In general, we suggest using this test whenever the data in question may have been obtained via a measurement function that contained linear, instantaneous mixing.

A less conservative approach to the same problem would be to discard data for TE analysis only when we have significant evidence for the presence of instantaneous mixing. In this case the hypotheses would be:

$$\begin{aligned} H_0 &: \mathrm{TE}(X'_\epsilon(t) \to Y_\epsilon) \leq \mathrm{TE}(X_\epsilon(t) \to Y_\epsilon) \\ H_1 &: \mathrm{TE}(X'_\epsilon(t) \to Y_\epsilon) > \mathrm{TE}(X_\epsilon(t) \to Y_\epsilon) \end{aligned} \tag{11}$$

In this case we would proceed with analysing the data if we did not have to reject H0. For the remainder of this manuscript, however, we stick to testing the more conservative null hypothesis presented in Eq. (10).
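The logic of the time-shift test can be illustrated with a short sketch. It is not the toolbox implementation: the function names `shift_source` and `time_shift_test` are hypothetical, the TE values are assumed to be available as one value per trial from any estimator, and the paired sign-flip permutation statistic is an assumption made only for this illustration.

```python
import numpy as np

def shift_source(x, y):
    """Return (x_shifted, y_aligned) with x_shifted(t) = x(t + 1).

    One sample is lost at the edge so that both series keep equal length.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    return x[1:], y[:-1]

def time_shift_test(te_original, te_shifted, n_perm=5000, seed=0):
    """One-sided paired sign-flip permutation test on per-trial TE values.

    H0: TE(shifted) >= TE(original)   (instantaneous mixing)
    H1: TE(shifted) <  TE(original)   (no instantaneous mixing)
    Returns the p-value for rejecting H0.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(te_original, float) - np.asarray(te_shifted, float)
    observed = diff.mean()
    # Under H0 (boundary case) the sign of each paired difference is exchangeable.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (signs * diff).mean(axis=1)
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (n_perm + 1)
```

With this convention, effective connectivity would only be tested further if the returned p-value falls below the chosen significance level, which corresponds to the conservative hypotheses of Eq. (10).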

Delayed interactions, Wiener’s definition of causality,

and choice of embedding parameters This paragraph

introduces a difficulty related to Wiener’s definition

of causality. As described above, non-zero TE values

can be directly translated into improved predictions in

Wiener’s sense by interpreting the terms in Eq. (2) as

transition probabilities, i.e. as information that is useful

for prediction. TE quantifies the gain in our knowledge

about the transition probabilities in one system Y, that

we obtain if we condition these probabilities on the

past values of another system X. It is obvious that this

gain, i.e. the value of TE, can be erroneously high,

if the transition probabilities for system Y alone are

not evaluated correctly. We now describe a case where

this error is particularly likely to occur: Consider two

processes with lagged interactions and long autocorre-

lation times. We assume that system X drives Y with an


interaction delay δ (Fig. 1). A problem arises if we test

for a causal interaction from Y to X, i.e. the reverse

direction compared to the actual coupling, and do not

take enough care to fully capture the dynamics of X via

embedding. If for example the embedding dimension d

or the embedding delay τ was chosen too small, then

some information contained in the past of X is not

used although it would improve (auto-) prediction. This

information is actually transferred to Y via the delayed

interaction from X to Y. It is available in Y with a delay

δ, and therefore, at time-points where data from Y is used for the prediction of X. As stated before, this information is useful for the prediction of X. Thus, inclusion

of Y will improve prediction. Hence, TE values will be

non-zero and we will wrongly conclude that process Y

drives process X.
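To make the role of d and τ concrete, the following sketch builds delay-embedding vectors, assuming the usual convention in which the state at time t is (x(t), x(t − τ), ..., x(t − (d − 1)τ)). The function name `delay_embed` is hypothetical and τ is given in samples here, not in units of the autocorrelation time.

```python
import numpy as np

def delay_embed(x, d, tau):
    """Delay-embed a 1-D series.

    Row k of the result is (x(t), x(t - tau), ..., x(t - (d - 1) * tau))
    for t = (d - 1) * tau + k. The embedding therefore reaches
    (d - 1) * tau samples into the past and no further.
    """
    x = np.asarray(x)
    span = (d - 1) * tau
    if x.size <= span:
        raise ValueError("time series too short for this embedding")
    # Column j holds x(t - j * tau) for all admissible t.
    cols = [x[span - j * tau : x.size - j * tau] for j in range(d)]
    return np.column_stack(cols)

# Example: d = 3, tau = 2 covers the current sample and lags 2 and 4.
states = delay_embed(np.arange(10), d=3, tau=2)
```

Any sample of X older than (d − 1)τ is invisible to the auto-prediction of X. If such a sample still carries predictive information and reaches Y through the delayed coupling, conditioning on Y can appear to improve the prediction, which is the situation described above.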

2.3 Simulated data

We used simulated data to test the ability of TE to

uncover causal relations under different situations rel-

evant to neuroscience applications. In particular, we al-

ways considered two interacting systems and simulated

different internal dynamics (autoregressive and 1/f

characteristics), effective connectivity (linear, threshold and quadratic coupling), and interaction delays (single delay and a distribution of delays). In addition, we

simulated linear instantaneous mixing processes during measurement, because of their relevance for EEG and MEG.

Fig. 1 Illustration of false positive effective connectivity due to insufficient embedding for delayed interactions. Source signal X drives target signal Y with a delay δ. The internal memory of process X is reflected in the slowly decaying autocorrelation function (top). For the evaluation of TE from Y to X, X is embedded for auto-prediction with d = 3 and τ, as indicated by the dark gray box. The data point of X that is to be predicted with prediction time u is indicated by the star shaped symbol. Data points used for auto-prediction are indicated by filled circles on signal X. Data points used for cross-prediction from Y to X are indicated by filled circles on signal Y. Due to the delayed interaction from X to Y, information about X earlier than the embedding time gets transferred from X to Y where it gets included in the embedding (open circle). Y contains information about the history of X that is useful for predicting X (see open circle, autocorrelation curve) but not contained in the embedding used on X. Hence, inclusion of Y will improve the prediction of X and false positive effective connectivity is found. Introducing a larger embedding dimension or a larger embedding delay incorporates this information into the embedding of X. Examples of this effect can be found in Tables 1 and 2
[Figure panels: signals Y (target) and X (source) with interaction delay δ, prediction time u, embedding delay τ and d_used = 3; top: autocorrelation r(X(u), X(u−t)).]

2.3.1 Internal signal dynamics

We have simulated two types of complex internal signal

dynamics. In the first case, an autoregressive process

of order 10, AR(10), is generated for each system. The

dynamics is then given by

$$x(t+1) = \sum_{i=0}^{9} \alpha_i\, x(t-i) + \sigma\, \eta(t), \tag{12}$$

where the coefficients αi are drawn from a normalized Gaussian distribution, the innovation term η represents a Gaussian white noise source, and σ controls the relative strength of the noise contribution. Notice that we use here the typical notation in dynamical systems where the innovation term η(t) is delayed one unit with respect to the output x(t + 1).
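A minimal sketch of Eq. (12) is given below. The function name `simulate_ar` is hypothetical, and so are two further choices: the coefficients are drawn from a standard normal distribution and then rescaled so that the sum of their absolute values stays below one (a sufficient condition for a stable process), and an initial burn-in segment is discarded. Neither detail is specified in the text.

```python
import numpy as np

def simulate_ar(n_samples, order=10, sigma=1.0, burn_in=500, seed=0):
    """Simulate x(t+1) = sum_{i=0}^{order-1} alpha_i x(t-i) + sigma * eta(t)."""
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal(order)
    # Assumption: rescale so that sum(|alpha_i|) = 0.9 < 1, which guarantees stability.
    alpha *= 0.9 / np.abs(alpha).sum()
    total = n_samples + burn_in
    x = np.zeros(total + order)
    eta = rng.standard_normal(total + order)  # Gaussian white noise innovations
    for t in range(order - 1, total + order - 1):
        past = x[t - order + 1 : t + 1][::-1]  # x(t), x(t-1), ..., x(t-order+1)
        x[t + 1] = alpha @ past + sigma * eta[t]
    # Drop the transient so that the returned segment is close to stationary.
    return x[-n_samples:], alpha

signal, coefficients = simulate_ar(1000)
```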

As a second case, we have considered signals with a 1/f^θ profile in their power spectra. To produce such signals we have followed the approach in Granger (1980). Accordingly, the 1/f^θ time series are generated as the aggregation of numerous AR(1) processes with an appropriate distribution of coefficients. Mathematically, each 1/f^θ signal is then given by

$$x(t+1) = \frac{1}{N} \sum_{i=1}^{N} r_i(t), \tag{13}$$

where we aggregate over N = 500 AR(1) processes each described as

$$r_i(t) = \alpha_i\, r_i(t-1) + \sigma\, \eta(t), \tag{14}$$

with the coefficients αi randomly chosen according to the probability density function ∼ (1 − α)^(1−θ).
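The aggregation in Eqs. (13) and (14) can be sketched as follows. The coefficients are sampled by inverting the cumulative distribution of the density proportional to (1 − α)^(1−θ) on [0, 1), which gives α = 1 − (1 − U)^(1/(2−θ)) for uniform U and requires θ < 2. The function names, the burn-in length and the use of an independent noise source for every AR(1) component are assumptions made for this illustration.

```python
import numpy as np

def sample_alpha(n, theta, rng):
    """Draw AR(1) coefficients with density proportional to (1 - alpha)^(1 - theta)."""
    if theta >= 2:
        raise ValueError("inverse-CDF sampling requires theta < 2")
    u = rng.random(n)  # uniform on [0, 1)
    return 1.0 - (1.0 - u) ** (1.0 / (2.0 - theta))

def simulate_one_over_f(n_samples, theta=1.0, n_proc=500, sigma=1.0,
                        burn_in=1000, seed=0):
    """Aggregate n_proc AR(1) processes into one signal (Eqs. 13 and 14)."""
    rng = np.random.default_rng(seed)
    alpha = sample_alpha(n_proc, theta, rng)
    r = np.zeros(n_proc)
    out = np.empty(n_samples)
    for t in range(burn_in + n_samples):
        # Eq. (14); assumption: independent innovations for each component.
        r = alpha * r + sigma * rng.standard_normal(n_proc)
        if t >= burn_in:
            out[t - burn_in] = r.mean()  # Eq. (13)
    return out
```

For θ = 1 the density is flat, so the coefficients are simply uniform on [0, 1).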

2.3.2 Types of interaction

To simulate a causal interaction between two systems

we added to the internal dynamics of one process (Y)

a term related to the past dynamics of the other (X).

Three types of interaction or effective connectivity were considered: linear, quadratic, and threshold. In

the linear case, the interaction is proportional to the

amount of signal at X. The last two cases represent

strong non-linearities which challenge approaches of

detection based on linear or parametric methods. The

effective connectivity mediated by the threshold func-

tion is of special relevance in neuroscience applications

due to the approximately all-or-none character of the

neuronal spike generation and transmission. Mathe-

matically, the update of y(t) is then modeled by the ad-

dition of an interaction term such that the full dynamics

is described as

$$y(t) = D(y^-) + \begin{cases} \gamma_{\mathrm{lin}}\, x(t-\delta) & \text{if linear,} \\ \gamma_{\mathrm{quad}}\, x^2(t-\delta) & \text{if quadratic,} \\ \gamma_{\mathrm{thresh}}\, \dfrac{1}{1+\exp(b_1 + b_2\, x(t-\delta))} & \text{if threshold,} \end{cases}$$

where D(.) represents the internal dynamics (AR(10) or 1/f) of y and y− represents past values of y. In the last case, the threshold function is implemented through a sigmoidal with parameters b1 and b2 which control the threshold level and its slope, respectively. Here, b1 was set to 0 and b2 was set to 50. In all cases, δ represents a delay which typically arises from the finite speed of propagation of any influence between physically separated systems. Note that since we deal with discrete time models (maps) in our modeling δ takes only positive integer values.

In case that two systems interact via multiple pathways it is possible that different latencies arise in their communication. For example, it is known that the different characteristics of the axons joining two brain areas typically lead to a distribution of axonal conduction delays (Swadlow et al. 1978; Swadlow 1985). To account for that scenario we have also simulated the case where δ instead of a single value is a distribution. Accordingly, for each type of interaction we have considered the case where the interaction term is

$$\text{Interaction term} = \begin{cases} \sum_{\delta'} \gamma_{\mathrm{lin}}\, x(t-\delta') & \text{if linear,} \\ \sum_{\delta'} \gamma_{\mathrm{quad}}\, x^2(t-\delta') & \text{if quadratic,} \\ \sum_{\delta'} \gamma_{\mathrm{thresh}}\, \dfrac{1}{1+\exp(b_1 + b_2\, x(t-\delta'))} & \text{if threshold,} \end{cases}$$

where the sums are extended over a certain domain of positive integer values. In the results section we consider the case in which δ′ takes values on a uniform distribution of width 6 centered around a given delay.

The coupling constants γlin, γquad, γthresh were always chosen such that the variance of the interaction term was comparable to the variance of y(t) that would be obtained in the absence of any coupling.
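The three coupling functions and the sum over a set of delays can be written compactly as below. This is a sketch under stated assumptions: the names `coupling` and `interaction_term` are hypothetical, the coupling constant defaults to 1 and is not tuned to match variances, and the sigmoid is evaluated through the identity 1/(1 + exp(z)) = (1 − tanh(z/2))/2 purely to avoid numerical overflow.

```python
import numpy as np

def coupling(x_delayed, kind, gamma=1.0, b1=0.0, b2=50.0):
    """Contribution of delayed source values x(t - delta) to y(t)."""
    x_delayed = np.asarray(x_delayed, dtype=float)
    if kind == "linear":
        return gamma * x_delayed
    if kind == "quadratic":
        return gamma * x_delayed ** 2
    if kind == "threshold":
        z = b1 + b2 * x_delayed
        # Same value as gamma / (1 + exp(z)), but safe for large |z|.
        return gamma * 0.5 * (1.0 - np.tanh(0.5 * z))
    raise ValueError(f"unknown coupling type: {kind}")

def interaction_term(x, t, delays, kind, gamma=1.0, b1=0.0, b2=50.0):
    """Sum the coupling over all delays; a single delay is the list [delta]."""
    x = np.asarray(x, dtype=float)
    delays = np.asarray(delays, dtype=int)
    if np.any(delays <= 0) or np.any(t - delays < 0):
        raise ValueError("delays must be positive and must not reach before x[0]")
    return float(np.sum(coupling(x[t - delays], kind, gamma, b1, b2)))

# A delay distribution of width 6 centered on 20 samples: 17, 18, ..., 23.
delays = np.arange(17, 24)
```

Note that with the sign convention of the formula above and b2 = 50, the threshold term is close to γthresh for negative source values and close to zero for positive ones.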

2.3.3 Linear mixing

Linear instantaneous mixing is present in human non-

invasive electrophysiological measurements such as

EEG or MEG and has been shown to be problem-

atic for GC (Nolte et al. 2008). The problem we en-

counter for linearly and instantaneously mixed signals

is twofold: On the one hand, instantaneous mixing from

coupled source signals onto sensor signals by the measurement process degrades signal asymmetry (Tognoli and Scott Kelso 2009); it will therefore be harder to detect effective connectivity. On the other hand, as shown in Nolte et al. (2008), instantaneous presence of a single source signal in two measurements of different signal to noise ratio may erroneously be interpreted as effective connectivity. To test the influence of linear

instantaneous mixing we created two test cases:

(A) The first test case consisted of unidirectionally coupled signal pairs X → Y generated from coupled AR(10) processes as described above and then transformed into two linear instantaneous mixtures Xε, Yε in the following way:

$$X_\epsilon(t) = (1-\epsilon)\, X(t) + \epsilon\, Y(t) \tag{15}$$

$$Y_\epsilon(t) = \epsilon\, X(t) + (1-\epsilon)\, Y(t) \tag{16}$$

Here, ε is a parameter that describes the amount of linear mixing or 'signal cross-talk'. A value of ε of 0.5 means that the mixing leads to two identical signals and, hence, no significant TE should be observed. We then investigated for three different values of ε = (0.1, 0.25, 0.4) how well TE detects the underlying effective connectivity from X to Y if only the linear mixtures Xε, Yε are available.

(B) The second test case consisted of generating measurement signals Xε, Yε in the following way:

$$X_\epsilon(t) = s(t) \tag{17}$$

$$Y_\epsilon(t) = (1-\epsilon)\, s(t) + \epsilon\, \eta_Y \tag{18}$$

Here, s(t) is the common source, a mean-free AR(10) process with unit variance. s(t) is measured twice: once noise free in Xε and once dampened by a factor (1 − ε) and corrupted by independent Gaussian noise of unit variance, ηY, in Yε. Here, we tested the ability of our implementation of TE to reject the hypothesis of effective connectivity. This second test case is of particular importance for the application of TE to EEG and MEG measurements where often a single source may be observed on two sensors that have different noise characteristics, i.e. due to differences in contact resistance of the EEG electrodes or the characteristics of the MEG SQUIDs.
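Both measurement models are one-liners in code. The sketch below mirrors Eqs. (15) to (18); the function names `mix_sources` and `observe_common_source` are hypothetical, and the inputs are assumed to be equally long one-dimensional arrays.

```python
import numpy as np

def mix_sources(x, y, eps):
    """Test case (A): symmetric instantaneous mixing of two sources."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_eps = (1 - eps) * x + eps * y  # Eq. (15)
    y_eps = eps * x + (1 - eps) * y  # Eq. (16)
    return x_eps, y_eps

def observe_common_source(s, eps, seed=0):
    """Test case (B): one source seen noise free and as a damped, noisy copy."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s, dtype=float)
    eta_y = rng.standard_normal(s.size)  # independent unit-variance Gaussian noise
    x_eps = s.copy()                     # Eq. (17)
    y_eps = (1 - eps) * s + eps * eta_y  # Eq. (18)
    return x_eps, y_eps
```

At ε = 0.5 the two outputs of `mix_sources` coincide, which is why no significant TE is expected in that limit.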

2.3.4 Choice of embedding parameters for delayed

interactions

To demonstrate the effects of suboptimal embedding

parameters for the case of delayed interactions we

simulated processes with autoregressive order 10

(AR(10)) dynamics, three different interaction delays

(5, 20, 100 samples) and all three coupling types (linear,

threshold, quadratic). The two processes were coupled

unidirectionally X → Y. 15, 30, 60, and 120 trials were

simulated. We tested for effective connectivity in both

possible directions using permutation testing. All cou-

pled processes were investigated with three different

prediction times u of 6, 21, and 101 samples. The

remaining analysis parameters were: d = 7, τ = 1 act,

k = 4, T = 1 act. In addition, we simulated processes

with 1/f dynamics, an interaction delay δ of 100 sam-

ples and a unidirectional, quadratic coupling. 30 trials

were simulated and we tested for effective connectivity

in both directions. These coupled processes were inves-

tigated with all possible combinations of three different

embedding dimensions d = 4, 7, 10, two different

embedding delays τ = 1 act or τ = 1.5 act and three

different prediction times u = 6, 21, 101 samples. The

remaining analysis parameters were: k = 4, T = 1 act.

Results are presented in Tables 1 and 2.

2.4 MEG experiment

Rationale In order to demonstrate the applicability of

TE to neuroscience data obtained non-invasively we

performed MEG recordings in a motor task. Our aim

was to show that TE indeed gave the results that were

expected based on prior, neuroanatomical knowledge.

To verify the correctness of results in experimental

data is difficult because no knowledge about the ulti-

mate ground truth exists when data are not simulated.

Therefore, we chose an extremely simple experiment—

self-paced finger lifting of the index fingers in a self-

chosen sequence—where very clear hypotheses about

the expected connectivity from the motor cortices to

the finger muscles exist.

Subjects and experimental task Two subjects (S1, m,

RH, 38 yrs; S2, f, RH, 23 yrs) participated in the

experiment. Subjects gave written informed consent

prior to the recording. Subjects had to lift the right and

left index finger in a self-chosen randomly alternating

sequence with approximately 2 s pause between successive finger liftings. Finger movements were detected using a photosensor. In addition, an electromyogram (EMG) response was recorded from the extensor muscles of the right and left index fingers.

Recording and preprocessing MEG data were recorded using a 275 channel whole head system (OMEGA2005, VSM MedTech Ltd., Coquitlam, BC, Canada) in a synthetic 3rd order gradiometer configuration. Additional electrocardiographic, -oculographic and -myographic recordings were made to measure the electrocardiogram (ECG), horizontal and vertical electrooculography (EOG) traces, and the electromyogram (EMG) for the extensor muscles of the right and left index fingers. Data were hardware filtered between 0.5 and 300 Hz and digitized at a sampling rate of 1.2 kHz. Data were recorded in two continuous sessions lasting 600 s each. For the analysis of effective connectivity between scalp sensors and the EMG, data were preprocessed using the Fieldtrip open-source toolbox for MATLAB (http://fieldtrip.fcdonders.nl/; version 2008-12-10). Data were digitally filtered between 5 and 200 Hz and then cut in trials from 1,000 ms before to 90 ms after the photosensor indicated a lift of the left or right index finger. This latency range ensured that enough EMG activity was included in the analysis. We used the artifact rejection routines implemented in Fieldtrip to discard trials contaminated with eye-blinks, muscular activity and sensor jumps.

Table 1 Detection of true and false effective connectivity for a fixed embedding dimension d of 7, and an embedding delay τ of 1 autocorrelation time

Dynamics   δ     Coupling    u     X → Y (True)   Y → X (False)
AR(10)     5     Lin         6     1              1
AR(10)     5     Lin         21    1              0
AR(10)     5     Lin         101   0              0
AR(10)     5     Threshold   6     1              1
AR(10)     5     Threshold   21    1              0
AR(10)     5     Threshold   101   0              0
AR(10)     5     Quadratic   6     1              1
AR(10)     5     Quadratic   21    1              0
AR(10)     5     Quadratic   101   0              0
AR(10)     20    Lin         6     1              1
AR(10)     20    Lin         21    1              0
AR(10)     20    Lin         101   1              0
AR(10)     20    Threshold   6     0              0
AR(10)     20    Threshold   21    1              0
AR(10)     20    Threshold   101   0              0
AR(10)     20    Quadratic   6     0              0
AR(10)     20    Quadratic   21    1              0
AR(10)     20    Quadratic   101   0              0
AR(10)     100   Lin         6     1              0
AR(10)     100   Lin         21    1              0
AR(10)     100   Lin         101   1              0
AR(10)     100   Threshold   6     0              0
AR(10)     100   Threshold   21    0              0
AR(10)     100   Threshold   101   1              0
AR(10)     100   Quadratic   6     1              0
AR(10)     100   Quadratic   21    1              0
AR(10)     100   Quadratic   101   1              0

Given is the detected effective connectivity in dependence of the parameter prediction time u for data with different interaction delays δ of 5, 20, and 100 samples. Data were simulated with autoregressive order ten dynamics and unidirectional coupling X → Y via three different coupling functions (linear, threshold, quadratic). Simulation results based on 120 trials. Note: false positives emerge for short interaction delays δ, i.e. the inclusion of more recent samples of X, i.e. samples that are just before the earliest embedding time-point; false positives in these cases are suppressed using a larger prediction time, i.e. moving the embedding of X and the samples of X that are transferred to Y further into the past; short interaction delays can robustly be detected with prediction times that are longer than the interaction delay, if the difference is not excessive

Table 2 Detection of true and false effective connectivity in dependence of the parameters embedding delay τ, embedding dimension d, and prediction time u for data with unidirectional coupling X → Y via a quadratic function, 1/f dynamics and an interaction delay δ of 100 samples

Dynamics   δ     d    u     τ [ACT]   X → Y   Y → X
1/f        100   4    21    1         0       0
1/f        100   4    101   1         1       1
1/f        100   7    21    1         0       1
1/f        100   7    101   1         1       0
1/f        100   10   21    1         0       0
1/f        100   10   101   1         1       0
1/f        100   4    21    1.5       0       0
1/f        100   4    101   1.5       1       0
1/f        100   7    21    1.5       0       0
1/f        100   7    101   1.5       1       0
1/f        100   10   21    1.5       0       0
1/f        100   10   101   1.5       1       0

Simulation results based on 30 trials. Note how a larger τ eliminates false positive TE results for effective connectivity. Also note how the delocalization in time provided by the embedding enables the detection of effective connectivities also for interaction delays larger than the prediction time

Analysis of effective connectivity at the MEG sensor

level using transfer entropy Effective connectivity was

analyzed using the algorithm to compute transfer en-

tropy as described above. The algorithm was imple-

mented as a toolbox (Lindner et al. 2009) for Fieldtrip

data structures (http://fieldtrip.fcdonders.nl/) in MATLAB. The nearest neighbour search routines were implemented using OpenTSTool (Version 1.2 on Linux 64 bit; Merkwirth et al. 2009). Parameters for the analysis

were chosen based on a scanning of the parameter

space, to obtain maximum sensitivity. In more detail

we computed the difference between the transfer en-

tropy for the MEG data and the surrogate data for all

combinations of parameters chosen from: τ = 1 act, u ∈

[10,16,22,30,150], d ∈ [4,5,7], k ∈ [4,5,6,7,8,9,10].

We performed the statistical test for a significant deviation from independence for each of these parameter sets. This way a multiple testing problem arose, in addition to the multiple testing based on the multiple directed interactions between the chosen sensors (see next paragraph). We therefore performed a correction for multiple comparisons using the false discovery rate (FDR, q < 0.05, Genovese et al. 2002). The parameter values with optimum sensitivity, i.e. most significant results across sensor pairs after correction for multiple comparison were: embedding dimensions d = 7, embedding delay τ = 1 act, forward prediction time u = 16 ms, number of neighbors considered for density estimations k = 4, time window for exclusion of temporally correlated neighbors T = 1 act. In addition we required that prediction should be possible for

at least 150 samples, i.e. individual trials where the

combination of a long autocorrelation time and the

embedding dimension of 7 did not leave enough data

for prediction were discarded. We required that at least

30 trials should survive this exclusion step for a dataset

to be analyzed.
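The FDR correction can be illustrated with a short sketch, assuming the standard Benjamini-Hochberg step-up procedure on a flat list of p-values (one per tested parameter set and sensor pair). The function name `fdr_bh` is hypothetical and the code is not taken from the toolbox.

```python
import numpy as np

def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean array that is True where the null hypothesis is
    rejected while controlling the false discovery rate at level q.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # q * rank / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank that meets its threshold
        reject[order[: k + 1]] = True     # reject it and all smaller p-values
    return reject
```

The step-up rule rejects every hypothesis up to the largest rank that meets its threshold, even if some intermediate p-values exceed their own thresholds.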

Even a simple task like self-paced lifting of the left or

right index finger potentially involves a very complex

network of brain areas related to volition, self-paced

timing, and motor execution. Not all of the involved

causal interactions are clearly understood to date. We

therefore focused on a set of interactions where clear-cut hypotheses about the direction of causal interactions and the differences between the two conditions existed: We examined TE from the three bilateral sensor pairs displaying the largest amplitudes in the magnetically evoked fields (MEFs) (compare Fig. 7) before onset of the two movements (left or right finger lift) to both EMG channels. This also helped to reduce computation time, as an all-to-all analysis of effective connectivity at the MEG and EMG sensor level would involve the analysis of 277 × 276 directed connections. We then tested connectivities in both conditions against each other by comparing the distributions of TE values in the two conditions using a permutation test. For this latter comparison a clear lateralization effect was expected, as task related causal interactions common to both conditions should cancel. Activity in at least three different

frequency bands has been found in the motor cortex

and it has been proposed that each of these different

frequency bands subserves a different function:

– A slow rhythm (6–10 Hz) has been postulated to provide a common timing for agonist/antagonist muscle pairs in slow movements and is thought to arise from synchronization in a cerebello-thalamo-cortical loop (Gross et al. 2002). The coupling of cortical (primary motor cortex M1, primary somatosensory cortex S1) activity to muscular activity was proposed to be bidirectional (Gross et al. 2002) in this frequency range. The coupling may also depend on oscillations in spinal stretch reflex loops (Erimaki and Christakos 2008).

– Activity in the beta range (∼20 Hz) has been suggested to subserve the maintenance of current limb position (Pogosyan et al. 2009) and strong cortico-muscular coherence in this band has been found in isometric contraction accordingly (Schoffelen et al. 2008). Coherent activity in the beta band has also been demonstrated between bilateral motor cortices (Mima et al. 2000; Murthy and Fetz 1996).

– In contrast, motor-act related activity in the gamma band (>30 Hz) is reported less frequently and its relation to motor control is less clearly understood to date (Donoghue et al. 1998). We therefore focused our analysis on a frequency interval from 5–29 Hz.

Note that we omitted the frequently proposed pre-

processing of the EMG traces by rectification (Myers

et al. 2003), as TE should be able to detect effective

connectivity without this additional step.

3 Results

3.1 Overview

In this section we first present the analysis of effective

connectivity in pairs of simulated signals {X,Y}. All

signal pairs were unidirectionally coupled from X to Y.

We used three coupling functions: linear, threshold and

a purely non-linear quadratic coupling. We simulated

two different signal dynamics, AR(10) processes and

processes with 1/f spectra, that were close to spectra

observed in biological signals. The two signals of a

pair always had similar characteristics. We always ana-

lyzed both directions of potential effective connectivity:

X → Y and Y → X to quantify both, sensitivity and

specificity of our method.

In addition to this basic simulation we investigated

the following special cases: coupling via multiple cou-

pling delays for linear and threshold interactions, lin-

early mixed observation of two coupled signals for lin-

ear and threshold coupling, and observation of a single

signal via two sensors with different noise levels. In this

last case no effective connectivity should be detected.

The absence of false positives in this latter case is of

particular importance for EEG and MEG sensor-level

analysis.

As a proof of principle we then applied the analy-

sis of effective connectivity via TE to MEG signals

recorded in a self-paced finger lifting task. Here the aim

was to recover the known connectivity from contralat-

eral motor cortices to the muscles of the moved limb,

via a comparison of effective connectivity for left and

right finger lifting.


3.2 Simulation study

Detection of non-linear interactions for various signal

dynamics Transfer entropy in combination with per-

mutation testing correctly detected effective connectiv-

ity (X → Y) for both, autoregressive order 10 and 1/f

signal dynamics and all three simulated coupling types

(linear, threshold, quadratic) if at least 30 trials were

used to compute statistics (Fig. 2). No false positives,

i.e. significant results for the direction Y → X, were

observed. We note that the cross-correlation functions between the signals X and Y were flat when coupled

non-linearly, which indicates that linear approaches

may be insufficient to detect a significant interaction in

those cases.
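Why a purely quadratic coupling leaves the cross-correlation flat can be checked numerically. The sketch below is a toy example, not the simulation used in the paper: the source is Gaussian white noise instead of an AR(10) or 1/f process, the target has no internal dynamics, and the noise level of 0.5 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 100_000, 20

x = rng.standard_normal(n)  # source with a symmetric amplitude distribution
y = np.zeros(n)
# Target driven only by the squared, delayed source plus observation noise.
y[delta:] = x[:-delta] ** 2 + 0.5 * rng.standard_normal(n - delta)

def lagged_corr(a, b, lag):
    """Pearson correlation between a(t - lag) and b(t)."""
    return np.corrcoef(a[:-lag], b[lag:])[0, 1]

linear_corr = lagged_corr(x, y, delta)        # close to zero
squared_corr = lagged_corr(x ** 2, y, delta)  # clearly non-zero
```

For a symmetric source the covariance between x and x² vanishes, so a linear measure evaluated at the true delay sees almost nothing, although y is strongly determined by x.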

Fig. 2 Detection of effective connectivity by TE for two unidirectionally coupled signals (X → Y). (a–c) Signals generated from an autoregressive order ten process and coupled via (a) linear, (b) threshold, and (c) quadratic coupling. (d–f) Signals generated with dynamics of a 1/f noise process and coupled via (d) linear, (e) threshold, and (f) quadratic coupling. A single interaction delay of 20 samples was used. Time courses of source (X) and target (Y) signals on the left and results of permutation testing for a varying number of trials (15, 30, 60, 120) on the right. Black bars indicate (1-p) values for coupling X → Y (true coupling direction), gray bars indicate values of (1-p) for coupling Y → X. The dashed line corresponds to significant effective connectivity (p < 0.05)
[Figure panels: time courses in z (a.u.) over samples 0–300; bars of 1 − p versus number of trials with the α = .05 level marked. TE parameters: (a–c) d = 7, τ = act(Y), u = 21; (d–f) d = 7, τ = 1.5·act(Y), u = 21.]

Detection of interactions with multiple interaction delays The statistical evaluation of TE values robustly detected the correct direction of effective connectivity (X→Y) for the two unidirectionally coupled AR(10) time series (X,Y), coupled via a range of delays δ from

17–23 samples, and for the two unidirectionally coupled

1/f time series, coupled via a range of delays δ from 97-

103 samples. The correct coupling direction (X → Y)

was found for all three investigated coupling functions

(linear, threshold, quadratic), even if only 15 trials were

investigated (Fig. 3). For these analyses we used a prediction time u of 21 samples for the case of a delay δ of

17–23 samples, and a prediction time u of 101 samples

for the delay δ of 97–103 samples. Correct detection

of effective connectivity was also possible when using

a prediction time u of 21 samples for the delay δ of

97–103 samples, i.e. a prediction time that was shorter

than the interaction delay (data not shown). This was

expected because of the delocalization in time provided

for by the delay embedding. However, no effective

connectivity was detected when using a prediction time

u of 101 samples for an interaction delay δ of 17–23 samples, i.e. when using a prediction time that was considerably longer than the interaction delay (data not shown; compare Table 1 for single interaction delays). No false positive effective connectivities (Y→X) were found. However, relatively high values for (1-p) for some cases indicate that the embedding parameters were not optimally chosen, as discussed below.

Fig. 3 Detection of effective connectivity by TE for two unidirectionally coupled time series (X → Y) with a range of coupling delays as indicated by the shaded boxes in (a) and (d). (a–c) autoregressive order ten processes; interaction delays 17–23 samples. (a) Linear interaction, (b) threshold coupling, and (c) quadratic coupling. (d–f) 1/f processes; interaction delays 97–103 samples. (d) Linear interaction, (e) threshold coupling, and (f) quadratic coupling. Time series are plotted on the left, results of permutation testing for different numbers of simulated trials (15, 30, 60, 120) on the right. Black bars indicate values of (1-p) for coupling X → Y (true coupling direction), gray bars indicate values of (1-p) for coupling Y → X. The dashed line corresponds to significant effective connectivity (p < 0.05)
[Figure panels: time courses in z (a.u.) over samples 0–300; bars of 1 − p versus number of trials with the α = .05 level marked. TE parameters: (a–c) d = 7, τ = act(Y), u = 21; (d–f) d = 7, τ = act(Y), u = 101.]

Detection of effective connectivity from linearly mixed

measurement signals In order to investigate the appli-

cation of TE to EEG and MEG sensor signals, where

the signals from the processes in question can only be

observed after linear mixing processes, we simulated

two unidirectionally coupled AR(10) signals (X → Y

with linear or threshold coupling). These signals then

underwent a symmetric linear mixing process in dependence of a parameter ε in the range from 0.1 to 0.4, where a value of ε = 0.5 would indicate identical mixed signals (see Eqs. (15), (16)). For the case of linearly coupled source signals TE indicated effective connectivity in direction from the sensor signal Xε that had a higher contribution from the driving process (X) to the sensor Yε dominated by the receiving process (Y) for all investigated cases of linearly mixed measurement signals except for the case of ε = 0.4. In this case TE detected the correct direction of the interaction and did not result in false positive detection; however, the time-shift test indicated the presence of instantaneous mixing and the result could not be counted as a correct detection of effective connectivity. For the case of source signals that were coupled via a threshold function TE in combination with the time-shift test correctly identified effective connectivity and did not result in false positive detection for all of the investigated linear

mixing strengths. These observations held even if only

15 trials were evaluated (Figs. 4 and 5).

Fig. 4 Simulation results for linearly mixed measurements (Xε, Yε) of two unidirectionally and linearly coupled underlying source signals (X → Y). (a) Mixing model and original autoregressive source time courses X, Y. (b–d) Effective connectivity between sensor-level signals Xε, Yε. Left: statistics of permutation tests of TE values for the original sensor level data against trial-shuffled surrogate data after application of the additional time-shift test. The plots contain values of (1-p) in dependence of the number of investigated trials. Black bars indicate values for the effective connectivity from the sensor dominated by the driving source signal (Xε) to the sensor dominated by the receiving source signal (Yε). Light grey bars indicate the reverse direction of effective connectivity. The dashed line corresponds to significant effective connectivity (p < 0.05). Right: time courses of signals Xε and Yε for a single trial
[Figure panels: (a) mixing model Xε(t) = (1−ε)X(t) + εY(t), Yε(t) = (1−ε)Y(t) + εX(t), source delay = 20 samples; (b) ε = 0.1, (c) ε = 0.25, (d) ε = 0.4; time courses in z (a.u.) over samples 0–300; bars of 1 − p versus number of trials (15, 30, 60, 120) with the α = .05 level marked. TE parameters: d = 7, τ = 1.5·act(Y), u = 21.]