Large-Scale EEG/MEG Source Localization with Spatial Flexibility
Stefan Haufe a,b,∗, Ryota Tomioka c, Thorsten Dickhaus a, Claudia Sannelli a, Benjamin Blankertz a,d,b,
Guido Nolte d and Klaus-Robert Müller a,b
aMachine Learning Group, Department of Computer Science, Berlin Institute of Technology, Franklinstr. 28/29, D-10587 Berlin, Germany
bBernstein Focus Neurotechnology, Berlin, Germany
cInformation-Theoretic Machine Learning and Data Mining Group, Department of Mathematical Informatics, Graduate School of
Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan.
dIntelligent Data Analysis Group, Fraunhofer Institute FIRST, Kekuléstr. 7, D-12489 Berlin, Germany
Abstract
We propose a novel approach to solving the electro- / magnetoencephalographic (EEG / MEG) inverse problem which is based
upon a decomposition of the current density into a small number of spatial basis fields. It is designed to recover multiple sources of
possibly different extent and depth, while being invariant with respect to phase angles and rotations of the coordinate system. We
demonstrate the method’s ability to reconstruct simulated sources of random shape and show that the accuracy of the recovered
sources can be increased, when interrelated field patterns are co-localized. Technically, this leads to large-scale mathematical
problems, which are solved using recent advances in convex optimization. We apply our method for localizing brain areas involved in
different types of motor imagery using real data from Brain-Computer Interface (BCI) sessions. Our approach based on single-trial
localization of complex Fourier coefficients yields class-specific focal sources in the sensorimotor cortices.
Key words: EEG, MEG, inverse problem, basis field, large-scale optimization, motor imagery, brain-computer interfaces
1. Introduction
Measuring electrical field distributions makes it possible to localize
cognitive processing and is thus of high value for neuro-
science research and medical diagnosis. While invasive mea-
surements provide a very local assessment of neuronal acti-
vations, such a procedure is only possible in humans where
electrodes are already implanted for treatment/diagnosis
of neurological diseases, e.g., epilepsy. Noninvasive localiza-
tion techniques based on electro- and magnetoencephalog-
raphy (EEG and MEG) are applicable without restriction
and are therefore highly useful. They have become standard
tools for analyzing fast brain signals such as somatosensory-
evoked potentials (SEPs) or ongoing oscillations. For un-
derstanding the respective cognitive processes spatial pat-
terns (scalp maps) derived from EEG/MEG, however, only
give a rough estimate of the true underlying sources. Thus,
for revealing a more detailed picture a full source recon-
struction is required which involves a mathematical inver-
sion of the (approximately) known mapping from sources
∗ Corresponding author.
Email address: stefan.haufe@tu-berlin.de (Stefan Haufe).
to sensors. Unfortunately, this is an ill-defined inverse prob-
lem since any measurement can be equally well explained
by infinitely many different source distributions.
Therefore, in order to “solve” the inverse problem it is
necessary to impose additional constraints on the solution.
Dipole fits (e.g. Scherg and von Cramon, 1986) and scan-
ning techniques (Schmidt, 1986; Mosher and Leahy, 1999;
Veen and Buckley, 1988; Van Veen et al., 1997) correspond
to directly constraining the number of dipolar sources.
Imaging methods, in contrast, model a large number of
dipoles and thus allow us to estimate activity in the entire
brain at once. Constraints are here imposed by a dedicated
penalty functional reflecting assumptions on the sources.
Perhaps the two most common assumptions are smoothness
(Hämäläinen and Ilmoniemi, 1994; Pascual-Marqui
et al., 1994; Pascual-Marqui, 2002) and focality (Matsuura
and Okabe, 1995; Gorodnitsky et al., 1995; Uutela et al.,
1999; Huang et al., 2006; Ou et al., 2008; Ding and He,
2008; Bolstad et al., 2009), both of which can be motivated
by neurophysiological arguments. Nevertheless both ap-
proaches may deliver implausible results in practice since
“smooth methods” tend to estimate sources that spread
over a considerable part of the brain which is not always
Preprint submitted to NeuroImage 6 October 2010
physiologically meaningful. Estimates obtained by “sparse
methods” tend to be unstable and scattered around the
true sources. Two recent studies suggest that estimates
with more plausible extent and shape can be obtained by
encouraging both smoothness and focality of the sources
(Haufe et al., 2008; Vega-Hernández et al., 2008) using
two penalties. Such a hybrid approach has been shown
to outperform both purely smooth and purely focal methods
when distinguishing two or three simulated as well as real
sources (Haufe et al., 2008).
In this paper we propose a novel method for EEG/MEG
source reconstruction that achieves a compromise between
smoothness and focality, making it possible to model extended
sources. This is achieved by expanding the current density
into a sparse combination of spatial basis fields. Compared
with our previous approach Focal Vectorfield Reconstruc-
tion (FVR, Haufe et al., 2008), the presented method
achieves similarly good localization, while allowing a sim-
pler mathematical formulation. The novel cost function
enables the deployment of a very efficient optimization
scheme by which it becomes possible to solve reconstruc-
tion problems involving orders of magnitude more vari-
ables than previously. These additional variables can be
used to localize larger datasets, or to increase the spatial
resolution.
We will derive the proposed methodology in Section 2.
The same section describes the application of our method to two
datasets. The first one is simulated and used for assessing
the source reconstruction quality of our method in com-
parison to FVR and other approaches. The second dataset
consists of EEG responses acquired during successful brain-computer
interface (BCI) sessions, where the task was to
modulate local µ-rhythms by means of motor imagery of
different limbs. These data are an ideal testbed for source
reconstruction algorithms, since there exists a strong prior
knowledge about the neurophysiological basis underlying
a good BCI performance. Results on the physical origin
of class-related EEG activity during BCI sessions are pre-
sented in Section 3. Section 4 contains a general discussion
of strengths and potential drawbacks of current distributed
inverse methods and their inherent assumptions, as well as
practical issues regarding regularization. The paper finishes
with concluding remarks in Section 5.
2. Methods and Materials
2.1. Localization using Sparse Basis Field Expansions
In EEG/MEG source reconstruction we are equipped
with measurements of the scalp electrical potential (EEG)
or magnetic field (MEG), from which we would like to in-
fer the generating electrical current density (sources) in the
brain. The EEG/MEG activity is collected in a vector
$z = (z_1, \ldots, z_M)^\top \in \mathbb{C}^M$, where $M$ is the number of sensors.
As the data $z$ could possibly contain responses to Fourier
or wavelet filters, it is allowed to take complex values. Let
$B \subset \mathbb{R}^3$ be the volume covered by the brain (i.e. white and
gray matter). The current density is a vector field $y : B \to \mathbb{C}^3$
assigning a (complex) vectorial current source to each
location in the brain. Considering a discrete sample of locations
(voxels) and source currents $(x_n, y(x_n) =: y_n)$, $n = 1, \ldots, N$,
we denote by $Y = (y_1^\top, \ldots, y_N^\top)^\top$ the $N \times 3$ matrix
of sources and by $\mathrm{vec}(Y)$ a column vector containing
the stacked transposed rows of $Y$. The forward mapping
from the sources $Y$ to the measurements $z$ is linear and can
be written as
$$ z = F \, \mathrm{vec}(Y) \tag{1} $$
using the lead field matrix $F \in \mathbb{R}^{M \times 3N}$, which can be computed
for a known geometry of the head and known conductive
properties of brain, skull and skin tissues (Baillet
et al., 2001).
2.1.1. Model
Instead of estimating the currents yn directly, we propose
to model the current density as a linear combination of
(potentially many) spatial basis fields, the coefficients of
which are to be estimated. A basis field is defined here as a
vector field, in which all output vectors point in the same
direction, while the magnitudes are proportional to a scalar
(basis) function $b : B \to \mathbb{R}$. Given a set of functions $b_l$, $l = 1, \ldots, L$
(called a “dictionary”), the basis field expansion
reads
$$ y(x) = \sum_{l=1}^{L} c_l \, b_l(x) , \tag{2} $$
with coefficient vectors $c_l \in \mathbb{C}^3$, $l = 1, \ldots, L$. By including
one complex coefficient for each dimension, we learn orientations
and amplitudes as well as phases of the complex current
vectors in this model. Let $C = (c_1, \ldots, c_L)^\top \in \mathbb{C}^{L \times 3}$
contain the coefficients and
$$ B = \begin{pmatrix} b_1(x_1) & \cdots & b_L(x_1) \\ \vdots & \ddots & \vdots \\ b_1(x_N) & \cdots & b_L(x_N) \end{pmatrix} \in \mathbb{R}^{N \times L} \tag{3} $$
be the basis functions evaluated at all locations $x_n$. The
forward model then reads
$$ z = F \, \mathrm{vec}(BC) . \tag{4} $$
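The forward model of Eqs. (2) to (4) can be illustrated with a minimal NumPy sketch. All sizes and the random matrices `F`, `B` and `C` are toy stand-ins chosen for illustration (in particular, `F` is not a lead field of a real head model); the helper `vec` is a hypothetical name implementing the "stacked transposed rows" convention used in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, L = 8, 20, 5  # sensors, voxels, basis functions (toy sizes)

F = rng.standard_normal((M, 3 * N))  # stand-in lead field, not a real head model
B = rng.standard_normal((N, L))      # basis functions evaluated at the voxels
C = rng.standard_normal((L, 3)) + 1j * rng.standard_normal((L, 3))  # complex coefficients


def vec(A):
    """Stack the transposed rows of A into one vector (row-major flattening)."""
    return A.reshape(-1)


Y = B @ C        # N x 3 source matrix: Eq. (2) evaluated at all voxels
z = F @ vec(Y)   # Eq. (4)

# The same measurement written with an explicit Kronecker product,
# which is the form that enters the matrix Gamma in the estimation step.
z_kron = F @ np.kron(B, np.eye(3)) @ vec(C)
```

With this row-wise convention, $\mathrm{vec}(BC) = (B \otimes I_{(3 \times 3)}) \, \mathrm{vec}(C)$, so both expressions yield the same vector $z$.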
2.1.2. Sparsity, Rotational Invariance and Phase
Invariance
Solving Eq. (4) for C does not yield a unique solution if
the number of coefficients is larger than the number of elec-
trodes M , which is the common situation. The ambiguity
can be overcome by regularization, i.e., by imposing addi-
tional constraints on the variables. Here, we assume that,
for an appropriately chosen dictionary, the current density
can be well approximated by a small number of basis fields.
This can be achieved by estimating a sparse coefficient ma-
trix C, i.e., a matrix that has mostly zero entries. Besides
the regularizing effect, sparse decompositions also provide
a way of interpreting current densities by looking at the se-
lected basis functions (those having corresponding nonzero
coefficients in C). The premise for such interpretability is
that the basis functions themselves are simple enough, which
should be ensured when designing the dictionary.
An important property of EEG/MEG source reconstruction
algorithms is rotational invariance. That is, the estimated
current density should not change when the coordinate
system is rotated. This holds in general for $\ell_2$-norm
(a.k.a. Tikhonov) regularized methods, which deliver non-sparse
sources/coefficients. However, if sparsity is desired,
additional effort is needed. For example, penalizing the $\ell_1$-norm
(the sum of absolute values of the entries) of $C$ leads
to a sparse expansion, but not to rotational invariance. The
$\ell_1$-norm penalty does not couple the three dimensions of
the current density, making it very probable that different
coefficients are set to zero for each of them. This amounts
to selecting different basis functions in each dimension. As
a result, the tendency of $\ell_1$-norm regularized methods to
favor zero coefficients also creates a bias towards current
orientations that are perpendicular to one or more of the
axes of the coordinate system, a preference that is
physiologically meaningless.
It has recently been pointed out that rotational invariance
of vectorial quantities can be maintained by choosing
a so-called $\ell_{1,2}$-norm penalty, which minimizes the (sparsity-inducing)
$\ell_1$-norm of vector amplitudes (Ding and He,
2008; Haufe et al., 2008; Ou et al., 2008; Bolstad et al.,
2009). The difference between the “standard” $\ell_1$-norm and the
$\ell_{1,2}$-norm is that the former leads to entry-wise sparsity,
while the latter sets whole rows of $C$ jointly to zero. Importantly,
the chosen coordinate system does not influence
whether or not a row is set to zero by the $\ell_{1,2}$-norm, while
it does affect the pruning of entries by the $\ell_1$-norm. For
a geometrical explanation of why $\ell_1$- and $\ell_{1,2}$-norm penalties
lead to sparsity at all, we refer to Tibshirani (1996) and
Yuan and Lin (2006).
The $\ell_{1,2}$-norm regularizer is defined by
$$ R(C) = \|C\|_{1,2} = \sum_{l=1}^{L} \|c_l\|_2 . \tag{5} $$
Technically, $R(C)$ is rotationally invariant due to the use
of the $\ell_2$-norm
$$ \|c\|_2 = \sqrt{\sum_{d=1}^{3} |c_d|^2} \tag{6} $$
in output space, which does not change under rotation. Let
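$Q \in \mathbb{C}^{3 \times 3}$, $Q^\dagger Q = I$ be a unitary matrix, where $Q^\dagger$ is the
adjoint of $Q$. Now
$$ \sum_{l=1}^{L} \|Q c_l\|_2 = \sum_{l=1}^{L} \sqrt{\mathrm{tr}(c_l^\dagger Q^\dagger Q c_l)} = \sum_{l=1}^{L} \|c_l\|_2 . \tag{7} $$
Note that the class of unitary matrices covers both rotations
$Q \in \mathbb{R}^{3 \times 3}$, $Q^\top Q = I$ as well as phase shifts
$Q = \exp(i\phi) I_{(3 \times 3)}$ as special cases.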
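The invariance stated in Eq. (7) can be checked numerically. The following is a small sketch on random toy data; the function names `l12_norm` and `l1_norm` are hypothetical helpers introduced here, and the unitary matrix is drawn from a QR decomposition purely for illustration.

```python
import numpy as np


def l12_norm(C):
    """Sum of the Euclidean norms of the rows of C, as in Eq. (5)."""
    return np.sum(np.linalg.norm(C, axis=1))


def l1_norm(C):
    """Sum of the absolute values of all entries of C."""
    return np.sum(np.abs(C))


rng = np.random.default_rng(1)
C = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))

# Random unitary matrix from the QR decomposition of a complex Gaussian matrix
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))

C_rot = C @ Q.T                  # row l of C_rot is (Q c_l)^T
C_phase = np.exp(1j * 0.7) * C   # global phase shift, Q = exp(i*phi) I

print("l1,2:", l12_norm(C), l12_norm(C_rot), l12_norm(C_phase))
print("l1:  ", l1_norm(C), l1_norm(C_rot))
```

In this toy setting the $\ell_{1,2}$-norm agrees for all three coefficient matrices, whereas the entry-wise $\ell_1$-norm generally changes under the unitary transformation.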
2.1.3. Dictionary
The idea of enforcing smoothness and focality in the in-
verse solution is to avoid the scattering of activity found
for many purely focal approaches, while at the same time
to maintain their high spatial resolution and the associated
ability to distinguish multiple sources. In other words, we
are looking for source estimates with spatially constricted
but smooth active regions. In Haufe et al. (2008) a com-
bination of two penalties was used to achieve that effect.
Here, it is addressed by designing an appropriate basis func-
tion dictionary. We consider an expansion into Gaussians.
These are smooth, but also well localized due to exponen-
tially decaying tails. Thanks to the latter, sparse combina-
tions of Gaussian bases give rise to good spatial separation
of sources. Using a redundant dictionary containing Gaus-
sians of different scales, we further expect that sources with
arbitrary shape can be reconstructed with few basis ele-
ments.
Formally, we consider spherical Gaussians
$$ b_{n,s}(x) = \left( \sqrt{2\pi} \, \sigma_s \right)^{-3} \exp\left( -\frac{1}{2} \|x - x_n\|_2^2 \, \sigma_s^{-2} \right) \tag{8} $$
being centered at nodes $x_n$, $n = 1, \ldots, N$ and having $S$
different spatial standard deviations $\sigma_s$, $s = 1, \ldots, S$ (see
Fig. 1 for examples).
Fig. 1. Examples of Gaussian basis functions $b_{n,s}(x)$ with spatial
standard deviations $\sigma_1 = 0.5$ cm, $\sigma_3 = 1$ cm and $\sigma_5 = 1.5$ cm.
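A dictionary of spherical Gaussians as in Eq. (8) might be evaluated as sketched below. The regular toy grid, its 1 cm spacing and the three widths are assumptions made for illustration only; `gaussian_basis` is a hypothetical helper name.

```python
import numpy as np


def gaussian_basis(nodes, sigma):
    """Evaluate Eq. (8) for all node pairs: entry (i, n) is b_{n,s}(x_i)."""
    diff = nodes[:, None, :] - nodes[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return (np.sqrt(2 * np.pi) * sigma) ** -3 * np.exp(-0.5 * sq_dist / sigma ** 2)


# Hypothetical toy grid of 4 x 4 x 4 nodes with 1 cm spacing (coordinates in cm)
axis = np.arange(0.0, 4.0, 1.0)
nodes = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

sigmas = [0.5, 1.0, 1.5]  # standard deviations in cm
B_list = [gaussian_basis(nodes, s) for s in sigmas]
```

Each matrix in `B_list` is $N \times N$ and symmetric, since every node serves both as an evaluation point and as a Gaussian center.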
2.1.4. Normalization
The proposed $\ell_{1,2}$-norm based regularization aims at selecting
the smallest possible number of basis fields necessary
to explain the measurement. This approach, however,
is heuristic, since not the number of nonzero coefficient
vectors, but their magnitudes enter the cost function. It is
therefore important to normalize the basis functions in order
not to prefer some of them a priori. Let $B_s$ be the $N \times N$
matrix containing all basis function evaluations with standard
deviation $\sigma_s$. The large matrix
$$ B = \left( \frac{B_1}{\|\mathrm{vec}(B_1)\|_1}, \ldots, \frac{B_S}{\|\mathrm{vec}(B_S)\|_1} \right) \in \mathbb{R}^{N \times SN} \tag{9} $$
is constructed using normalized $B_s$. By this means, no
length scale is preferred a priori.
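The normalization of Eq. (9) amounts to dividing each per-scale matrix by the sum of the absolute values of its entries before concatenation. A minimal sketch, assuming random nonnegative toy matrices in place of the actual Gaussian evaluations (`normalized_dictionary` is a hypothetical helper name):

```python
import numpy as np


def normalized_dictionary(B_list):
    """Concatenate per-scale matrices, each divided by its entry-wise l1-norm (Eq. 9)."""
    return np.hstack([B_s / np.sum(np.abs(B_s)) for B_s in B_list])


rng = np.random.default_rng(2)
N = 10
# Toy stand-ins for B_1, ..., B_S with deliberately different overall magnitudes
B_list = [rng.random((N, N)) * scale for scale in (1.0, 10.0, 100.0)]
B = normalized_dictionary(B_list)  # shape N x SN
```

After this step every block of `B` has unit entry-wise $\ell_1$-norm regardless of its original magnitude.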
An estimation bias is also introduced by the location of
the sources. Due to volume conduction, the signal captured
by the sensors is much stronger for superficial sources compared
to deep sources. In Pascual-Marqui (2002) the variance
estimate $\hat{S} = \bar{F}^\top (\bar{F} \bar{F}^\top)^{-1} \bar{F} \in \mathbb{R}^{3N \times 3N}$ is derived
for the (least-squares) estimated sources, where $\bar{F} = HF$
and $H = I_{(M \times M)} - 1_{(M)} 1_{(M)}^\top / M$ is the common-average
reference transform. We found that $\hat{S}$ can be used for alleviating
the location bias (Haufe et al., 2008). This can be
done by penalizing activity at locations with high variance.
Letting $W_n \in \mathbb{R}^{3 \times 3}$ denote the inverse of the matrix square
root of the $n$-th $3 \times 3$ block on the diagonal of $\hat{S}$, we define
the depth-compensation matrix
$$ W = \begin{pmatrix} W_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & W_N \end{pmatrix} \in \mathbb{R}^{3N \times 3N} . \tag{10} $$
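One possible way to assemble the depth-compensation matrix of Eq. (10) is sketched below. Two points are assumptions of this sketch rather than statements from the text: a pseudo-inverse is used in place of the inverse, because the average-referenced matrix $\bar{F} \bar{F}^\top$ is rank deficient, and the inverse matrix square root of each block is computed via a symmetric eigendecomposition. The lead field is a random toy matrix and `depth_compensation` is a hypothetical helper name.

```python
import numpy as np


def depth_compensation(F):
    """Return the block-diagonal matrix W of Eq. (10) and the variance estimate S_hat."""
    M, three_n = F.shape
    N = three_n // 3
    H = np.eye(M) - np.ones((M, M)) / M  # common-average reference transform
    F_bar = H @ F
    # Assumption: F_bar F_bar^T is rank deficient after re-referencing,
    # so a pseudo-inverse replaces the inverse here.
    S_hat = F_bar.T @ np.linalg.pinv(F_bar @ F_bar.T) @ F_bar
    W = np.zeros((3 * N, 3 * N))
    for n in range(N):
        idx = slice(3 * n, 3 * n + 3)
        evals, evecs = np.linalg.eigh(S_hat[idx, idx])
        # Inverse matrix square root of the n-th 3 x 3 diagonal block
        W[idx, idx] = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return W, S_hat


rng = np.random.default_rng(3)
F = rng.standard_normal((8, 3 * 20))  # toy lead field: 8 sensors, 20 voxels
W, S_hat = depth_compensation(F)
```

By construction, $W_n \hat{S}_n W_n$ equals the identity for each diagonal block $\hat{S}_n$, which is what equalizes the estimated variance across locations.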
2.1.5. Estimation
Using the definitions from above, we seek the coefficients
that provide a defined compromise between sparsity
and model error, i.e.
$$ \hat{C} = \arg\min_{C} \; R(C) + \lambda L(C) \tag{11} $$
where $L(C) = \|z - \Gamma \mathrm{vec}(C)\|_2^2$ is the quadratic loss function,
$\Gamma \equiv F W (B \otimes I_{(3 \times 3)}) \in \mathbb{R}^{M \times 3SN}$ and $\lambda$ is a positive
constant controlling the tradeoff between loss function and
regularization. Minimizing the weighted sum of two objectives
is a common way to achieve a compromise between the
two (cf. Zou and Hastie, 2005; Haufe et al., 2008; Vega-Hernández
et al., 2008).
Given the coefficients, the estimated current density at
node $x_n$ is defined by
$$ \hat{y}_n = W_n \sum_{l=1}^{SN} \hat{c}_l \, b_l(x_n) . \tag{12} $$
This solution has been termed sparse basis field expansion
(S-FLEX) solution in a precursory conference paper (Haufe
et al., 2009).
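For small problems, the objective of Eq. (11) can be minimized with a plain proximal gradient iteration (ISTA) using row-wise soft thresholding. This is a generic textbook scheme swapped in for illustration; it is not the DAL algorithm used in this paper and does not scale in the same way. The matrix `Gamma`, the data `z`, the value of `lam` and all function names are toy assumptions.

```python
import numpy as np


def block_soft_threshold(C, tau):
    """Shrink every row of C by tau in Euclidean norm; rows shorter than tau become zero."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    scale = np.where(norms > tau, 1.0 - tau / np.maximum(norms, tau), 0.0)
    return scale * C


def objective(Gamma, z, lam, C):
    """R(C) + lam * L(C) as in Eq. (11)."""
    residual = z - Gamma @ C.reshape(-1)
    return np.sum(np.linalg.norm(C, axis=1)) + lam * np.linalg.norm(residual) ** 2


def sflex_ista(Gamma, z, lam, n_iter=500):
    """Proximal gradient sketch for Eq. (11); assumes lam > 0."""
    L = Gamma.shape[1] // 3
    C = np.zeros((L, 3), dtype=complex)
    # Step size 1 / Lipschitz constant of the gradient of lam * ||z - Gamma c||^2
    step = 1.0 / (2.0 * lam * np.linalg.norm(Gamma, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * lam * (Gamma.conj().T @ (Gamma @ C.reshape(-1) - z))
        C = block_soft_threshold(C - step * grad.reshape(L, 3), step)
    return C


rng = np.random.default_rng(4)
M, L = 10, 30
Gamma = rng.standard_normal((M, 3 * L))  # toy stand-in for F W (B kron I)
C_true = np.zeros((L, 3), dtype=complex)
C_true[[3, 17]] = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
z = Gamma @ C_true.reshape(-1)
lam = 10.0
C_hat = sflex_ista(Gamma, z, lam)
```

The row-wise shrinkage is what sets entire coefficient vectors $c_l$ to zero at once, mirroring the role of the $\ell_{1,2}$-norm penalty in Eq. (11).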
2.1.6. Comparison to Focal Vectorfield Reconstruction
Note that Eq. (11) has a structural similarity to our pre-
vious approach FVR (Haufe et al., 2008). The FVR solution
is obtained by setting $B = I_{(N \times N)}$ (i.e., the coefficients $c_i$
are equal to the sources $s_i$) and adding the additional regularizer
$\alpha \sum_{n=1}^{N} \|t_n\|_2$, where $T = (t_1, \ldots, t_N)^\top = D W^{-1} C$
and $D$ is a discrete spatial second derivative (Laplacian)
operator. The additional term in FVR effectively enforces
spatial smoothness or continuity of the current density by
rewarding sparse second derivatives. Hence, FVR and our
current approach achieve a very similar effect using con-
trary strategies (namely, sparsity before and after linear
transformation). However, it is not possible to transform
one problem into the form of the other, since B and D are
generally not invertible. As we will see later (Section 2.1.8),
this prevents the optimization algorithm proposed here
from being applied to FVR, making generalization of FVR to
large-scale scenarios harder.
2.1.7. Extension to Multiple Measurements
While Eq. (12) considers only single field patterns, we
would now like to extend S-FLEX to the localization of
multiple measurements. The goal is to estimate $T$ current
densities $y_n(t)$ based on $T$ patterns $z(t)$. Let
$Z = (z(1), \ldots, z(T)) \in \mathbb{C}^{M \times T}$ and $c_l(t) \in \mathbb{C}^3$ be the coefficient
vector describing the contribution of the $l$-th basis field to
the $t$-th pattern. Defining
$\tilde{c}_l = (c_l(1)^\top, \ldots, c_l(T)^\top)^\top \in \mathbb{C}^{3T}$ and
$$ \tilde{C} = \begin{pmatrix} c_1(1) & \cdots & c_1(T) \\ \vdots & \ddots & \vdots \\ c_L(1) & \cdots & c_L(T) \end{pmatrix} \in \mathbb{C}^{3L \times T} , \tag{13} $$
we propose to estimate
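$$ \hat{\tilde{C}} = \arg\min_{\tilde{C}} \; \tilde{R}(\tilde{C}) + \lambda \tilde{L}(\tilde{C}) \tag{14} $$
with $\tilde{R}(\tilde{C}) = \sum_{l=1}^{L} \|\tilde{c}_l\|_2$ and
$\tilde{L}(\tilde{C}) = \left\| \mathrm{vec}\left( Z - \Gamma \tilde{C} \right) \right\|_2^2$,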
which is equivalent to Eq. (11) for T = 1. However, for T >
1 it is not equivalent to solving T problems of type Eq. (11)
separately, as in our case the 3T coefficients belonging to
a certain basis function are tied under a common $\ell_2$-norm
penalty and can only be pruned to zero at the same time.
Thus, the selection of basis functions which contribute co-
herently to several patterns is facilitated, while at the same
time orientations, amplitudes and phases of the correspond-
ing fields are allowed to differ per pattern. Such joint (or
co-) localization was already suggested in previous work.
The idea originates from Polonsky and Zibulevsky (2004)
and appears also in Malioutov et al. (2005), Wipf and Rao
(2007), Ou et al. (2008) and Bolstad et al. (2009). Malioutov
et al., Ou et al. and Bolstad et al. (2009) use the technique
for spatio-temporal source localization, where the $\ell_2$-norm
penalty in the temporal domain prevents artificial jumps
in the time course of the estimated sources. These studies
suggest that joint localization leads to better noise
suppression compared to the single-timepoint estimator. A
similar effect has been reported in a pure regression setting,
where joint regularization of Fourier coefficients led to improved
BCI classification rates (van Gerven et al., 2009).
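The coupling introduced by the joint penalty $\tilde{R}$ can be made concrete with a short sketch: all $3T$ coefficients of a basis function share one norm, so a shrinkage step either keeps or removes them together. The layout of `C_tilde` follows Eq. (13); the shrinkage operator, the threshold and the toy data are assumptions for illustration, and `joint_penalty` and `joint_shrink` are hypothetical helper names.

```python
import numpy as np


def joint_penalty(C_tilde):
    """Sum over basis functions of the l2-norm of their 3T stacked coefficients."""
    L = C_tilde.shape[0] // 3
    blocks = C_tilde.reshape(L, 3, -1)  # block l holds c_l(1), ..., c_l(T) as columns
    return np.sum(np.linalg.norm(blocks, axis=(1, 2)))


def joint_shrink(C_tilde, tau):
    """Block soft thresholding with tau > 0: a basis function is pruned for all T patterns at once."""
    L = C_tilde.shape[0] // 3
    blocks = C_tilde.reshape(L, 3, -1)
    norms = np.linalg.norm(blocks, axis=(1, 2))
    scale = np.where(norms > tau, 1.0 - tau / np.maximum(norms, tau), 0.0)
    return (scale[:, None, None] * blocks).reshape(C_tilde.shape)


rng = np.random.default_rng(5)
L, T = 4, 3
C_tilde = rng.standard_normal((3 * L, T)) + 1j * rng.standard_normal((3 * L, T))
C_tilde[0:3] *= 1e-3  # basis function 0: weak in every pattern
C_tilde[6:9] *= 1e-3  # basis function 2: weak in every pattern
shrunk = joint_shrink(C_tilde, tau=0.5)
```

In this toy example the two weak basis functions are removed from all $T$ patterns simultaneously, while the remaining ones keep pattern-specific orientations, amplitudes and phases.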
2.1.8. Optimization
Eqs. (11) and (14) form convex problems, composed of a
quadratic loss function and a convex nondifferentiable regu-
larizer. These problems share similarities with the problems
discussed in Polonsky and Zibulevsky (2004); Haufe et al.
(2008); Malioutov et al. (2005); Ou et al. (2008); Ding and
He (2008); Wipf and Nagarajan (2009) and Bolstad et al.
(2009). In the majority of these papers, the cost function is
reformulated as an instance of second-order cone program-
ming (SOCP) (Lobo et al., 1998). The proposed interior-
point-based SOCP solvers are, however, only applicable to
small- and medium-sized problems not exceeding several
tens of thousands of variables. For this reason, some authors
perform a dimensionality reduction step in order to reduce
the number of variables and/or observations (Malioutov
et al., 2005; Ou et al., 2008).
Here, we make use of a more recent advance in numeri-
cal optimization that enables us to solve S-FLEX instances
involving millions of model parameters and thousands of
observations. The proposed algorithm is based on deriving
the Fenchel dual of the optimization problem and apply-
ing the augmented Lagrangian technique. It has thus been
termed Dual Augmented Lagrangian (DAL, see Tomioka
and Sugiyama (2009)). Usage of augmented Lagrangians
was also proposed by Polonsky and Zibulevsky (2004) who
apply the technique to the primal (original) problem. How-
ever, it is shown in Tomioka and Sugiyama (2009) that a
dual formulation is more efficient when the number of un-
known variables is much larger than the number of obser-
vations, which is the typical scenario in distributed source
modeling.
We use the reference implementation of DAL, which is
provided as open source software (Tomioka, 2009). Note
that DAL is not only suitable for computing S-FLEX so-
lutions, but could also be applied to solve large instances
of the problem arising in Polonsky and Zibulevsky (2004);
Ou et al. (2008); Ding and He (2008); Bolstad et al. (2009).
Unfortunately, sparsity of linearly transformed variables is
not efficiently handled by DAL, preventing our previous
approach FVR from benefiting from DAL. For this reason, we
believe that our current formulation is more suitable for
achieving spatial flexibility in large-scale source localiza-
tion tasks.
2.2. Simulations
2.2.1. Assessing Single-Measurement Localization
Performance
Validation of methods for inverse reconstruction is gen-
erally difficult due to the lack of a “ground truth”. The
measurements z do not provide such a truth, as the main
goal here is not to find a functional representation for the
EEG, but for the underlying current density y(x), which
is unknown. Therefore, a standard way of evaluating in-
verse methods is to assess their ability to reconstruct known
functions. This is done here by reconstructing simulated
current sources, which are generated as follows. A realistic
head model is obtained from high-resolution MRI (mag-
netic resonance imaging) slices of a human head (Holmes
et al., 1998). Inside the brain, $N = 2142$ dipole locations
$x_n$, $n = 1, \ldots, N$ are defined according to a cubic grid
of 10 mm inter-dipole distance. Corresponding current vectors
$y_n$ are sampled from a multivariate standard normal
distribution. The resulting function $(x_n, y_n)$ is spatially
smoothed using a Gaussian lowpass filter with standard
deviation 2.5 cm. Finally, denoting by $p_k$ the $k$-th percentile
of the current lengths $\|y_n\|_2$, $n = 1, \ldots, N$, each
$y_n$ is scaled to have length $\max(\|y_n\|_2 - p_{90}, 0)$, i.e., only
the 10% largest currents are retained. Source distributions
obtained by this procedure usually feature two to three activity
patches (sources) with small to medium extent and
smoothly varying magnitude and orientation (see Fig. 2 for
an example). The lead field $F \in \mathbb{R}^{M \times 2142 \cdot 3}$ is constructed
according to Nolte and Dassios (2005) taking into account
the realistic head geometry.
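The source generation procedure described above can be sketched as follows. The cubic toy grid stands in for the dipole locations of the realistic head model, and the row-normalized Gaussian kernel is one possible reading of "Gaussian lowpass filter"; both are assumptions of this sketch, and `simulate_sources` is a hypothetical helper name.

```python
import numpy as np


def simulate_sources(nodes, sigma=2.5, rng=None):
    """Random smooth current density, thresholded at the 90th percentile of its lengths."""
    rng = np.random.default_rng() if rng is None else rng
    Y = rng.standard_normal((nodes.shape[0], 3))  # standard normal current vectors
    diff = nodes[:, None, :] - nodes[None, :, :]
    K = np.exp(-0.5 * np.sum(diff ** 2, axis=-1) / sigma ** 2)
    K /= K.sum(axis=1, keepdims=True)             # assumption: row-normalized smoothing kernel
    Y = K @ Y                                     # spatial smoothing
    lengths = np.linalg.norm(Y, axis=1)
    p90 = np.percentile(lengths, 90)
    new_lengths = np.maximum(lengths - p90, 0.0)  # keep only the largest 10% of currents
    return Y * (new_lengths / lengths)[:, None]


# Hypothetical toy grid: 6 x 6 x 6 nodes with 1 cm spacing (coordinates in cm)
axis = np.arange(0.0, 6.0, 1.0)
nodes = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
Y = simulate_sources(nodes, sigma=2.5, rng=np.random.default_rng(6))
n_active = int(np.sum(np.linalg.norm(Y, axis=1) > 0))
```

Subtracting $p_{90}$ from the lengths, rather than merely masking small currents, lets the magnitude fall off continuously to zero at the border of each patch.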
The localization is carried out using the proposed sparse
basis field expansion (S-FLEX) approach, the commonly
used approaches of LORETA (Pascual-Marqui et al., 1994),
minimum $\ell_1$-norm estimate (denoted as L1 in the following)
(Matsuura and Okabe, 1995), and our recently proposed
Focal Vectorfield Reconstruction (FVR) technique
(Haufe et al., 2008). Note that these methods cover the full
spectrum from smooth spread-out solutions (LORETA) to
sparse solutions (L1). We use a variant of L1, in which the
original depth compensation approach is replaced by the
approach outlined in Section 2.1.4. As the data was simu-
lated without noise, perfect reconstruction is required for
all methods. For S-FLEX, basis functions with three dif-
ferent standard deviations σ1 = 0.5 cm, σ2 = 1 cm, σ3 =
1.5 cm are used. The tradeoff parameter α for FVR is cho-
sen as suggested in Haufe et al. (2008).
Five current densities are simulated and respective
pseudo EEG measurements for 118 channels are com-
puted. For each measurement and method a 5 × 5-fold
cross-validation is conducted. That is, the EEG electrodes
are randomly partitioned into five groups of approximately
equal size. Each union of four electrode groups gives rise
to a “training set”, while the remaining channel groups
are called “test sets”. The procedure is carried out five
times with different randomizations, yielding 25 training
sets with corresponding test sets. Inverse reconstructions
are carried out based on the “training sets”. In each of the
25 cross-validation runs, two criteria are evaluated. Most
importantly the reconstruction error, defined as
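$$ \mathrm{REC} = \left\| \frac{\mathrm{vec}(Y)}{\|\mathrm{vec}(Y)\|_2} - \frac{\mathrm{vec}(\hat{Y}^{\mathrm{tr}})}{\|\mathrm{vec}(\hat{Y}^{\mathrm{tr}})\|_2} \right\|_2 , \tag{15} $$
is considered, where $\hat{Y}^{\mathrm{tr}}$ are the vector field outputs at
nodes $x_n$, $n = 1, \ldots, N$ estimated using only the training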
set. Apart from the pointwise reconstruction, we also con-
sider the earth-mover’s distance (EMD) between true and
estimated current density, which measures the effort needed
to transform one density into the other. The EMD is de-
scribed in Rubner et al. (2000) and has been introduced
in the context of EEG/MEG inverse solution evaluation in
Haufe et al. (2008).
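The electrode-wise cross-validation scheme and the reconstruction error of Eq. (15) can be sketched in a few lines. The function names, the seed and the random toy current density are assumptions made for illustration; the EMD criterion is not covered here.

```python
import numpy as np


def electrode_folds(n_channels=118, n_folds=5, n_repeats=5, seed=0):
    """Return (train, test) channel index pairs for a repeated k-fold split of electrodes."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_repeats):
        groups = np.array_split(rng.permutation(n_channels), n_folds)
        for k in range(n_folds):
            test = np.sort(groups[k])
            train = np.sort(np.concatenate([g for i, g in enumerate(groups) if i != k]))
            splits.append((train, test))
    return splits


def rec_error(Y_true, Y_est):
    """Eq. (15): distance between the normalized true and estimated current densities."""
    a = Y_true.reshape(-1) / np.linalg.norm(Y_true)
    b = Y_est.reshape(-1) / np.linalg.norm(Y_est)
    return np.linalg.norm(a - b)


splits = electrode_folds()  # 5 x 5 = 25 training/test sets
Y = np.random.default_rng(7).standard_normal((50, 3))  # toy current density
```

Because both densities are normalized, REC ignores the overall scale of the estimate and ranges from 0 (identical up to a positive factor) to 2 (opposite sign).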
A third quantity of interest is the generalization error,
i.e., the error in predicting the activity at those channels in
the test set from the sources that are estimated from the