Time-Frequency Mixed-Norm Estimates: Sparse
M/EEG imaging with non-stationary source activations
A. Gramfort1,2,3,4,*, D. Strohmeier5, J. Haueisen5,6, M. Hämäläinen4, M. Kowalski7
Abstract

Magnetoencephalography (MEG) and electroencephalography (EEG) allow functional brain imaging with high temporal resolution. While solving the inverse problem independently at every time point can give an image of the active brain at every millisecond, such a procedure does not capitalize on the temporal dynamics of the signal. Linear inverse methods (minimum-norm, dSPM, sLORETA, beamformers) typically assume that the signal is stationary: the regularization parameter and data covariance are independent of time and of the time-varying signal-to-noise ratio (SNR). Other recently proposed non-linear inverse solvers promoting focal activations estimate the sources in both space and time while also assuming stationary sources during a time interval. However, such a hypothesis only holds for short time intervals. To overcome this limitation, we propose time-frequency mixed-norm estimates (TF-MxNE), which use time-frequency analysis to regularize the ill-posed inverse problem. This method makes use of structured sparse priors defined in the time-frequency domain, offering more accurate estimates by capturing the non-stationary and transient nature of brain signals. State-of-the-art convex optimization procedures based on proximal operators are employed, allowing the derivation of a fast estimation algorithm. The accuracy of the TF-MxNE is compared to recently proposed inverse solvers with help of simulations and by analyzing publicly available MEG datasets.

*Institut Mines-Telecom, Telecom ParisTech, CNRS LTCI, 37-39 Rue Dareau, 75014 Paris, France
Email address: alexandre.gramfort@telecom-paristech.fr (A. Gramfort)
1 Institut Mines-Telecom, Telecom ParisTech, CNRS LTCI, Paris, France
2 INRIA, Parietal team, Saclay, France
3 NeuroSpin, CEA Saclay, Bat. 145, 91191 Gif-sur-Yvette Cedex, France
4 Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, and Harvard Medical School, Charlestown MA, USA
5 Institute of Biomedical Engineering and Informatics, Ilmenau University of Technology, Ilmenau, Germany
6 Biomagnetic Center, Dept. of Neurology, University Hospital Jena, Jena, Germany
7 Laboratoire des Signaux et Systèmes (L2S), Supelec-CNRS-Univ Paris-Sud, Plateau de Moulon, 91192 Gif-sur-Yvette Cedex, France

Preprint submitted to Elsevier, January 12, 2013
Keywords: Inverse problem, Magnetoencephalography (MEG), Electroencephalography (EEG), sparse structured priors, convex optimization, time-frequency, algorithms
1. Introduction
Distributed source models in magnetoencephalography and electroencephalography (collectively M/EEG) use thousands of current dipoles as candidate sources to explain the M/EEG measurements. These dipoles can be located on a dense three-dimensional grid within the brain volume, typically every 5 mm, or over the surface of the segmented cortical mantle [7], both of which can be obtained automatically from high-resolution anatomical Magnetic Resonance Images (MRIs). Following Maxwell's equations, each dipole adds its contribution linearly to the measured signal. Note
that this linearity of the forward problem is not a modeling assumption but
a fact based on the fundamental physics of the problem.
The task in the inverse problem is to map the M/EEG measurements to
the brain, i.e., to estimate the distribution of dipolar currents that can explain
the measured data. Inverse methods that estimate distributed sources are
commonly referred to as imaging methods. This is motivated by the fact that
the current estimate explains the data and can be visualized as an image, at
least at a given point in time. The orientations of the dipoles can be either
considered to be known, e.g., by aligning them with the estimated cortical
surface normals [7], in which case only the dipole amplitudes need to be
estimated. Alternatively, the orientations can be considered as unknown in
which case both amplitudes and orientations need to be estimated at each
spatial location.
One of the challenges for distributed inverse methods is that the num-
ber of dipoles by far exceeds the number of M/EEG sensors: the problem
is ill-posed. Therefore, constraints using a priori knowledge based on the
characteristics of the actual source distributions are necessary. Common
priors are based on the Frobenius norm and lead to a family of methods
generally referred to as minimum norm estimators (MNE) [45, 19]. Minimum
norm estimates can be converted into statistical parameter maps, which take
into account the noise level, leading to noise-normalized methods such as
dSPM [6] or sLORETA [35]. While these methods have some benefits like
simple implementation and a good robustness to noise, they do not take into
account the natural assumption that only a few brain regions are typically
active during a cognitive task. Interestingly, this latter assumption is what
justifies a parametric method known as “dipole fitting” [37] routinely used
in clinical practice. In order to promote such focal or sparse solutions within
the distributed source model framework, one uses sparsity-inducing priors
such as an $\ell_p$ norm with $p \leq 1$ [30, 14]. However, with such priors it is challenging to obtain consistent estimates of the source orientations [42] as well as temporally coherent source estimates [34].
In order to promote spatio-temporally coherent focal estimates, several
publications have proposed to constrain the active sources to remain the same
over the time interval of interest [34, 11, 46, 15]. The implicit assumption
is then that the sources are stationary. While this conjecture is reasonable for short time intervals, it is not a good model for realistic source configurations where multiple transient sources activate sequentially during the analysis period, or simultaneously, before returning to baseline at different time instants.
When working with time series with transient and non-stationary effects, relevant signal processing tools are short-time Fourier transforms (STFT) and wavelet decompositions. Contrary to a simple Fast Fourier Transform (FFT), they provide information localized in time and frequency (or scale). In
particular, time-frequency decompositions, e.g., Morlet wavelet transforms,
are routinely used in MEG and EEG analysis to study transient oscillatory
signals. Such decompositions have been employed to analyze both sensor-
level data and source estimates, but no attempt has been made to use their
output in constructing a regularizer for the inverse problem.
In this contribution, we address the problem of localizing non-stationary
focal sources from M/EEG data using appropriate sparsity inducing norms.
Extending the work from [15] in which we coined the term Mixed-Norm Es-
timates (MxNE), we propose to use mixed-norms defined in terms of the
time-frequency decompositions of the sources. We call this approach the
Time-Frequency Mixed-Norm Estimates (TF-MxNE). The benefit is that the estimates can be obtained over longer time intervals while making standard preprocessing, such as filtering or time-frequency analysis on the sensors, optional. The inverse problem is formulated as in [15] as a convex optimization problem whose solutions are computed with an efficient solver based on proximal iterations.
We start with a detailed presentation of the problem and the algorithm.
Next, we compare the characteristics and performance of various priors with
help of realistic simulated data. Finally, we analyze publicly available MEG
datasets (auditory and visual stimulations) demonstrating the benefit of TF-
MxNE in terms of source localization and estimation of the time courses of
the sources.
A preliminary version of this work was presented at the international
conference on Information Processing in Medical Imaging (IPMI) [17]. In
this paper we improve the solver to support loose orientation constraints and depth compensation, as well as a debiasing step to better estimate source amplitudes. We also analyze new experimental data.
Notation: We indicate vectors with bold letters, $\mathbf{a} \in \mathbb{R}^N$ (resp. $\mathbb{C}^N$), and matrices with capital bold letters, $\mathbf{A} \in \mathbb{R}^{N \times N}$ (resp. $\mathbb{C}^{N \times N}$). $\mathbf{a}[i]$ stands for the $i$-th entry in the vector, while $\mathbf{A}[i,\cdot]$ and $\mathbf{A}[\cdot,i]$ denote the $i$-th row and $i$-th column of a matrix, respectively. We denote by $\|\mathbf{A}\|_{\mathrm{Fro}}$ the Frobenius norm, $\|\mathbf{A}\|_{\mathrm{Fro}}^2 = \sum_{i,j=1}^N |\mathbf{A}[i,j]|^2$, by $\|\mathbf{A}\|_1 = \sum_{i,j=1}^N |\mathbf{A}[i,j]|$ the $\ell_1$ norm, and by $\|\mathbf{A}\|_{21} = \sum_{i=1}^N \sqrt{\sum_{j=1}^N |\mathbf{A}[i,j]|^2}$ the $\ell_{21}$ mixed norm. $\mathbf{A}^T$ and $\mathbf{A}^H$ stand for the matrix transpose and the Hermitian transpose, respectively.
2. General model and method
After a short introduction to Gabor time-frequency dictionaries for M/EEG
signals, we present the details of our TF-MxNE inverse problem approach.
We then detail the proposed optimization strategy, which uses proximal it-
erations.
2.1. Gabor dictionaries
Here we briefly present some important properties of Gabor dictionaries,
see [8] for more details. Given a signal observed over a time interval, its
conventional Fourier transform estimates the frequency content but loses the
time information. To analyze the evolution of the spectrum with time and
hence the non-stationarity of the signal, Gabor introduced windowed Fourier
atoms which correspond to a short-time Fourier transform (STFT) with a
Gaussian window. In practice, for numerical computation, a challenge is to
properly discretize the continuous STFT. The discrete STFT with a Gaussian
window is also known as the discrete Gabor Transform [12].
We consider the finite-dimensional setting. Let $g \in \mathbb{R}^T$ be a "mother" analysis window, and let $f_0 \in \mathbb{N}$ and $k_0 \in \mathbb{N}$ be the frequency and time sampling rates of the time-frequency plane generated by the STFT. The family of translations and modulations of the mother window generates a family of Gabor atoms $(\phi_{mf})_{mf}$ forming the dictionary $\Phi \in \mathbb{C}^{T \times K}$, where $K$ denotes the number of atoms. The atoms can be written as

$$\phi_{mf}[n] = g[n - m k_0]\, e^{i 2\pi f_0 f n / T}, \quad m \in \{0,\dots,T/k_0 - 1\},\ f \in \{0,\dots,T/f_0 - 1\} \; . \quad (1)$$
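For illustration, the discretized atoms of this form can be generated directly. A minimal numpy sketch, assuming a Gaussian mother window and unit-norm atoms (both simplifying choices of ours):

```python
import numpy as np

def gabor_dictionary(T=64, k0=16, f0=4):
    """Build a discrete Gabor dictionary Phi in C^{T x K} in the spirit of Eq. (1).

    Atoms are circular translations (step k0) and modulations (step f0)
    of a Gaussian mother window g; unit-norm atoms are our choice.
    """
    n = np.arange(T)
    g = np.exp(-0.5 * ((n - T / 2) / (T / 8)) ** 2)   # Gaussian mother window
    g /= np.linalg.norm(g)
    atoms = []
    for m in range(T // k0):                          # time shifts
        for f in range(T // f0):                      # frequency modulations
            shifted = np.roll(g, m * k0)              # g[n - m*k0], circularly
            atoms.append(shifted * np.exp(2j * np.pi * f0 * f * n / T))
    return np.column_stack(atoms)                     # shape (T, K)

Phi = gabor_dictionary()   # K = (T/k0) * (T/f0) = 64 atoms here
```

With these parameters the dictionary is square; denser samplings (smaller $k_0$ or $f_0$) make it overcomplete, as discussed next.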
If the product $f_0 k_0$ is small enough, i.e., the time-frequency plane is sufficiently sampled, the family $(\phi_{mf})_{mf}$ is a frame of $\mathbb{R}^T$, i.e., one can recover any signal $x \in \mathbb{R}^T$ from its Gabor coefficients $(\langle x, \phi_{mf}\rangle) = \Phi^H x$. More precisely, there exist two constants $A, B > 0$ such that [1]:

$$A\|x\|_2^2 \leq \sum_{m,f} |\langle x, \phi_{mf}\rangle|^2 \leq B\|x\|_2^2 \; . \quad (2)$$

When $A = B$, the frame is tight. When the vectors $\phi_{mf}$ are normalized, the frame is an orthogonal basis if and only if $A = B = 1$. The Balian-Low theorem says that it is impossible to construct a Gabor frame which is a basis. Consequently, a Gabor transform is redundant, or overcomplete, and there exists an infinite number of ways to reconstruct $x$ from a given family of Gabor atoms. In the following, the considered $\Phi$ dictionaries are tight frames.

The canonical reconstruction of $x$ from its Gabor coefficients requires a canonical dual window, denoted by $\tilde g$. Following (1) to define $(\tilde\phi_{mf})_{mf}$, we have:

$$x = \sum_{m,f} \langle x, \phi_{mf}\rangle\, \tilde\phi_{mf} = \sum_{m,f} \langle x, \tilde\phi_{mf}\rangle\, \phi_{mf} = \tilde\Phi\,\Phi^H x = \Phi\,\tilde\Phi^H x \; ,$$

where $\tilde\Phi$ is the Gabor dictionary formed with the dual windows. When the frame is tight, we have $\tilde g = g$, and more particularly $\Phi\Phi^H = \|\Phi\Phi^H\|\,\mathrm{Id}$.8

8 We can however say nothing about $\Phi^H\Phi$ in general.
The representation being redundant, for any $x \in \mathbb{R}^T$ one can find a set of coefficients $z_{mf}$ such that $x = \sum_{m,f} z_{mf}\,\phi_{mf}$, while the $z_{mf}$ verify some suitable properties dictated by the application. For example, it is particularly interesting for M/EEG to find a sparse representation of the signal. Indeed, a scalogram, sometimes simply called a TF transform of the data in the MEG literature, generally exhibits a few peaks localized in the time-frequency domain. In other words, an M/EEG signal can be expressed as a linear combination of a few oscillatory atoms. In order to demonstrate this, Fig. 1 shows the STFT of a single planar gradiometer channel MEG signal from a somatosensory experiment, the same STFT restricted to the 50 largest coefficients (approximately only 10% of the coefficients), and the signal reconstructed with only these coefficients compared to the original signal. We observe that the true signal can be well approximated by only a few coefficients, i.e., a few Gabor atoms. In the presence of white Gaussian noise, restricting the time-frequency representation of a signal to the largest coefficients denoises the data. This stems from the fact that Gaussian white noise is not sparse in the time-frequency domain, but rather spreads its energy uniformly over all time-frequency coefficients [40]. Thresholding or shrinking the coefficients therefore reduces noise and smoothes the data. This is further explained in the context of wavelet transforms in [9].
Figure 1: a) Short-time Fourier transform (STFT) of a single channel MEG signal sampled at 1000 Hz showing the sparse nature of the transformation (window size 64 time points and time shift $k_0 = 16$ samples). b) STFT restricted to the 50 largest coefficients. c) Data and data reconstructed using only the 50 largest coefficients.
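The construction behind Fig. 1 can be reproduced with any STFT implementation. A sketch using `scipy.signal` on a synthetic transient oscillation (the signal, noise level, and thresholding choices are ours, not those of the figure):

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
t = np.arange(1000) / 1000.0
# transient oscillation + white noise, loosely mimicking an evoked channel
x = np.exp(-((t - 0.4) / 0.05) ** 2) * np.sin(2 * np.pi * 20 * t)
x_noisy = x + 0.1 * rng.standard_normal(t.size)

# STFT with window size 64 and time shift 16 samples, as in Fig. 1
f, tt, Z = stft(x_noisy, fs=1000, nperseg=64, noverlap=48)

# frame property: keeping all coefficients reconstructs the signal
_, x_full = istft(Z, fs=1000, nperseg=64, noverlap=48)

# keep only the 50 largest coefficients in magnitude, as in Fig. 1b
thresh = np.sort(np.abs(Z), axis=None)[-50]
Z_sparse = np.where(np.abs(Z) >= thresh, Z, 0)
_, x_rec = istft(Z_sparse, fs=1000, nperseg=64, noverlap=48)
x_rec = x_rec[: x.size]

# the sparse reconstruction stays close to the clean signal
err = np.linalg.norm(x - x_rec) / np.linalg.norm(x)
```

Because the white noise spreads over all coefficients while the signal concentrates on a few, the thresholded reconstruction is also a denoised version of the data.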
In practice, the Gabor coefficients are computed using the Fast Fourier Transform (FFT) and not by a multiplication by a $\Phi$ matrix as suggested above. Such operations can be efficiently implemented as in the LTFAT toolbox9 [38]. Another practical concern to keep in mind is the tradeoff
9 http://ltfat.sourceforge.net/
between the size of the window g and the time shift k0. A long window will
have a good frequency resolution and a limited time resolution. The time
resolution can be improved with a small time shift leading however to a larger
computational cost, both in time and memory. Finally, as in any computation done with an FFT, the STFT implementations assume circular boundary conditions for the signal. To take this into account and avoid edge artifacts,
the signal has to be windowed, e.g., using a Hann window.
2.2. The inverse problem with time-frequency dictionaries
The linearity of Maxwell’s equations implies that the signals measured
by M/EEG sensors are linear combinations of the electromagnetic fields pro-
duced by all current sources. The linear forward operator, called gain matrix,
predicts the M/EEG measurements due to a configuration of sources based
on a given volume conductor model [32]. Given such a linear forward operator $G \in \mathbb{R}^{N \times P}$, where $N$ is the number of sensors and $P$ the number of sources, the measurements $M \in \mathbb{R}^{N \times T}$ ($T$ being the number of time instants) are related to the source amplitudes $X \in \mathbb{R}^{P \times T}$ by $M = GX$.
The computation of the gain matrix G, e.g., with a Boundary Element
Method (BEM) [24, 16], requires modeling of the electromagnetic proper-
ties of the head [19] such as the specification of the tissue conductivities.
The matrix is then numerically computed. In the inverse problem one computes a best estimate of the neural currents, $X^\star$, based on the measurements $M$. However, since $P \gg N$, the problem is ill-posed and priors need to be imposed on $X$. Historically, the source amplitudes were computed time instant by time instant using priors based on $\ell_p$ norms. The $\ell_2$ (Frobenius) norm leads to MNE, LORETA, dSPM, or sLORETA, while several alternative solvers based on $\ell_p$ norms with $p \leq 1$ have also been proposed to promote sparse solutions [30, 14]. However, since such solvers work on an
instant by instant basis, they do not model the oscillatory nature of electromagnetic brain signals. Note that even if the $\ell_2$ norm based methods work time instant by time instant, the estimates reflect the temporal characteristics of the data, since they are obtained by linear combinations of sensor data.
This, however, implies that the parameters of the inverse solver are indepen-
dent of time, which corresponds to assuming that the SNR is independent of
time. Although MNE-type approaches have been used with success, the assumption of constant SNR is clearly wrong, since the signal amplitudes vary in time while the noise level stays constant, or may even be smaller during an evoked response. The noise is usually estimated from baseline periods such
as prestimulus intervals or periods when the brain is not yet responding to
the stimulus.
Beyond single instant solvers, various sparsity-promoting approaches have
been proposed [34, 11, 46]. Although they manage to capture the time courses of the activations more accurately than the instantaneous sparse solvers, they
implicitly assume that the active sources are the same over the entire time
interval of interest. This also implies that if a source is detected as active at
one time point, its activation will be non-zero during the entire time interval
of interest. To go beyond this approach, we propose a solver which promotes
on the one hand that the source configuration is spatially sparse, and on the
other hand that the time course of each active dipole is a linear combination
of a limited number of Gabor atoms, as suggested by Fig. 1. Since a Gabor
oscillatory atom is localized in time, sources can be marked as active only
during a short time period. The model reads:
$$M = GX + E = GZ\Phi^H + E \; , \quad (3)$$

where $\Phi^H \in \mathbb{C}^{K \times T}$ is a dictionary of $K$ Gabor atoms, $Z \in \mathbb{C}^{P \times K}$ contains the coefficients of the decomposition, and $E$ is additive white noise, $E \sim \mathcal{N}(0, \sigma^2 I)$. Given a prior on $Z$, $P(Z) \sim \exp(-\Omega(Z))$, the maximum a posteriori (MAP) estimate is obtained by solving:

$$Z^\star = \operatorname*{argmin}_{Z} \frac{1}{2}\|M - GZ\Phi^H\|_{\mathrm{Fro}}^2 + \lambda\,\Omega(Z) \; , \quad \lambda > 0 \; . \quad (4)$$
If we consider $\Omega(Z) = \|Z\|_1$, (4) corresponds to a LASSO problem [39], a.k.a. Basis Pursuit Denoising (BPDN) [4], where the features (or regressors) are spatio-temporal atoms. Similarly to the original formulation of MCE, i.e., $\ell_1$ regularization without applying $\Phi$, such a prior is likely to suffer from inconsistencies over time [34]. Indeed, such a norm does not impose a structure on the non-zero coefficients: they are likely to be scattered all over $Z^\star$ (see Fig. 2). Therefore, simple $\ell_1$ priors do not guarantee that only a few sources are active during the time window of interest. To promote this, one needs to employ mixed-norms such as the $\ell_{21}$ norm [34, 15]. By doing so, the estimates have a sparse row structure (see Fig. 2). However, the $\ell_{21}$ prior on $Z$ does not produce denoised time series, as it does not promote source estimates that are formed by a sum of a few Gabor atoms. In order to recover the sparse row structure, while simultaneously promoting sparsity of the decompositions, we propose to use a composite prior formed by the
Figure 2: Sparsity patterns promoted by the different priors: $\ell_2$ all non-zero, $\ell_1$ scattered and unstructured non-zero, $\ell_{21}$ block row structure, and $\ell_{21}+\ell_1$ block row structure with intra-row sparsity. Red color indicates non-zero coefficients.
sum of the $\ell_{21}$ and $\ell_1$ norms. The prior then reads:

$$\lambda\,\Omega(Z) = \lambda_{\mathrm{space}}\|Z\|_{21} + \lambda_{\mathrm{time}}\|Z\|_1 \; , \quad \lambda_{\mathrm{space}} > 0,\ \lambda_{\mathrm{time}} > 0 \; . \quad (5)$$

A large regularization parameter $\lambda_{\mathrm{space}}$ will lead to a spatially very sparse solution, while a large regularization parameter $\lambda_{\mathrm{time}}$ will promote sources with smooth time series. This is due to the uniform spectrum of the noise (see Section 2.1) and the fact that a large $\lambda_{\mathrm{time}}$ will promote source activations made up of few TF atoms, each of which has a smooth waveform.
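Evaluating the composite prior of (5) is straightforward; a minimal sketch (the example matrix and weights are arbitrary):

```python
import numpy as np

def tf_mxne_penalty(Z, lambda_space, lambda_time):
    """Composite prior of Eq. (5): lambda_space*||Z||_21 + lambda_time*||Z||_1.

    Rows of Z index sources, columns index TF coefficients.
    """
    l21 = np.sum(np.sqrt(np.sum(np.abs(Z) ** 2, axis=1)))  # row-wise l2, then sum
    l1 = np.sum(np.abs(Z))
    return lambda_space * l21 + lambda_time * l1

Z = np.array([[3.0, 4.0],
              [0.0, 0.0]])
val = tf_mxne_penalty(Z, 1.0, 1.0)   # ||Z||_21 = 5, ||Z||_1 = 7 -> 12.0
```

The $\ell_{21}$ term only charges rows that are active at all, while the $\ell_1$ term additionally charges every individual coefficient, which is what yields row sparsity with intra-row sparsity.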
2.3. Optimization strategy
The optimization strategy which we propose for minimizing the cost function in (4) is based on the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [2], a first-order scheme that handles the minimization of any cost function $F$ that can be written as the sum of two terms, a smooth convex term $f_1$ with Lipschitz gradient and a convex, potentially non-differentiable term $f_2$: $F(Z) = f_1(Z) + f_2(Z)$. In order to apply FISTA, we need to be able to compute the so-called proximity operator associated with $f_2$, i.e., the proximity operator associated with the composite $\ell_{21}+\ell_1$ prior [17].
Definition 1 (Proximity operator). Let $\varphi : \mathbb{R}^M \to \mathbb{R}$ be a proper convex function. The proximity operator associated to $\varphi$, denoted by $\mathrm{prox}_\varphi : \mathbb{R}^M \to \mathbb{R}^M$, reads:

$$\mathrm{prox}_\varphi(Z) = \operatorname*{argmin}_{V \in \mathbb{R}^M} \frac{1}{2}\|Z - V\|_2^2 + \varphi(V) \; .$$

While the proximity operators of mixed-norms relevant for M/EEG can be found in [15], in the case of the composite prior in (5), the proximity operator is given by the following lemma.
Lemma 1 (Proximity operator for $\ell_{21}+\ell_1$). Let $Y \in \mathbb{C}^{P \times K}$ be indexed by a double index $(p,k)$. $Z = \mathrm{prox}_{\lambda\|\cdot\|_1 + \mu\|\cdot\|_{21}}(Y) \in \mathbb{C}^{P \times K}$ is given for each coordinate $(p,k)$ by

$$Z[p,k] = \frac{Y[p,k]}{|Y[p,k]|}\,\big(|Y[p,k]| - \lambda\big)_+\left(1 - \frac{\mu}{\sqrt{\sum_k \big(|Y[p,k]| - \lambda\big)_+^2}}\right)_+ \; ,$$

where for $x \in \mathbb{R}$, $(x)_+ = \max(x, 0)$, and by convention $\frac{0}{0} = 0$.

This result is a corollary of the proximity operator derived for hierarchical group penalties recently proposed in [23]. The penalty described here can indeed be seen as a 2-level hierarchical structure, and the resulting proximity operator reduces to successively applying the $\ell_1$ and $\ell_{21}$ proximity operators. Both of these proximity operators are discussed in detail in [15].

The pseudo code is provided in Algorithm 1. The Lipschitz constant $L$ of the gradient of the smooth term in (4) is given by the square of the spectral norm of the linear operator $Z \to GZ\Phi^H$. We estimate it with the power iteration method.
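A direct numpy transcription of Lemma 1 (a sketch; the small guard values implement the $0/0 = 0$ convention):

```python
import numpy as np

def prox_l1_l21(Y, lam, mu):
    """Proximity operator of lam*||.||_1 + mu*||.||_21 (Lemma 1).

    First soft-threshold every entry at level lam, then shrink each row by
    the group factor (1 - mu/||row||_2)_+ computed on thresholded magnitudes.
    """
    mag = np.abs(Y)
    shrunk = np.maximum(mag - lam, 0.0)                      # (|Y| - lam)_+
    row_norm = np.sqrt(np.sum(shrunk ** 2, axis=1, keepdims=True))
    group = np.maximum(1.0 - mu / np.maximum(row_norm, 1e-32), 0.0)
    phase = np.where(mag > 0, Y / np.maximum(mag, 1e-32), 0.0)  # Y/|Y|, 0/0 := 0
    return phase * shrunk * group

Y = np.array([[4.0, -3.0],
              [0.5, 0.1]])
Z = prox_l1_l21(Y, lam=1.0, mu=1.0)   # second row is fully discarded
```

Note how the second row, whose entries all fall below the $\ell_1$ threshold, is zeroed entirely by the group factor: this is exactly the successive $\ell_1$ then $\ell_{21}$ application mentioned above.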
Algorithm 1 FISTA with TF dictionaries to minimize (4)
Input: Measurements $M$, gain matrix $G$, regularization parameter $\lambda > 0$ and $I$ the number of iterations.
Output: $Z^\star$
1: Auxiliary variables: $Y$ and $Z_o \in \mathbb{R}^{P \times K}$, and $\tau$ and $\tau_o \in \mathbb{R}$.
2: Estimate the Lipschitz constant $L$ with the power iteration method.
3: $Y = Z^\star = Z$, $\tau = 1$, $0 < \mu < L^{-1}$
4: for $i = 1$ to $I$ do
5:   $Z_o = Z^\star$
6:   $Z^\star = \mathrm{prox}_{\mu\lambda\Omega}\big(Y + \mu\, G^T(M - GY\Phi^H)\Phi\big)$
7:   $\tau_o = \tau$
8:   $\tau = \frac{1 + \sqrt{1 + 4\tau_o^2}}{2}$
9:   $Y = Z^\star + \frac{\tau_o - 1}{\tau}(Z^\star - Z_o)$
10: end for
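These iterations are standard FISTA applied to the TF coefficients. A self-contained numpy sketch, taking $\Phi$ as the identity for brevity (so $Z$ coincides with $X$), with toy data and regularization values of our own choosing:

```python
import numpy as np

def prox_l1_l21(Y, lam, mu):
    """Prox of lam*||.||_1 + mu*||.||_21 (Lemma 1), real-valued case."""
    shrunk = np.maximum(np.abs(Y) - lam, 0.0)
    rn = np.sqrt(np.sum(shrunk ** 2, axis=1, keepdims=True))
    group = np.maximum(1.0 - mu / np.maximum(rn, 1e-32), 0.0)
    return np.sign(Y) * shrunk * group

def fista(M, G, lam_space, lam_time, n_iter=200):
    """FISTA iterations of Algorithm 1, with Phi = Id (simplifying assumption)."""
    L = np.linalg.norm(G, 2) ** 2          # Lipschitz constant of the gradient
    mu = 1.0 / L                            # step size, 0 < mu <= 1/L
    Z = np.zeros((G.shape[1], M.shape[1]))
    Y, tau = Z.copy(), 1.0
    for _ in range(n_iter):
        Z_old = Z
        # gradient step on the data fit, then composite prox (Algorithm 1, line 6)
        Z = prox_l1_l21(Y + mu * G.T @ (M - G @ Y), mu * lam_time, mu * lam_space)
        tau_old = tau
        tau = (1 + np.sqrt(1 + 4 * tau_old ** 2)) / 2
        Y = Z + (tau_old - 1) / tau * (Z - Z_old)   # momentum step
    return Z

rng = np.random.default_rng(0)
G = rng.standard_normal((20, 200))
X_true = np.zeros((200, 30))
X_true[5] = np.sin(np.linspace(0, 4 * np.pi, 30))   # one active source
M = G @ X_true + 0.01 * rng.standard_normal((20, 30))
Z_hat = fista(M, G, lam_space=1.0, lam_time=0.1)
```

With a genuine Gabor dictionary, line 6 additionally involves the analysis/synthesis operators $\Phi$ and $\Phi^H$, efficiently implemented via FFTs as discussed in Section 2.1.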
3. Specific modeling for M/EEG inverse problem
The M/EEG literature has shown that general solvers of the statistics lit-
erature need to be adapted to the specificities of the M/EEG inverse problem.
Crucial steps in the computation of the source estimates are noise whiten-
ing, depth compensation, handling of source orientations, and amplitude bias
correction.
3.1. Spatial whitening
The model in (3) assumes that the additive noise is white Gaussian, $E \sim \mathcal{N}(0, \sigma^2 I)$. This strong modeling assumption is made realistic by a whitening step that relies on estimating the noise covariance matrix. For this purpose, baseline data is employed, which is recorded while the subject is at rest, e.g., during pre-stimulus periods. If only MEG is recorded, the noise covariance can be estimated from data recorded without a subject, often called empty-room data. This approach provides good estimates of the measurement noise level. Although the noise level depends on the signal frequency, one usually uses a single frequency-unspecific noise covariance matrix. An alternative approach for frequency-dependent spatial whitening is presented in [36].

The whitening step is particularly fundamental when different sensor types are used: EEG and MEG with gradiometers and magnetometers record signals with different units of measure and with different noise levels. The whitening step makes data recorded by different sensors comparable and adapted for joint estimation.
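A standard implementation computes a whitener $W = C^{-1/2}$ from the empirical baseline covariance $C$. A sketch, with a small diagonal loading added for numerical stability (our choice):

```python
import numpy as np

def whitener_from_baseline(B, reg=1e-10):
    """Compute a spatial whitener W from baseline data B (channels x times).

    After applying W, the noise covariance is approximately identity, making
    channels with different units and noise levels comparable.
    """
    C = (B @ B.T) / B.shape[1]                                # noise covariance
    C = C + reg * np.trace(C) / C.shape[0] * np.eye(C.shape[0])  # loading
    eigval, eigvec = np.linalg.eigh(C)
    return eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T    # C^{-1/2}

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
B = A @ rng.standard_normal((10, 5000))   # spatially correlated "baseline"
W = whitener_from_baseline(B)
C = (B @ B.T) / B.shape[1]
# W @ M and W @ G then replace M and G in the inverse problem
```

Both the measurements $M$ and the gain matrix $G$ are multiplied by $W$ before solving, so that the white-noise model in (3) holds.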
3.2. Source models with unconstrained orientations
When the source orientations given by the normals of the cortical mesh
cannot be trusted, it can be interesting to relax this constraint by placing
three orthogonal sources at each spatial location. When all three orientations
are allowed to explain the data equivalently the model is called free orien-
tation. Moreover, it can be of interest to have intermediate models using
loose orientation constraints [27]. However, for such loose and free orientation models, the TF composite prior needs to be adapted. Let each source be indexed by a spatial location $i$ and an orientation $o \in \{1,2,3\}$. Let $o = 1$ correspond to the orientation normal to the cortex, and $o = 2$ and $o = 3$ to the two tangential orientations. We call $0 < \rho \leq 1$ the parameter controlling how loose the orientation constraint is. The $\ell_1$ and $\ell_{21}$ norms read:

$$\|Z\|_1 = \sum_{i,k} \sqrt{|Z[(i,o=1),k]|^2 + \frac{1}{\rho}|Z[(i,o=2),k]|^2 + \frac{1}{\rho}|Z[(i,o=3),k]|^2}$$

$$\|Z\|_{21} = \sum_{i} \sqrt{\sum_k |Z[(i,o=1),k]|^2 + \frac{1}{\rho}|Z[(i,o=2),k]|^2 + \frac{1}{\rho}|Z[(i,o=3),k]|^2} \; ,$$

where $k$ indexes the TF coefficients. When $\rho = 1$ the orientation is free, and it amounts to grouping the orientations in a common $\ell_2$ norm as in [34, 20]. Such priors are a principled way of supporting loose orientation constraints in the context of non-$\ell_2$ priors.

Observe here that $\|Z\|_1$ is not an $\ell_1$ norm per se. Indeed, it is an $\ell_{21}$ norm, but we have chosen to keep the same notation as in the constrained orientation case for the sake of readability.
In practice, using free orientation models means that at a given location,
the current dipoles selected to explain the data can have an orientation that
varies in time similarly to the rotating dipole model employed in dipole fitting.
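These two orientation-weighted norms are easy to compute. A sketch, assuming $Z$ stores three consecutive rows per location (our layout choice):

```python
import numpy as np

def loose_norms(Z, rho):
    """l1 and l21 norms with loose orientation weighting, 0 < rho <= 1.

    Z has shape (3*P, K): rows (3i, 3i+1, 3i+2) hold the normal (o=1) and
    the two tangential (o=2,3) orientations of location i.
    """
    P, K = Z.shape[0] // 3, Z.shape[1]
    w = np.array([1.0, 1.0 / rho, 1.0 / rho])[None, :, None]  # 1/rho on tangentials
    sq = (np.abs(Z).reshape(P, 3, K) ** 2 * w).sum(axis=1)    # (P, K)
    l1 = np.sqrt(sq).sum()               # sum over locations and coefficients
    l21 = np.sqrt(sq.sum(axis=1)).sum()  # pool coefficients within a location
    return l1, l21

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 4))          # P = 2 locations, K = 4 coefficients
l1_free, l21_free = loose_norms(Z, rho=1.0)   # free orientation case
```

With $\rho = 1$ the weighting disappears and the three orientations are simply pooled in a common $\ell_2$ norm, recovering the free orientation model.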
3.3. Depth compensation
The principal contribution to M/EEG data comes from superficial corti-
cal gray matter: deep sources are attenuated due to their larger distance from
the sensors. While it is common in statistics to scale the columns of the gain matrix such that $\|G_i\|_2 = 1$, practice with M/EEG data shows that this is often not a good idea. The rationale in statistics is to avoid favoring regressors, here sources, just because of the amplitude of the corresponding column in the gain matrix. When doing this for M/EEG, it tends to favor very deep sources too much, although they are less likely to be visible with M/EEG. For this reason, a common practice with MNE-type approaches is to use a softer depth bias compensation. Given a parameter $0 \leq \gamma \leq 1$, the three columns $(G[\cdot,(i,o=1)]$, $G[\cdot,(i,o=2)]$, $G[\cdot,(i,o=3)])$ of $G$ for the three orientations at the same location are normalized by

$$\left(\sqrt{\|G[\cdot,(i,o=1)]\|_2^2 + \|G[\cdot,(i,o=2)]\|_2^2 + \|G[\cdot,(i,o=3)]\|_2^2}\right)^{\gamma} \; .$$

If $\gamma = 0$ this corresponds to no depth bias compensation, while $\gamma = 1$ leads to full scaling, which may lead to spurious deep sources appearing in the results.
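A sketch of this normalization, assuming the columns of $G$ are ordered in consecutive orientation triplets (our layout choice):

```python
import numpy as np

def depth_weight_gain(G, gamma=0.8):
    """Soft depth-bias compensation of a free-orientation gain matrix.

    Columns come in triplets (one per orientation at each location); each
    triplet is divided by (sum of its squared column norms)^(gamma/2).
    gamma=0: no compensation; gamma=1: full normalization.
    """
    N, P3 = G.shape
    Gc = G.reshape(N, P3 // 3, 3)                       # (sensors, locations, orient.)
    scale = np.sum(Gc ** 2, axis=(0, 2)) ** (gamma / 2.0)   # one scale per location
    return (Gc / scale[None, :, None]).reshape(N, P3)

rng = np.random.default_rng(0)
G = rng.standard_normal((10, 9))        # 3 locations, free orientation
Gw = depth_weight_gain(G, gamma=1.0)    # full scaling: unit triplet norms
```

Intermediate values such as $\gamma \approx 0.8$ interpolate between the two extremes, which is the "softer" compensation referred to above.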
3.4. Source weighting: fMRI priors?
Mixed-norm regularizations [15] can be written with spatially dependent scalar weights. These can be used to promote some sources by reducing their regularization. For example, given a weight vector $w \in \mathbb{R}_+^P$, one weight per physical location, the TF-MxNE prior can be modified as:

$$\lambda\,\Omega_w(Z) = \sum_{i=1}^{P} w[i]\left(\lambda_{\mathrm{space}}\|Z[i,\cdot]\|_2 + \lambda_{\mathrm{time}}\|Z[i,\cdot]\|_1\right) \; ,$$
where $Z[i,\cdot]$ stands for the $i$-th row of $Z$. If $w[i]$ is small, the regularization for the source at location $i$ will be small, and source $i$ is likely to be selected to explain the data. Assuming that additional information about the location of the sources is available, e.g., from fMRI, it would be possible to inject this knowledge into the prior in order to obtain fMRI-informed sparse estimates. Note that sparsity
promoting priors do not lead to source estimates where every dipole in the
source space has a non-zero activation. It means that, although some regions
are promoted by the weights, they may not contain any estimated source. It
indeed may happen that MEG misses sources, for example if they are radially
oriented. In this sense the proposed weighted scheme does not act as a strong
prior on the MEG source localization.
For computational reasons, one can also exploit fast solvers such as dSPM
or sLORETA to derive scalar weights that can help reduce the number of
candidate sources. Typically, one can threshold dSPM/sLORETA estimates
and restrict the TF-MxNE solver to a small portion of the cortex, further
improving the computational efficiency of the optimization algorithm. This corresponds to setting $w[i]$ to infinity (or to a very large value) if the $i$-th spatial location yields very low dSPM values at all points in time.
3.5. Amplitude bias compensation
Methods based on $\ell_1$ priors, such as TF-MxNE, are known to impose an amplitude bias on the solution. This is due to the general bias-variance tradeoff in statistical estimation. With $\ell_1$-based priors, the high sparsity of the solution comes at the price of a strong amplitude bias. Given the waveforms
for the selected sources it is possible to post-process them and correct the
amplitude bias leading to meaningful amplitudes of the source activations.
See [18] for an example of amplitude bias correction in the context of fMRI
decoding.
A first natural approach to correct for the amplitude bias is to compute
the least squares solution restricted to the active set of sources provided
by the TF-MxNE solution. It amounts to computing a dipole fit with a
known set of dipoles, which is no longer an ill-posed problem. However,
this procedure affects the source time courses, and the signal smoothness
promoted by the TF-MxNE is lost. Hence, rather than re-estimating the
source time courses using least squares, we correct the amplitude bias by
scaling the TF-MxNE results. For this purpose, we introduce a diagonal
scaling matrix D, whose diagonal elements are scaling factors for all sources in
the active set. These scaling factors are constrained to be above 1 to actually
remove the bias, and are constant over time. Furthermore, in the case of
free orientation, they are identical for all orientations at a given location in
order to preserve the source characteristics and orientations estimated using
TF-MxNE. The bias-corrected source estimate $\tilde X$ is computed using $D$ as $\tilde X = DX = DZ\Phi^H$. We estimate the scaling matrix $D$ by solving the following convex optimization problem:

$$D^\star = \operatorname*{argmin}_{D} \|M - GDX\|_{\mathrm{Fro}}^2 \quad \text{s.t.} \quad \begin{cases} D_{ij} \geq 1, & i = j \\ D_{ij} = 0, & i \neq j \end{cases} \; .$$
The optimization problem can also be solved efficiently with FISTA after writing the constraint on $D$ as an indicator function over the convex set $C = \{D \text{ s.t. } D_{ii} \geq 1, \text{ and } D_{ij} = 0 \text{ if } i \neq j\}$:

$$f_2(D) = \iota_C(D) = \begin{cases} 0 & \text{if } D \in C \\ +\infty & \text{otherwise} \end{cases} \; .$$
4. Practical details
This section presents the details of the efficient implementation of Algorithm 1. We also discuss the choice of the hyperparameters (regularization
parameters).
4.1. Implementation
Algorithm 1 requires computing Gabor transforms at each iteration, which can be computationally demanding. However, due to the $\ell_{21}$ sparsity-inducing prior, only a few rows of $Z$ have non-zero coefficients. The Gabor transform is therefore computed for only a limited number of rows, equivalently a small number of active sources. This makes the computation of $Y\Phi^H$ (cf. Algorithm 1, line 6) much faster.

Also, when a tight frame is used, the $\ell_{21}$ norm of a signal does not change when $\Phi$ is applied. This means that the $\ell_{21}$ proximity operator can be applied to temporal data to discard some sources from the active set without computing the STFT. This comes from the fact that if $\mathrm{prox}_{\|\cdot\|_{21}}(x) = 0$ for a time series $x$, then $\mathrm{prox}_{\|\cdot\|_1 + \|\cdot\|_{21}}(\Phi^H x) = 0$.
Since the proposed optimization problem is convex, the solution does not depend on initial conditions. Hence, in order to further reduce the computation time, it is beneficial to initialize the TF-MxNE solver with the $\ell_{21}$
MxNE solution obtained with the same spatial regularization, since MxNE can be computed efficiently using active set strategies [15]. Note again that the $\ell_{21}$ MxNE solution is used as an initialization and not for restricting the
source space.
4.2. Selection of the regularization parameters
Model selection in the present case amounts to setting the regularization parameters $\lambda_{\mathrm{space}}$ and $\lambda_{\mathrm{time}}$, as well as the parameters of the Gabor transform, namely the time resolution, set by $k_0$, and the frequency resolution, a function of the window length $T$. The parameters $k_0$ and $T$ should depend on the length of a time interval during which signals can be considered stationary. A too dense sampling of the time-frequency plane will also lead to high computational costs. The regularization parameters have an effect on the spatial sparsity, i.e., the number of active dipoles, and on the temporal smoothness of the source time series. Different strategies exist to set such model parameters (cross-validation, discrepancy principle, etc.).
In the case of $\ell_{21}$ priors, one can prove that there exists a value $\lambda_{\mathrm{space}}^{\max}$ for $\lambda_{\mathrm{space}}$ such that if $\lambda_{\mathrm{space}} \geq \lambda_{\mathrm{space}}^{\max}$, then $Z^\star$ is filled with zeros, i.e., no source is active. This provides a convenient way to specify the regularization parameter as the ratio of $\lambda_{\mathrm{space}}$ and $\lambda_{\mathrm{space}}^{\max}$, between 0 and 1. In the next section, if $\lambda_{\mathrm{space}}$ is given as a percentage it corresponds to this ratio, rescaled to percents. For convenience, the parameter $\lambda_{\mathrm{time}}$ can then also be scaled by $\lambda_{\mathrm{space}}^{\max}$. The benefit of this reparametrization of the regularization parameters is that they become much less sensitive to the dataset. Assuming $\Phi$ is a tight frame, then $\|X\|_{21} = \|X\Phi\|_{21} = \|Z\|_{21}$, and one can show based on the optimality conditions for the $\ell_{21}$ mixed-norm [15] that:

$$\lambda_{\mathrm{space}}^{\max} = \max_i \|(G^T M)[i,\cdot]\|_2 \; .$$
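Computing $\lambda_{\mathrm{space}}^{\max}$ is a one-liner. The sketch below (random toy data, the 30% value arbitrary) also checks the defining property that, at this value, a single $\ell_{21}$ prox-gradient step from $Z = 0$ returns zero:

```python
import numpy as np

def lambda_max_space(G, M):
    """Smallest lambda_space for which the pure-l21 solution is all zeros."""
    return np.max(np.linalg.norm(G.T @ M, axis=1))

rng = np.random.default_rng(0)
G = rng.standard_normal((10, 50))
M = rng.standard_normal((10, 20))
lmax = lambda_max_space(G, M)
lam_space = 0.3 * lmax                    # e.g. "30%" regularization

# at lambda = lmax, one prox-gradient l21 step from Z = 0 vanishes
mu = 0.01
step = mu * (G.T @ M)                     # gradient step from Z = 0
rn = np.linalg.norm(step, axis=1, keepdims=True)
Z1 = np.maximum(1 - mu * lmax / np.maximum(rn, 1e-32), 0) * step   # all zeros
```

Expressing $\lambda_{\mathrm{space}}$ as a fraction of this data-dependent bound is what makes the parameter comparable across datasets.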
5. Results
In the following, we first evaluate the accuracy of our solver with simula-
tions. We then apply our solver to two MEG/EEG datasets.
5.1. Simulation study
In order to have a reproducible and reasonably fast comparison of different priors, we generated a small simulation dataset with 20 EEG electrodes and 200 sources. Four of these sources were randomly selected to be active. The