786 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 50, NO. 6, JUNE 2003
MEG Source Localization Using an MLP With a
Distributed Output Representation
Sung Chan Jun*, Barak A. Pearlmutter, and Guido Nolte
Abstract—We present a system that takes realistic magnetoencephalo-
graphic (MEG) signals and localizes a single dipole to reasonable accuracy
in real time. At its heart is a multilayer perceptron (MLP) which takes
the sensor measurements as inputs, uses one hidden layer, and generates
as outputs the amplitudes of receptive fields holding a distributed rep-
resentation of the dipole location. We trained this Soft-MLP on dipolar
sources with real brain noise and converted the network’s output into an
explicit Cartesian coordinate representation of the dipole location using
two different decoding strategies. The proposed Soft-MLPs are much
more accurate than previous networks which output source locations in
Cartesian coordinates. Hybrid Soft-MLP-start-LM systems, in which
the Soft-MLP output initializes Levenberg–Marquardt, matched the
0.28-cm accuracy of the corresponding Cartesian-MLP hybrid while reducing
computation time from 36 ms to 30 ms. We apply the Soft-MLP localizer to real MEG data separated by
a blind source separation algorithm, and compare the Soft-MLP dipole
locations to those of a conventional system.
Index Terms—Distributed representation, magnetoencephalography,
multilayer perceptron, source localization.
Source localization using electroencephalography (EEG) and
magnetoencephalography (MEG) identifies brain regions that emit
detectable electromagnetic signals. The multilayer perceptron, a
particular sort of universal approximator, has recently been used to
build fast dipole localizers [3, and references therein]. All this
work used multilayer perceptrons (MLPs) whose outputs represented
source location or dipole moment vectors in Cartesian coordinates—a
representation which might be expected to limit their performance and
We propose an MLP with a distributed representation¹ of the dipole
location. Our Soft-MLP network, which uses that representation, lo-
calizes a dipole to reasonable accuracy in real time from MEG signals
contaminated by considerable noise. Its output consists of the ampli-
tudes of Gaussian receptive fields evenly distributed within a spherical
head model, which taken together represent the dipole location. Like
the Cartesian representation, this does not confine the dipole to a finite
set of grid locations; but unlike the Cartesian representation, it is natu-
rally tolerant to noise far from the region of interest.
supported by the National Science Foundation (NSF) under CAREER award
97-02-311, the MIND Institute, and the NEC Research Institute. Asterisk indi-
cates corresponding author.
*S. C. Jun is with the Biological & Quantum Physics Group, MS-D454,
Los Alamos National Laboratory, Los Alamos, NM 87545 USA (e-mail:
B. A. Pearlmutter is with the Hamilton Institute, NUI Maynooth, Maynooth,
Co. Kildare, Ireland (e-mail: email@example.com).
G. Nolte is with the Human Motor Control Section, Medical Neu-
rology Branch, National Institute of Neurological Disorders and Stroke,
National Institutes of Health, Bethesda, MD 20892-1428 USA (e-mail:
Digital Object Identifier 10.1109/TBME.2003.812154
¹ The term distributed representation is standard in neural networks.
The synthetic data used in our experiments consisted of corre-
sponding pairs of dipole locations and sensor activations, as generated
by a forward model. Given a dipole location and a set of sensor
activations, the minimum-error dipole moment can be calculated
analytically. Therefore, although the dipoles used in generating the
dataset have both location and moment, we discarded the moment in
all the experiments below and used only the location as the network's target.²
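Because the sensor readings are linear in the dipole moment, the minimum-error moment for a fixed location is an ordinary linear least-squares solution. The sketch below assumes a precomputed lead-field matrix `L` (one column per moment component) from some forward model; the random matrix is only a stand-in, and `best_moment` is a hypothetical helper name, not code from this work.

```python
import numpy as np


def best_moment(lead_field, b):
    """Minimum-error dipole moment for a fixed dipole location.

    lead_field: (n_sensors, 3) array; column j holds the sensor readings produced
                by a unit dipole along axis j at the candidate location.
    b:          (n_sensors,) measured sensor activations.
    Returns the least-squares moment q and the residual b - L q.
    """
    q, *_ = np.linalg.lstsq(lead_field, b, rcond=None)
    return q, b - lead_field @ q


# Stand-in lead field for 122 channels (random numbers, not a physical forward model).
rng = np.random.default_rng(0)
L = rng.standard_normal((122, 3))
q_true = np.array([10.0, -5.0, 2.0])
q_hat, resid = best_moment(L, L @ q_true)
print(np.allclose(q_hat, q_true))  # True: noise-free data recover the moment exactly
```

In a spherical head model a radial moment produces no external field, so a real lead field is rank 2 and `lstsq` returns a minimum-norm moment; this is one reason the location alone is a natural target.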
We made two datasets, one for training and the other for testing.
Dipoles were drawn uniformly from truncated spherical regions [3,
Fig. 1]. Their moments were drawn uniformly from vectors of strength
… nA·m. The corresponding sensor activations were calculated
by adding the results of a forward model and a noise model. The training
set used dipoles from the larger region, both to make sure the network does
not inappropriately project external sources into the brain³ and to let the
network interpolate rather than extrapolate, which improves performance; to
better approximate field conditions, the test set contained only dipoles from
the smaller inner region. We used the sensor geometry of a 4-D Neuroimaging
Neuromag-122 whole-head MEG system⁴ and an analytic forward model of
quasistatic electromagnetic propagation in a spherical head [3, Section 2.1].
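One well-known analytic forward model for a spherical head is Sarvas' closed-form expression for the field of a current dipole outside a spherically symmetric conductor. The sketch below assumes that formula (the text above cites [3] rather than naming it), SI units, and a sphere centred at the origin; `sarvas_field` is a hypothetical name. It returns the field vector at a point, whereas a planar-gradiometer reading would additionally require differencing the field across each gradiometer baseline, which is not shown.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability, T*m/A


def sarvas_field(r, r0, q):
    """Magnetic field (T) at sensor position r (m), outside a spherically symmetric
    conductor centred at the origin, due to a current dipole q (A*m) at r0 (m).
    Closed-form expression of Sarvas (1987)."""
    r, r0, q = (np.asarray(v, dtype=float) for v in (r, r0, q))
    a_vec = r - r0
    a, rn = np.linalg.norm(a_vec), np.linalg.norm(r)
    F = a * (rn * a + rn ** 2 - r0 @ r)
    grad_F = ((a ** 2 / rn + a_vec @ r / a + 2 * a + 2 * rn) * r
              - (a + 2 * rn + a_vec @ r / a) * r0)
    q_x_r0 = np.cross(q, r0)
    return MU0 / (4 * np.pi * F ** 2) * (F * q_x_r0 - (q_x_r0 @ r) * grad_F)


# Example: sensor 12 cm above the sphere centre, tangential 10 nA*m dipole 6 cm above it.
B = sarvas_field([0.0, 0.0, 0.12], [0.0, 0.0, 0.06], [1e-8, 0.0, 0.0])
```

Two properties make quick sanity checks: a radial dipole (q parallel to r0) gives zero external field, and the radial field component equals that of the primary Biot–Savart field alone.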
In order to properly compare the performance of various localizers,
we need a dataset for which we know the ground truth, but which con-
tains the sorts of noise encountered in actual MEG recordings. To this
end, we collected real brain noise from unaveraged MEG recordings
(task involving abrupt visual stimulation and subsequent brief motor
output and audio feedback, two right-handed myopic middle-aged fe-
male subjects, analog bandpass filter 0.03–100 Hz) during periods far
(RMS) sensor reading of $n =$ … fT/cm. We measured the
signal-to-noise ratio (SNR) of a dataset using the ratio of the powers
in the signal and noise, $\mathrm{SNR\ (dB)} = 10\log_{10}(s^2/n^2) = 20\log_{10}(s/n)$, where
$s$ is the RMS sensor reading from the dipole and $n$ is the RMS sensor reading of the noise.
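The SNR measure can be sketched as follows, assuming the RMS is taken over all sensor readings of a pattern and that the noisy input is the forward-model output plus a recorded noise sample. The Gaussian arrays are stand-ins for a real dipole field and real brain noise, and the function names are hypothetical.

```python
import numpy as np


def rms(x):
    """Root-mean-square value over all sensors (and samples, if x is 2-D)."""
    return float(np.sqrt(np.mean(np.square(x))))


def snr_db(signal, noise):
    """SNR in decibels: 10 log10 of the power ratio, i.e. 20 log10(s / n)."""
    return 20.0 * np.log10(rms(signal) / rms(noise))


# Hypothetical example: a synthetic 122-channel dipole field plus a noise sample.
rng = np.random.default_rng(0)
dipole_field = 30.0 * rng.standard_normal(122)   # stand-in forward-model output
brain_noise = 10.0 * rng.standard_normal(122)    # stand-in for a recorded noise sample
measurement = dipole_field + brain_noise          # input pattern presented to the network
print(f"SNR = {snr_db(dipole_field, brain_noise):.1f} dB")
```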
B. Soft-MLP Structure
The Soft-MLP charged with approximating the inverse mapping had
an input layer of 122 units, one for each sensor; one hidden layer, whose
size is selected below; and an output layer of $N$ units representing the
amplitudes of three-dimensional Gaussian receptive fields in the training
region of the head model. The target output representation of a dipole
at location $\mathbf{x}$ was the $N$-dimensional ($N$-D) vector $\mathbf{g}(\mathbf{x})$ defined by
$$g_i(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x}-\mathbf{c}_i\|^2}{2\sigma^2}\right),$$
where $\mathbf{c}_i$ is the center of Gaussian receptive field $i$ and $\sigma$ is the length scale of the receptive fields. The
receptive field centers $\mathbf{c}_i$ were evenly distributed with a spacing of 3 cm,
and we set $\sigma =$ … cm. These parameters were determined empirically.
With these, $N =$ … receptive fields served to cover the training region.
² We experimented with a dataset containing as targets both the location and
moment of each dipole, and despite the improved generalization expected from
multitask training, we found no decrease in localization error. Typically, an
accurate estimate of the location is much more important than one of the moment
direction or strength. We also found networks trained without a moment
target to be more robust.
³ This can easily occur in practice, for instance when the head position is
incorrectly measured. This is a condition we would like our system to note, rather
than silently projecting external dipoles into the region it believes to be
occupied by the brain.
⁴ This MEG system has 61 pairs of first-order planar gradiometers.
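The Gaussian receptive-field target encoding of the Soft-MLP can be sketched as below. This is a minimal sketch under stated assumptions: the centres lie on a cubic lattice with 3-cm spacing, the training region is approximated by a full sphere of hypothetical radius 9 cm (the actual region is a truncated sphere [3, Fig. 1]), the exponent uses $2\sigma^2$, and $\sigma = 2$ cm is a placeholder because the value used in this work is not given here.

```python
import numpy as np


def receptive_field_centers(radius_cm, spacing_cm=3.0):
    """Centres on a cubic lattice with the given spacing, kept if they lie
    inside a sphere of radius_cm centred at the origin (assumed region shape)."""
    n = int(np.floor(radius_cm / spacing_cm))
    ticks = spacing_cm * np.arange(-n, n + 1)
    grid = np.stack(np.meshgrid(ticks, ticks, ticks, indexing="ij"), axis=-1).reshape(-1, 3)
    return grid[np.linalg.norm(grid, axis=1) <= radius_cm]


def encode(x, centers, sigma_cm):
    """Target vector g(x) with g_i(x) = exp(-||x - c_i||^2 / (2 sigma^2))."""
    d2 = np.sum((centers - np.asarray(x, dtype=float)) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma_cm ** 2))


centers = receptive_field_centers(radius_cm=9.0)           # hypothetical region radius
target = encode([1.0, -2.0, 4.0], centers, sigma_cm=2.0)   # hypothetical sigma
```

Unlike a one-hot grid code, neighbouring units respond smoothly, so a location between centres is still represented exactly by the pattern of amplitudes.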
0018-9294/03$17.00 © 2003 IEEE
TABLE I
DISTRIBUTION OF SNR FOR THE 4500 TESTING PATTERNS
The output units had linear activation functions,⁵ while, to accelerate
training, the hidden units had hyperbolic tangent activation functions.
Adjacent layers were fully connected, and there were no
cut-through connections. Input data are usually preprocessed to improve
performance, and here the 122 MEG sensor activation inputs …
initialized with uniformly distributed random values between ±0.1.
Backpropagation was used to calculate the gradient of the sum-squared
error, which in turn was used for online stochastic gradient descent
optimization with an empirically chosen descent rate $\eta$, as in $\Delta\mathbf{w} = -\eta\,\partial E/\partial\mathbf{w}$.
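The network and its online training can be sketched in NumPy as below: tanh hidden units, linear outputs, weights initialized uniformly in [−0.1, 0.1], and per-exemplar gradient descent on the squared error. This is an illustrative sketch, not the authors' code; the factor 0.5 in the error only rescales $\eta$, and the class and function names are hypothetical.

```python
import numpy as np


class SoftMLP:
    """One hidden layer of tanh units, linear output units, full connections
    between adjacent layers and no cut-through (input-to-output) connections."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        # All weights and biases start uniformly distributed in [-0.1, 0.1].
        self.W1 = rng.uniform(-0.1, 0.1, (n_hidden, n_in))
        self.b1 = rng.uniform(-0.1, 0.1, n_hidden)
        self.W2 = rng.uniform(-0.1, 0.1, (n_out, n_hidden))
        self.b2 = rng.uniform(-0.1, 0.1, n_out)

    def forward(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return h, self.W2 @ h + self.b2

    def sgd_step(self, x, t, eta):
        """One online update, Delta w = -eta * dE/dw, with E = 0.5 * ||y - t||^2."""
        h, y = self.forward(x)
        d_out = y - t                                 # dE/dy for linear outputs
        d_hid = (self.W2.T @ d_out) * (1.0 - h ** 2)  # backpropagate through tanh
        self.W2 -= eta * np.outer(d_out, h)
        self.b2 -= eta * d_out
        self.W1 -= eta * np.outer(d_hid, x)
        self.b1 -= eta * d_hid


def train(net, X, T, eta, epochs, rng):
    """Online SGD; each epoch presents every exemplar once, in shuffled order."""
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            net.sgd_step(X[i], T[i], eta)


def sum_squared_error(net, X, T):
    return float(sum(np.sum((net.forward(x)[1] - t) ** 2) for x, t in zip(X, T)))


# 122 sensor inputs and 80 hidden units as in the text; 150 outputs is a placeholder for N.
net = SoftMLP(n_in=122, n_hidden=80, n_out=150, rng=np.random.default_rng(0))
```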
In determining a reasonable MLP structure, practical considerations
constrained our experiments to networks with no more than 160 hidden
units. We ran experiments with 20, 40, 60, 80, 120, and 160 units in the
hidden layer. Each MLP was trained with noise-free training datasets …
the best generalization error⁶ in 500 epochs of training⁷ was measured,
using a noise-free test set of 5000 patterns. For each MLP size and
training dataset five runs were performed, and the generalization errors …
The computation time for localization increases linearly with the
number of hidden units, and the training time increases about linearly
with the size of the training dataset and the size of the hidden layer.
When the training dataset is small, generalization error is high.
Increasing the computation, i.e., increasing the size of the training set
or the number of units in the hidden layer, tends to reduce the generalization error …
units. For this reason we chose to use 80 hidden units.
⁵ In artificial neural networks the activation function computes the output
value of an artificial neuron from the weighted sum of its inputs. The output
value may be continuous or discrete; Heaviside, linear, logistic sigmoid $1/(1+e^{-x})$,
and hyperbolic tangent activation functions are widely used.
⁶ Because our training sets were large, the performance of the network on the
training dataset and on a new testing dataset would be about the same, were
the testing dataset not taken from a smaller region. The test dataset is used to …
obtained on a test dataset.
⁷ In one epoch each exemplar in the training dataset is presented once.
Fig. 1. Mean localization error versus SNR for the Cartesian-MLP, the
Soft-MLP using two sorts of decoding strategies, and their hybrid methods.
Decoding strategy 1 and strategy 2 are denoted by S1 and S2, respectively.
TABLE II
PERFORMANCE OF CARTESIAN-MLPs, SOFT-MLPs, AND MLP-START-LM
HYBRIDS. EACH NUMBER IS AN AVERAGE OVER 4500 LOCALIZATIONS.
SOFT-MLPs WERE TESTED USING TWO DECODING STRATEGIES (S1/S2)
C. Decoding Strategies
For practical use, and to measure performance, the $N$-D distributed
output vector, $\mathbf{a} = (a_1, \ldots, a_N)$, must be converted into an explicit
Cartesian dipole location $\hat{\mathbf{x}}$, under the assumption that $a_i \approx g_i(\mathbf{x})$.
We used two decoding strategies.
Strategy 1 (S1):
• Find the receptive field with the largest activation, $k = \arg\max_i a_i$.
• Linearly interpolate between the centers of the receptive fields
in a ball $B_k$ with center $\mathbf{c}_k$ and radius 6 cm (twice the inter-center
distance)⁸ using the activation values as weights,
$\hat{\mathbf{x}} = \sum_{i:\,\mathbf{c}_i \in B_k} a_i \mathbf{c}_i \big/ \sum_{i:\,\mathbf{c}_i \in B_k} a_i$.
Strategy 2 (S2):
• For each of the $N$ receptive field centers place a ball $B_i$ with
center $\mathbf{c}_i$ and radius 6 cm (twice the inter-center distance), and
calculate $s_i = \sum_{j:\,\mathbf{c}_j \in B_i} a_j$.
• Find $k = \arg\max_i s_i$.
• Apply the linear interpolation of Strategy 1, step 2.
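The two decoding strategies can be sketched in NumPy as follows. Two points are assumptions rather than details from the text: step 1 of Strategy 1 is taken to pick the single most active receptive field, and negative linear outputs are clipped to zero before they are used as interpolation weights. Function names are hypothetical.

```python
import numpy as np


def _interpolate(a, centers, k, radius):
    """Strategy 1, step 2: activation-weighted mean of the centres inside the ball B_k."""
    inside = np.linalg.norm(centers - centers[k], axis=1) <= radius
    w = a[inside]
    return w @ centers[inside] / w.sum()


def decode_s1(a, centers, radius=6.0):
    a = np.clip(a, 0.0, None)  # assumption: negative linear outputs are treated as zero
    k = int(np.argmax(a))      # assumed step 1: most active receptive field
    return _interpolate(a, centers, k, radius)


def decode_s2(a, centers, radius=6.0):
    a = np.clip(a, 0.0, None)  # same assumption as in decode_s1
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    s = (dist <= radius).astype(float) @ a  # s_i: summed activation inside ball B_i
    k = int(np.argmax(s))
    return _interpolate(a, centers, k, radius)
```

Strategy 2 replaces the single-unit peak by the peak of a locally summed activation, which makes the choice of ball less sensitive to one spurious output.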
III. LOCALIZATION RESULTS
A. Comparison of Soft-MLP and Cartesian-MLP
The training dataset contained 20000 exemplars contaminated with
real brain noise, and another dataset, of 4500 MEG signal patterns
contaminated by real brain noise, was constructed for testing.⁹ The
distribution of SNRs for patterns in the testing dataset is shown in Table I.
… AMD Athlon for each training dataset.
After each Soft-MLP was trained, its performance in RMS linear
accuracy was measured by converting the outputs to Cartesian
coordinates. Using each of the two decoding strategies led to two
different systems. The Soft-MLP performance, along with that of a
Cartesian-MLP network,¹⁰ is shown as a function of input SNR in
⁸ When a bigger radius is used, outliers are not filtered out and computation is
more costly. When a smaller radius is used, significant values might be thrown
away. Empirically, twice the inter-center distance balanced these considerations.
⁹ The number of exemplars was constrained in part by the availability of real
brain noise data in which we were confident.
¹⁰ The structure of the Cartesian-MLP was empirically optimized by trading off
computation and accuracy.
Fig. 2. Dipole source localization results, using real data with sources separated using SOBI. Locations found by standard Neuromag xfit software versus the
Soft-MLP (S2) for eight actual BSS-separated visual MEG components. (left) Axial view. (middle) Coronal view. (right) Sagittal view. The outer surface denotes
the sensor surface, and diamonds on the surface denote sensors. The inner surface denotes the spherical head model.
Fig. 1. As a whole, the Soft-MLP is more accurate than the Carte-
sian-MLP. The Soft-MLP using Strategy 2 shows a slight performance
advantage over Strategy 1. Each of the Soft-MLP localizers was
used to initialize a Levenberg–Marquardt (LM) optimizer, giving two
variant Soft-MLP-start-LM hybrids. Their performance is shown as
a function of SNR and compared with the hybrid method using the
Cartesian-MLP in Fig. 1. The MLP-start-LMs of the Soft-MLPs show
better localization accuracy at high SNRs than the MLP-start-LM of
the Cartesian-MLP, while they have degraded accuracy at low SNRs.
These results also held for networks trained with other sorts of noise
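The MLP-start-LM hybrids use the network's location estimate only as the starting point of Levenberg–Marquardt. The sketch below illustrates that idea under explicit assumptions: a toy forward model (primary Biot–Savart field of a dipole in an unbounded homogeneous medium, constants dropped) replaces the spherical model, the moment is eliminated at each trial location by linear least squares, and a minimal hand-written LM with a forward-difference Jacobian stands in for a production optimizer. All names, the sensor layout, and the starting point are hypothetical.

```python
import numpy as np


def lead_field(x, sensors):
    """Toy lead field (3*n_sensors x 3): primary Biot-Savart field of a current
    dipole at x in an unbounded homogeneous medium, physical constants dropped."""
    blocks = []
    for s in sensors:
        d = s - x
        skew = np.array([[0.0, -d[2], d[1]], [d[2], 0.0, -d[0]], [-d[1], d[0], 0.0]])
        blocks.append(-skew / np.linalg.norm(d) ** 3)  # q x d = -(d x q) = -skew @ q
    return np.vstack(blocks)


def residual(x, b, sensors):
    """Misfit at location x after fitting the best moment by linear least squares."""
    L = lead_field(x, sensors)
    q, *_ = np.linalg.lstsq(L, b, rcond=None)
    return b - L @ q


def levenberg_marquardt(x0, b, sensors, lam=1e-3, iters=50, h=1e-6):
    """Minimal LM over the 3 location coordinates, with a forward-difference Jacobian."""
    x = np.asarray(x0, dtype=float)
    r = residual(x, b, sensors)
    cost = r @ r
    for _ in range(iters):
        J = np.column_stack([(residual(x + h * e, b, sensors) - r) / h for e in np.eye(3)])
        A, g = J.T @ J, J.T @ r
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        r_new = residual(x + step, b, sensors)
        if r_new @ r_new < cost:   # accept: shift towards Gauss-Newton
            x, r, cost = x + step, r_new, r_new @ r_new
            lam *= 0.1
        else:                      # reject: shift towards gradient descent
            lam *= 10.0
        if cost < 1e-20:
            break
    return x


# Hypothetical sensor layout: 72 points on a 12-cm hemisphere (units: cm).
theta = np.linspace(0.2, 1.3, 6)
phi = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
sensors = 12.0 * np.array([[np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)]
                           for t in theta for p in phi])

# The MLP estimate would be supplied as x0; a point about 1 cm off stands in for it.
x_true = np.array([1.0, 2.0, 5.0])
b = lead_field(x_true, sensors) @ np.array([0.3, -0.2, 0.1])
x_hat = levenberg_marquardt(np.array([1.8, 1.4, 5.5]), b, sensors)
```

A better starting point means fewer accepted and rejected LM steps, which is the mechanism behind the timing difference reported for the hybrids.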
A grand summary, averaged across various SNR conditions, is
shown in Table II. In comparing the Soft-MLP (S2) and Carte-
sian-MLP localizers, both trained with real brain noise, one sees that
localization error improved from 1.15 to 0.85 cm, while computation
time increased from 0.3 to 1.0 ms. At this increased expense in time,
the distributed output representation yielded much more accurate
localizations (assuming a spherical uncertainty region whose radius equals
the mean error, the volume of the zone in which the dipole is likely located
decreases from about 6.4 to 2.6 cm³, a factor of 2.5). The MLP-start-LM
method using Soft-MLP (S2) has the same localization error as the
MLP-start-LM method using the Cartesian-MLP, 0.28 cm. However, it
is slightly faster (30 ms versus 36 ms). This surprising reduction in total
computation time is due to the Soft-MLP generating a better initial guess,
resulting in fewer iterations of LM.
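The factor of 2.5 follows from cubing the ratio of the mean errors. A short worked check, assuming the uncertainty region is a sphere whose radius equals the mean localization error:

```latex
\frac{V_{\mathrm{Cart}}}{V_{\mathrm{Soft}}}
  = \frac{\tfrac{4}{3}\pi\,(1.15\ \mathrm{cm})^3}{\tfrac{4}{3}\pi\,(0.85\ \mathrm{cm})^3}
  = \left(\frac{1.15}{0.85}\right)^{3}
  \approx \frac{1.521}{0.614}
  \approx 2.48
```

Because the constant $4\pi/3$ cancels, the ratio depends only on the cubed radii (about 1.52 and 0.61 cm³); the sphere volumes themselves are about 6.37 and 2.57 cm³.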
B. Localization for Actual BSS-Separated MEG Components
We applied the Soft-MLP (S2) trained with real brain noise to lo-
calize dipolar sources from actual BSS-separated MEG signal compo-
nents. The xfit program (standard commercial software bundled with
the 4-D Neuroimaging Neuromag-122 MEG system) is compared with
the methods developed here. We chose eight of the actual BSS-separated …
source well, and which met other criteria for correct localization laid
out in . (Continuous MEG data were collected, sampled at 300 Hz,
band-pass filtered at 0.03–100 Hz, separated using SOBI, and scanned
for neuronal sources of interest. See ,  for full details.)
Fig. 2 shows the localized dipoles from three viewpoints: axial ($x$–$y$
plane), coronal ($x$–$z$ plane), and sagittal ($y$–$z$ plane). The MLP-estimated
locations are about 1.18 cm on average from those of xfit.
The trained Soft-MLP is applicable to actual MEG signals and can
provide a good initial guess for iterative methods, with clear advantages
in speed and in requiring no human interaction.
C. Comparison With the Global Search Algorithm
The global search algorithm uses storage to reduce computation in
dipole localization. Briefly: a number of grid points are selected
in the head model, and the field patterns at the sensors resulting from
orthogonally oriented dipoles at each location are precomputed. This
information allows the orientation and strength of a dipole located at
a particular grid point that best fits a vector of measurements to be
calculated efficiently. When a measured signal is to be localized, the …
and the location of the grid point with the best goodness of fit (GOF) is used to initialize
a gradient-based optimization routine.
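The table lookup at the heart of the global search can be sketched as follows. Assumptions: the same toy unbounded-medium forward model as a stand-in for a real head model, a 2-cm lattice inside an 8-cm sphere, a pseudoinverse precomputed per grid point, and GOF defined as $1 - \|\mathbf{b} - L\mathbf{q}\|^2 / \|\mathbf{b}\|^2$ (a common definition, not necessarily the one used in the cited work). The subsequent gradient-based refinement is omitted, and all names are hypothetical.

```python
import numpy as np


def lead_field(x, sensors):
    """Toy lead field (3*n_sensors x 3): primary Biot-Savart field of a current
    dipole at x in an unbounded homogeneous medium, physical constants dropped."""
    blocks = []
    for s in sensors:
        d = s - x
        skew = np.array([[0.0, -d[2], d[1]], [d[2], 0.0, -d[0]], [-d[1], d[0], 0.0]])
        blocks.append(-skew / np.linalg.norm(d) ** 3)  # q x d = -skew @ q
    return np.vstack(blocks)


def build_table(grid, sensors):
    """Precompute, for every grid point, the lead field and its pseudoinverse."""
    table = []
    for g in grid:
        L = lead_field(g, sensors)
        table.append((L, np.linalg.pinv(L)))
    return table


def global_search(b, grid, table):
    """Return the grid point with the best GOF = 1 - ||b - L q||^2 / ||b||^2, and that GOF."""
    best_gof, best_k = -np.inf, -1
    for k, (L, L_pinv) in enumerate(table):
        q = L_pinv @ b              # best-fitting orientation and strength at this point
        r = b - L @ q
        gof = 1.0 - (r @ r) / (b @ b)
        if gof > best_gof:
            best_gof, best_k = gof, k
    return grid[best_k], best_gof


# Hypothetical geometry (cm): 72 sensors on a 12-cm hemisphere, 2-cm grid in an 8-cm ball.
theta = np.linspace(0.2, 1.3, 6)
phi = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
sensors = 12.0 * np.array([[np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)]
                           for t in theta for p in phi])
ticks = np.arange(-8.0, 8.01, 2.0)
grid = np.array([[i, j, k] for i in ticks for j in ticks for k in ticks
                 if i * i + j * j + k * k <= 64.0])
table = build_table(grid, sensors)
```

The precomputation is paid once; each localization then costs one matrix-vector product per grid point, which is why grid density directly sets the per-localization time.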
This table-based algorithm is surprisingly efficient at localizing
dipolar sources. For example, it has been used to localize a dipole at
each time point in a large (100000-sample) MEG dataset. The
primary weakness of the global search algorithm is that the gridding
must be fine enough for the problem at hand. In particular, the density
of precomputed points must be well above the Nyquist rate for the
highest spatial frequency in the error surface (equivalently, the grid
spacing must be well below half the shortest spatial wavelength), or the
correct optimum can be skipped over. Therefore, the table size, and hence
the time required for a localization, will increase with increasing
complexity of the error surface. The error surface might become more
complex under two circumstances: 1) with a more realistic head model; 2) with
a complex MEG helmet, for instance a helmet with superconducting
magnetic reflectors.
In contrast, the Soft-MLP is robust to a complex error surface.
Even with a realistic head model or a highly complex MEG helmet,
the Soft-MLP is applicable without modification. Under such
circumstances, we expect the Soft-MLP to do a single localization much
more quickly than the global search algorithm, even though it may
take longer to train the MLP than to precompute the global search table.
Another advantage of the Soft-MLP is that it can be trained using an
arbitrary noise model, characterized only by a set of samples, such as …
on a least-squares fit to determine the GOF at each grid point, it
implicitly assumes a Gaussian noise model.
We propose the use of distributed representations to encode dipole
locations in the output of MLP-based dipole localizers. Experiments
showed that such a network was fast and robust, and was a better dipole
source localizer (0.85 cm versus 1.15 cm) at slightly greater computa-
tional expense (1.0 ms versus 0.3 ms) than a comparably tuned system
using a Cartesian representation. The hybrid MLP-start-LM method
using the new MLP showed the same accuracy as previous systems
(0.28 cm) but computation time was reduced from 36 ms to 30 ms.
Furthermore, the Soft-MLP was successfully applied to actual MEG
A Cartesian output representation cannot encode the location of
more than a single dipole. Our use of a distributed output representa-
tion was in part motivated by the hope that its greater representational
capabilities might allow Soft-MLP networks to be used for multiple
dipole localization. The improvements in accuracy for a single dipole
were an unexpected benefit, but we will continue our efforts to apply
the Soft-MLP architecture to the multiple dipole case.
[1] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, 1986.
[2] U. R. Abeyratne, Y. Kinouchi, H. Oki, J. Okada, F. Shichijo, and K. Matsumoto, “Artificial neural networks for source localization in the human brain,” Brain Topogr., vol. 4, pp. 3–21, 1991.
[3] S. C. Jun, B. A. Pearlmutter, and G. Nolte, “Fast accurate MEG source localization using a multilayer perceptron trained with real brain noise,” Phys. Med. Biol., vol. 47, no. 14, pp. 2547–2560, 2002.
[4] G. E. Hinton, J. L. McClelland, and D. E. Rumelhart, “Distributed representations,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 1986.
[5] M. Hämäläinen, R. Hari, R. J. Ilmoniemi, J. Knuutila, and O. V. Lounasmaa, “Magnetoencephalography—theory, instrumentation, and applications to noninvasive studies of the working human brain,” Rev. Modern Phys., vol. 65, pp. 413–497, 1993.
[6] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, 1997.
[7] A. I. Ahonen, M. S. Hämäläinen, J. E. T. Knuutila, M. J. Kajola, P. P. …, “122-channel SQUID instrument for investigating the magnetic signals from the human brain,” Physica Scripta, vol. T49, pp. 198–205, 1993.
[8] Y. LeCun, I. Kanter, and S. A. Solla, “Second order properties of error surfaces: learning time and generalization,” in Advances in Neural Information Processing Systems 3. San Mateo, CA: Morgan Kaufmann, 1991, pp. 918–924.
[9] A. C. Tang, B. A. Pearlmutter, N. A. Malaszenko, D. B. Phung, and B. C. Reeb, “Independent components of magnetoencephalography: localization,” Neural Computation, vol. 14, no. 8, pp. 1827–1858, 2002.
[10] A. C. Tang, B. A. Pearlmutter, M. Zibulevsky, T. A. Hely, and M. P. Weisend, “An MEG study of response latency and variability in the human visual system during a visual-motor integration task,” in Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, 2000, pp. 185–191.
[11] J. C. de Munck, A. de Jongh, and B. W. van Dijk, “The localization of spontaneous brain activity: An efficient way to analyze large data sets,” IEEE Trans. Biomed. Eng., vol. 48, pp. 1221–1228, 2001.
[12] R. Van Uitert, D. Weinstein, C. Johnson, and L. Zhukov, “Finite element … approximations,” J. Biomedizinische Technik, vol. 46, pp. 32–34, 2001.
[13] R. H. Kraus Jr., P. L. Volegov, K. Maharajh, M. A. Espy, A. N. Matlashov, and E. R. Flynn, “Performance of a novel SQUID-based superconducting imaging-surface magnetoencephalography system,” Physica C, vol. 368, no. 1–4, pp. 18–23, 2002.
Independence of Myoelectric Control Signals
Examined Using a Surface EMG Model
Madeleine M. Lowery*, Nikolay S. Stoykov, and Todd A. Kuiken
Abstract—The detection volume of the surface electromyographic
(EMG) signal was explored using a finite-element model, to examine
the feasibility of obtaining independent myoelectric control signals from
regions of reinnervated muscle. The selectivity of the surface EMG signal
was observed to decrease with increasing subcutaneous fat thickness.
The results confirm that reducing the interelectrode distance or using
double-differential electrodes can increase surface EMG selectivity in
an inhomogeneous volume conductor. More focal control signals can be
obtained, at the expense of increased variability, by using the mean square
value, rather than the root mean square or average rectified value.
Index Terms—Detection volume, finite-element model, myoelectric con-
trol, surface EMG.
One of the greatest limiting factors in the development of myoelec-
tric prostheses has been the inadequacy of current control strategies.
In response to this problem, many advances have been made in devel-
oping complex signal processing algorithms to increase the amount of
information that can be extracted from each channel of electromyo-
graphic (EMG) activity –. An alternative approach is to increase
the number of independent EMG signals available to the controller.
Preliminary studies on the use of nerve-muscle grafts as a possible
method of achieving this are currently being conducted . For this
technique to work it is important that independent control signals can
be obtained from each nerve-muscle graft and that crosstalk, the detec-
tion of volume conducted signals from muscles other than the muscle
of interest, be kept to a minimum. The relative contributions of motor
units (MUs) located throughout the muscle tissue to the surface EMG
interference pattern, however, are not yet fully known. This issue is
central in determining the feasibility of the proposed technique to suc-
cessfully control multifunctional prostheses and is directly relevant to
many other surface EMG applications.
One method of investigating the pick-up range of the surface EMG
signal is to use model simulation. Anatomical properties and electrode
configuration are both known to affect EMG crosstalk at the skin sur-
face. The effect of interelectrode distance and increased selectivity of
the surface EMG signal with double-differential or higher order spa-
tial filters have been widely studied both experimentally and in model
Manuscript received July 15, 2002; revised December 15, 2002. This work
was supported in part by the Whitaker Foundation under a Biomedical Engi-
neering Research Grant, in part by the National Institute of Child and Human
Development under Grant 1K08HD01224-01A1, and in part by the National In-
Asterisk indicates corresponding author.
*M. M. Lowery is with the Research Department, Rehabilitation Institute of
Chicago, Chicago, IL 60611-4496 USA and also with the Department of Phys-
ical Medicine and Rehabilitation, Northwestern University, Evanston, IL 60201
USA (e-mail: firstname.lastname@example.org).
N. S. Stoykov is with the Research Department, Rehabilitation Institute of
Chicago, Chicago, IL 60611-4496 USA and also with the Department of Phys-
ical Medicine and Rehabilitation, Northwestern University, Evanston, IL 60201
T. A. Kuiken is with the Rehabilitation Institute of Chicago, Chicago, IL
60611-4496 USA and also with the Departments of Physical Medicine and Re-
Evanston, IL 60201 USA.
Digital Object Identifier 10.1109/TBME.2003.812152