Improved Optimization for the Robust and Accurate Linear Registration
and Motion Correction of Brain Images
Mark Jenkinson,* Peter Bannister,*,† Michael Brady,† and Stephen Smith*
*Oxford Centre for Functional Magnetic Resonance Imaging of the Brain, John Radcliffe Hospital, Headington, Oxford OX3 9DU; and
†Medical Vision Laboratory, Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, United Kingdom
Received September 19, 2001
Linear registration and motion correction are important
components of structural and functional brain
image analysis. Most modern methods optimize some
intensity-based cost function to determine the best
registration. To date, little attention has been focused
on the optimization method itself, even though the
success of most registration methods hinges on the
quality of this optimization. This paper examines the
optimization process in detail and demonstrates that
the commonly used multiresolution local optimization
methods can, and do, get trapped in local minima. To
address this problem, two approaches are taken: (1) to
apodize the cost function and (2) to employ a novel
hybrid global–local optimization method. This new optimization
method is specifically designed for registering
whole brain images. It substantially reduces the
likelihood of producing misregistrations due to being
trapped by local minima. The increased robustness of
the method, compared to other commonly used meth-
ods, is demonstrated by a consistency test. In addition,
the accuracy of the registration is demonstrated by a
series of experiments with motion correction. These
motion correction experiments also investigate how
the results are affected by different cost functions and
Key Words: accuracy; affine transformation; global
optimization; motion correction; multimodal registra-
tion; multiresolution search; robustness.
© 2002 Elsevier Science (USA)
Geometric registration and motion correction are important
stages in the analysis of functional brain imaging
studies. Consequently, it is important that these
stages perform robustly and accurately. Furthermore,
for large imaging studies it is desirable that they be
There has been a considerable amount of research
into registration and motion correction of brain images,
and many different methods have been proposed
(Maintz and Viergever, 1998). Most methods in com-
mon usage are based on the mathematical framework
of optimizing an intensity-based cost function. How-
ever, although much work has concentrated on how the
choice of cost function affects registration performance,
there has been far less examination of the effect of the
optimization method. Moreover, when optimization
methods are discussed, global methods are often ig-
nored and local methods compared purely on the basis
of speed (Maes et al., 1999).
One of the most common and serious problems for
registration methods is the presence of local minima in
the cost function. These cause local optimization methods
to “get stuck” and hence to fail to find the desired
global minimum. Most registration methods attempt
to solve this problem by incorporating a local
optimization strategy within a multiresolution frame-
work. Such a multiresolution framework, which typi-
cally involves starting with low-resolution images (con-
taining only gross features) and working progressively
through to higher resolutions, aims to avoid the local
minima “traps.” As we show later, this simple mul-
tiresolution approach is not always sufficient for avoid-
ing local minima, and by using more sophisticated op-
timization methods, the chances of becoming “trapped”
in these local minima can be substantially reduced.
Two types of local minima commonly occur for the
cost functions used in image registration: large-scale
basins and small-scale dips. The first type, the large-
scale basin, is responsible for large misregistrations
since the local minimum is often far from the global
minimum. The second type, small-scale dips, can cause
the optimization to get stuck at any stage and so are
responsible for large misregistrations at low resolu-
tions and small misregistrations at high resolutions.
We propose two methods for dealing with the local
minima problem. These are cost function apodization,
which reduces or eliminates small-scale dips, and a
hybrid global–local optimization technique which utilizes
prior knowledge about brain registration to create
an optimization technique that combines the speed of
local optimization with the robustness of global optimization.
NeuroImage 17, 825–841 (2002)
All rights reserved.
The following sections of this paper are background
theory, methods (including both cost function apodiza-
tion and the hybrid optimization method), results, and
discussion. The results section contains a number of
experiments on real, whole brain images which dem-
onstrate the effectiveness of the registration in two
different settings: (1) structural image registration (in-
termodal/intersubject) of an anatomical image to a
standard template; and (2) functional image motion
correction (intramodal/intrasubject) which registers
each image in a time-series to a particular example
image from that time-series. The first case is examined
using a robustness study (as accuracy is hard to define
for intersubject registration, and robustness is a more
important issue in this context), while the second case
is examined using an accuracy study (as, in this con-
text, it is accuracy that is more important). In each
case real brain image data are used. Comparisons with
some commonly used methods are also included (in
both cases) which demonstrate the superior robustness
and accuracy which can be obtained using this approach.
The registration problem studied here is to find the
best geometric alignment of two (volumetric brain) images.
Call the two images the reference (Y) and floating
(X) images. More precisely, the registration problem
seeks that transformation which, when applied to the
floating image, maximizes the “similarity” between
this transformed floating image and the reference image.
A standard, and common, way of formulating this as
a mathematical problem is to construct a cost function
which quantifies the dissimilarity between two images
and then search for the transformation (T*) which
gives the minimum cost. In mathematical notation this is

$$T^* = \arg\min_{T \in S_T} C(Y, T(X)), \tag{1}$$

where $S_T$ is the space of allowable transformations,
$C(I_1, I_2)$ is the cost function, and $T(X)$ represents the
image X after it has been transformed by the transformation
T. In this paper we shall only consider linear
registration so that $S_T$ is either the set of all affine
transformations or some subset of this (such as the set
of all rigid-body transformations).
Many different cost functions have been proposed for
image registration problems. Some use geometrically
defined features, found within the image, to quantify
the (dis)similarity, while others work directly with the
intensity values in the images. A large comparative
study of different registration methods (West et al.,
1997) indicated that intensity-based cost functions are
more accurate and reliable than the geometrically
based ones. Consequently, most recent registration
methods have used intensity-based cost functions, and
these are the ones which will be discussed in this
Intensity-based cost functions can be divided natu-
rally into two categories: those suitable for intramodal
problems and those suitable for intermodal problems.
In the former category the most commonly used cost
functions are least squares (LS) and normalized corre-
lation (NC). For the latter, and more difficult, category
the most commonly used functions are mutual infor-
mation (MI), normalized mutual information (NMI),
Woods (W), and correlation ratio (CR). These functions
are defined mathematically in Table 1 (see Jenkinson
and Smith, 2001, for more information).
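As an illustration of how such intensity-based costs are evaluated, the following is a minimal sketch of three of the functions named above (LS, NC, and CR), written in their standard textbook forms, which are assumed here to match Table 1. The images are flat lists of intensities sampled at corresponding locations; the function names, the uniform binning of X, and the arguments `n_bins`, `x_min`, and `x_max` are illustrative choices, not part of the original method description.

```python
import math


def least_squares(x, y):
    """Sum of squared intensity differences over all voxels."""
    return sum((b - a) ** 2 for a, b in zip(x, y))


def normalized_correlation(x, y):
    """Sum(X*Y) / (sqrt(Sum X^2) * sqrt(Sum Y^2))."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den


def variance(values):
    """Population variance of a non-empty list."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def correlation_ratio(x, y, n_bins, x_min, x_max):
    """(1/Var(Y)) * Sum_k (n_k/N) * Var(Y_k), where Y_k is the iso-set of
    Y intensities at locations whose X intensity falls in bin k."""
    width = (x_max - x_min) / n_bins
    iso_sets = [[] for _ in range(n_bins)]
    for a, b in zip(x, y):
        k = min(int((a - x_min) / width), n_bins - 1)  # clamp the top edge
        iso_sets[k].append(b)
    n_total = len(y)
    weighted = sum(len(s) / n_total * variance(s) for s in iso_sets if s)
    return weighted / variance(y)
```

In this sketch the correlation ratio is 0 when Y is constant within every iso-set (Y is fully predictable from the binned X) and 1 when binning by X explains none of the variance of Y.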
Interpolation. In addition to a pair of images and a
TABLE 1
Mathematical Definitions of the Most Commonly Used Intensity-Based Cost Functions: least squares (LS); normalized correlation (NC); Woods (W); correlation ratio (CR); mutual information (MI); and normalized mutual information (NMI)

Cost function   Definition
LS              $\sum (Y - X)^2$
NC              $\sum (X \cdot Y) \,/\, \left(\sqrt{\sum X^2}\,\sqrt{\sum Y^2}\right)$
W               $\sum_k \frac{n_k}{N} \frac{\sqrt{\mathrm{Var}(Y_k)}}{\mu(Y_k)}$
CR              $\frac{1}{\mathrm{Var}(Y)} \sum_k \frac{n_k}{N} \mathrm{Var}(Y_k)$
MI              $H(X, Y) - H(X) - H(Y)$
NMI             $H(X, Y) \,/\, (H(X) + H(Y))$

[The entries of the Minimum and Maximum columns are not recoverable in this copy.]

Note. The notation is as follows: quantities X and Y denote images,
each represented as a set of intensities; $\mu(A)$ is the mean of set A;
Var(A) is the variance of the set A; $Y_k$ is the kth iso-set, defined as the
set of intensities in image Y at positions where the intensity in X is
in the kth intensity bin; $n_k$ is the number of elements in the set $Y_k$
such that $N = \sum_k n_k$; $H(X, Y) = -\sum_{ij} p_{ij} \log p_{ij}$ is the standard entropy
definition where $p_{ij}$ represents the probability estimated using the (i,
j) joint histogram bin, and similarly for the marginals, H(X) and
H(Y). Note that the sums in the first two rows are taken over all
voxels.
JENKINSON ET AL.
particular transformation, the cost function requires
that a method of interpolation be defined, that is, some
method of calculating what the intensity is in the float-
ing image at points in between the original voxel (or
grid) locations. This is necessary in order to know the
intensity at corresponding points in the images after
the geometrical transformation has been applied to the
floating image.
Interpolation methods that are commonly used are
trilinear (also called linear or, in 2D, bilinear), nearest
neighbor, sinc (of various kernel sizes and with or
without various windowing functions; e.g., Blackman),
spline, and Fourier. The choice of method has some
impact on cost function smoothness, although all inter-
polation methods except nearest neighbor are continu-
ous. However, the choice of method becomes most crit-
ical for motion correction as the transformed image
intensities are needed for later statistical analysis.
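To make the role of interpolation concrete, here is a minimal sketch of trilinear interpolation, the first method listed above. The volume is assumed to be a nested list indexed as `volume[i][j][k]` with unit voxel spacing and at least two voxels along each axis, and the query point is assumed to lie inside the grid; the function name and these conventions are illustrative assumptions.

```python
import math


def trilinear(volume, x, y, z):
    """Interpolate volume[i][j][k] at the continuous point (x, y, z).

    Assumes 0 <= x <= nx - 1 (and likewise for y, z) and that every
    axis has at least two voxels.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    # Lower corner of the enclosing cell, clamped so the upper corner exists.
    i = min(int(math.floor(x)), nx - 2)
    j = min(int(math.floor(y)), ny - 2)
    k = min(int(math.floor(z)), nz - 2)
    fx, fy, fz = x - i, y - j, z - k
    value = 0.0
    # Weighted sum over the 8 corners of the cell.
    for di, wx in ((0, 1 - fx), (1, fx)):
        for dj, wy in ((0, 1 - fy), (1, fy)):
            for dk, wz in ((0, 1 - fz), (1, fz)):
                value += wx * wy * wz * volume[i + di][j + dj][k + dk]
    return value
```

Because the weights vary continuously with the query point, the interpolated intensity is a continuous function of the transformation parameters, which is the property referred to in the paragraph above.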
Once a cost function has been chosen it is necessary
to search for the transformation which will yield the
minimum cost value. To do this, an optimization
method is used which searches through the parameter
space of allowable transformations. Note that rigid-
body transformations are specified by 6 parameters (3
rotations and 3 translations) while affine transforma-
tions are specified by 12 parameters. Consequently,
even for linear transformation, the optimization takes
place in a high dimensional space: $\mathbb{R}^n$, where $6 \le n \le 12$.
While the problem specified in Eq. (1) is a global
optimization, quite often local optimization methods
are employed as they are simpler and faster.¹ However,
this can result in the method returning a transformation
that corresponds to a local minimum of the cost
function, rather than the desired global minimum.
Such cases often appear as misregistrations, of varying
severity, and are a major cause of registration failure.
Unfortunately, there are very few global optimization
methods that are suitable for a 3D brain image
registration problem. This is because, in terms of operations,
the cost function is expensive to evaluate and
most global optimization methods require a great
many evaluations, leading to unacceptable execution
times (e.g., days).
To both speed up the optimization process and avoid local
minima, most currently used registration methods employ some form of
multiresolution optimization. That is, a sequence of
image pairs, at progressively larger spatial scales, is
created from the initial pair of images: (Ir, If). The
images at larger scales are subsampled versions (often
with preblurring) of the original high-resolution im-
ages and so contain fewer voxels which means that
evaluating the cost function requires less computation.
In addition, as only gross features of the images remain
at these large scales, it is hoped that there will be fewer
local minima for the optimization to get stuck in.
In functional brain imaging a series of brain images
is acquired. The time elapsed between each acquisition
is usually a few seconds or less. Due to the small
acquisition times required, these images usually have
poor resolution. Furthermore, as the imaging parame-
ters are tuned to highlight physiological changes (e.g.,
blood oxygenation), the images often have poor ana-
Extracting functional information from such a series
of images is done by applying statistical time-series
analysis, which assumes that the location of a given
voxel within the brain does not change over time. However,
there is usually some degree of subject motion
within the scanner, especially when the scanning takes
a long time or when clinical patients are involved.
Therefore, in order to render the data fit for statistical
analysis this motion must be estimated and corrected
for. This is the task of motion correction methods and it
is essentially a multiple-image registration task.
Normally, motion correction methods deal with the
registration task by selecting a reference image from
within the series and registering each image in turn to
this fixed reference. As all images are of the same
subject, using the same imaging parameters, it can be
classified as an intrasubject, intramodal registration
problem. Therefore, a rigid-body transformation space
and intramodal cost function can be used. Furthermore,
as the values in the corrected images are important
for later statistical analysis, the choice of interpolation
method for the transformation of the images is of
particular importance (Hajnal et al., 1995a,b).
Apodization of the Cost Functions
As seen in Fig. 1, the local behavior of the cost
function shows small discontinuities as the transfor-
mation parameters are varied smoothly. This creates
local minima traps for the optimization method. Since
all interpolation methods are continuous (except near-
est neighbor, which is consequently seldom used) the
discontinuities are not due to the type of interpolation
used. The cause of these discontinuities is the changing
amount of overlap of the reference and floating image.
¹ A global method searches the entire range of possible parameters
for the optimal cost function value, while a local
method simply starts somewhere in parameter space and moves
about locally trying to find an optimum, stopping when there is no
better nearby set of parameters.
ROBUST AND ACCURATE REGISTRATION
There are two different ways of treating values out-
side the field of view (FOV) of an image:
1. Treat all values outside the FOV as zero.
2. Do all calculations strictly in the overlapping region.
The first method is undesirable as it creates artificial
intensity boundaries when the object is not wholly
contained within the FOV. However, in the second
method the number of points counted in the overlap-
ping region varies, not just the expressions involving
intensities. Therefore, in the second case both the nu-
merator and the denominator of the cost functions
(except least squares) will change discontinuously as
the amount of overlap changes.
The discontinuities exist because the images are dis-
crete sets of voxels. In particular, the reference image
defines a fixed set of voxel locations over which the cost
function is calculated. Then, for a given transforma-
tion, the floating image intensities at these locations
are calculated using interpolation. A reference image
voxel location is only counted when it is valid, that is,
within the overlapping region such that it maps to a
location inside the FOV of the floating image. When the
edge of the FOV of the floating image crosses a refer-
ence voxel location, the location suddenly goes from
being inside the overlapping region to outside, causing
a discontinuous change in the number of valid loca-
tions, as shown in Fig. 2.
We aim to apodize the cost function by removing
these discontinuities. To do this, our approach has
been to introduce a geometric apodization that de-
weights the contributions of locations that are near the
edge of the overlapping region. The weighting is chosen
so that the contribution of such locations drops contin-
uously until it reaches zero at the edge of the overlap-
ping region. Any continuous weighting function could
be used but for simplicity and computational efficiency
we choose one that is linear.
For instance, consider a 2D example of a reference
location that maps to a point inside the overlapping
region, where the distance from the nearest edges of
the floating image FOV is $d_X$ and $d_Y$ units, as shown in
Fig. 2. In each dimension, if this value is less than
some threshold D, then the influence of that point is
weighted by a weight $w = d/D$. In higher dimensions,
the product of the weighting functions in each dimension
is used. That is, $w(d_X, d_Y, d_Z) = w(d_X)\,w(d_Y)\,w(d_Z)$.
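The geometric weighting just described can be sketched in a few lines. The function names below are illustrative, and the handling of D = 0 (no apodization, so every location inside the overlap counts fully) is an assumption made so that the sketch is well defined for that limiting case.

```python
def edge_weight(d, D):
    """Linear weight for one dimension.

    d is the distance from the nearest FOV edge (negative if outside),
    D is the apodization threshold. The weight is 0 at the edge and
    rises linearly to 1 once d reaches D.
    """
    if D <= 0:
        return 1.0 if d >= 0 else 0.0  # D = 0: no apodization
    if d <= 0:
        return 0.0
    return min(d / D, 1.0)


def location_weight(distances, D):
    """Product of the per-dimension weights, e.g. for (dX, dY, dZ)."""
    w = 1.0
    for d in distances:
        w *= edge_weight(d, D)
    return w
```

For example, with D = 8 a location that is 2, 4, and 16 units from the nearest edges receives a weight of 0.25 × 0.5 × 1 = 0.125, and the weight falls continuously to zero as any one of the distances shrinks to zero.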
FIG. 1. Illustration of local discontinuities in cost function plots.
FIG. 2. FOV change.
The weighting is applied to all terms involving that
location’s intensity as well as to the number of locations
in the region. For example, consider the nth moment
of an iso-set:

$$M_n = \frac{\sum_{j \,:\, X_j \in I_k} (Y_j)^n}{\sum_{j \,:\, X_j \in I_k} 1}, \tag{3}$$

where $M_n$ is the nth moment, j is a voxel index, X and
Y represent the reference and floating images, respectively,
and $I_k$ denotes the kth intensity bin. With general
weighting this becomes

$$M_n = \frac{\sum_{j \,:\, X_j \in I_k} w_j (Y_j)^n}{\sum_{j \,:\, X_j \in I_k} w_j},$$

where $w_j$ is the weight of the location j, which is 0
outside the overlapping region, and $d/D$ for $d < D$ or 1 for
$d \ge D$ inside the overlapping region.
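A minimal sketch of the weighted iso-set moment follows: each location contributes its weight both to the intensity sum and to the count. The half-open bin convention `bin_lo <= X_j < bin_hi`, the function name, and the return value of 0 for an empty iso-set are illustrative assumptions.

```python
def weighted_moment(x, y, weights, bin_lo, bin_hi, n):
    """n-th moment of the iso-set {Y_j : bin_lo <= X_j < bin_hi},
    with location j contributing with weight w_j to both the
    numerator and the (non-integer) count in the denominator."""
    num = 0.0
    den = 0.0
    for xj, yj, wj in zip(x, y, weights):
        if bin_lo <= xj < bin_hi:
            num += wj * yj ** n
            den += wj
    return num / den if den > 0 else 0.0
```

With all weights equal to 1 this reduces to the ordinary moment; as a location approaches the edge of the overlapping region its weight shrinks smoothly, so it leaves the sums gradually rather than abruptly.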
This weighting scheme can be simply and efficiently
applied to any of the non-entropy-based cost functions
(i.e., LS, NC, W and CR). It depends on one parame-
ter—the threshold distance D— which can be varied to
increase the amount of apodization. When D = 0 there
is no apodization, while increasing D creates smoother
and smoother cost functions, although the cost function
will be continuous for any nonzero value of D. Also note
that making D larger than the voxel spacing is permitted
and just has a greater smoothing effect, as shown
in Fig. 3.
Joint histogram apodization. A different
weighting scheme is required for apodizing the joint
histogram required for the entropy-based cost functions.
This is because the number of entries in each
histogram bin becomes discontinuous as the intensity
at a floating image location (calculated using interpolation)
passes through the threshold value between
bins.
As it is the intensity passing through a threshold
value that causes the discontinuities for the joint his-
togram, we propose a weighting function that is deter-
mined by the intensities and applied to every location.
We choose, once again, a linear weighting function (as
shown in Fig. 4) where $w_k$ is the weight for bin k, I is
FIG. 3. Illustration of cost function apodization.
FIG. 4. Weighting function.
the intensity at the location under consideration, $T_k$ is
the intensity threshold between bins k−1 and k, and Δ
is the smoothing threshold (the equivalent of D in the
preceding section). This weight is then applied to the
accumulation of intensity within the joint histogram
bin, as well as the number of entries in the bin, which
is no longer an integer. That is, for a point of intensity
I, the updating equations for bin k are

$$N_k \rightarrow N_k + w_k(I)$$
$$S_k \rightarrow S_k + I \, w_k(I),$$
where $w_k(\cdot)$ is the weighting function for the kth bin
(see Fig. 4), $N_k$ is the occupancy of bin k (a noninteger
version of the number of elements), and $S_k$ is the sum
of intensities in the bin.
This approach is effectively fuzzy-binning, where
each intensity bin no longer has sharp thresholds, but
fuzzy membership functions. It also means that a given
location can influence more than one bin entry. Be-
cause of the way the weighting function is calculated
though, each location contributes equally, since the
sum of weights for all bin entries is equal to 1.
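The fuzzy-binning update can be sketched as follows. Since the exact shape of the weighting function in Fig. 4 is not reproduced here, this sketch assumes a symmetric linear ramp of total width `delta` (in intensity units) centred on each bin threshold, uniform bins of width `bin_width` starting at `i_min`, and `delta` no larger than `bin_width`. All names are illustrative, not taken from the original implementation.

```python
def fuzzy_bin_weights(intensity, i_min, bin_width, n_bins, delta):
    """Return {bin index: weight} for one intensity value.

    Weights are non-negative and sum to 1. Near a threshold the value
    is shared linearly between the two adjacent bins (assumed ramp).
    """
    pos = (intensity - i_min) / bin_width   # position in units of bins
    k = min(max(int(pos), 0), n_bins - 1)   # hard bin the value falls in
    half = 0.5 * delta / bin_width          # half ramp width, in bins
    frac = pos - k                          # offset inside bin k
    weights = {k: 1.0}
    if half > 0:
        if frac < half and k > 0:
            # Inside the ramp around the lower threshold of bin k.
            share = 0.5 - frac / (2 * half)
            weights = {k: 1.0 - share, k - 1: share}
        elif (1 - frac) < half and k < n_bins - 1:
            # Inside the ramp around the upper threshold of bin k.
            share = 0.5 - (1 - frac) / (2 * half)
            weights = {k: 1.0 - share, k + 1: share}
    return weights


def accumulate(intensities, i_min, bin_width, n_bins, delta):
    """Apply N_k -> N_k + w_k(I) and S_k -> S_k + I * w_k(I)."""
    occupancy = [0.0] * n_bins
    sums = [0.0] * n_bins
    for value in intensities:
        shares = fuzzy_bin_weights(value, i_min, bin_width, n_bins, delta)
        for k, w in shares.items():
            occupancy[k] += w
            sums[k] += value * w
    return occupancy, sums
```

Under these assumptions an intensity exactly on a threshold is split equally between the two neighbouring bins, an intensity in the middle of a bin goes entirely to that bin, and the total occupancy always equals the number of points, so every location contributes equally.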
As changing overlap will still create discontinuities,
both the geometrical weighting and the fuzzy-binning
must be applied to have a continuous joint histogram.
Moreover, the parameter Δ will give a continuous joint
histogram for any value greater than zero, although
the value should not exceed the intensity bin width.
The smoothing capacity of Δ is shown in Fig. 5 for the
mutual information cost function. Results for the normalized
mutual information cost function are very similar.
Note that, in general, a value of Δ = 0.5 together
with D equal to the resolution scale (e.g., 8 mm) gives
a desirably smooth cost function.
Note that the partial volume interpolation (PVI) introduced
by Maes et al. (1997) also creates continuous
joint histogram estimates if used in conjunction
with the geometrical weighting (applied to the reference
locations instead). However, as the name suggests,
PVI is more than just an apodization scheme; in
fact, it functions as an interpolation method too. Therefore,
different interpolation methods cannot be used in
conjunction with PVI, whereas for fuzzy-binning the
interpolation method used can be freely chosen. Furthermore,
the fuzzy-binning scheme provides an adjustable
parameter, Δ, which controls the amount of
smoothing of the cost function, allowing for different
degrees of smoothing as desired.
Finally, it can be seen that fuzzy-binning can be
made fully symmetric with respect to the two images
(see Cachier and Rey, 2000, for a discussion of symmetry
in general registration cost functions). That is,
both floating and reference intensities can have fuzzy-binning
applied to them. However, there is an inherent
asymmetry in the way that interpolation is applied
only to the floating image. Therefore, although such a
symmetric approach appears initially attractive, the
FIG. 5. Illustration of cost function apodization.
simpler and faster approach of only using fuzzy bins for
the floating image was adopted in practice.
A Global–Local Hybrid Optimization Method
Of the many different approaches to global optimi-
zation, we have investigated two strategies and com-
bined them with a simple but fast local optimization
method to produce a hybrid optimization method. The
two strategies are searching and multistart optimization.
Our hybrid optimization method (also described in
Jenkinson and Smith, 2001) is specifically designed
for the problem at hand, using prior knowledge about
the transformation parameters and typical data size
(FOV, voxel size, etc.) to help make the method effi-
cient. The method cannot guarantee that the global
solution is found, but then neither can any other global
optimization method given a finite amount of time.
Generally, only statistical “guarantees” are given, and
these often require excessive run-times in order to be
met. In contrast, our method is designed to give a
reliable estimate of the global minimum given some
time restriction (in our case, less than 1 h on a moder-
ately powered standard workstation; e.g., registering
two 1 × 1 × 1 mm images typically takes 15 min on a
500-MHz Pentium III).
The method still uses a local optimization method
with a multiresolution framework, and these are de-
scribed in the next two sections, followed by descrip-
tions of the global search and multistart optimization
Multiresolution. Currently, four different scales
are used in our method: 8, 4, 2, and 1 mm. At each
scale, the two images are resampled, after initial pre-
blurring (using a Gaussian with FWHM equal to the
ratio of the final and initial voxel sizes), so that they
have isotropic voxels of size equal to the scale size.
Note that an exception to this occurs if the scale is
smaller than the data resolution, in which case the
data are resampled to isotropic voxels of scale closest to
the data resolution.
Furthermore, skew and anisotropic scaling changes
are much less prominent than rotational, transla-
tional, and global scaling changes and so their effects
are difficult to estimate reliably at low resolutions.
Consequently, only similarity transformations (rigid-body +
global scaling) are estimated at the 8- and 4-mm scales.
Local optimization. The choice of local optimization
method used here is not critical, except that it must be
efficient. Furthermore, since it will be used in a mul-
tiresolution framework, the low-resolution stages do
not need to find highly accurate transformations.
Therefore, the initial parameter bracketing and the
parameter tolerances (the size of uncertainty on the
optimized parameter values) are both made propor-
tional to the scale size. This avoids many unnecessary
cost function evaluations at low resolutions.
We initially chose Powell’s method (Press et al.,
1995) as our local optimization method as it was effi-
cient and did not require gradients to be calculated
which are especially difficult given the apodizations
applied to the cost functions. However, we discovered
that a set of N 1D golden searches (Press et al., 1995)
gave equally good results, which can be reasonably
expected if the parameterization is close to decoupled.
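A set of 1D golden section searches, applied to one parameter at a time, can be sketched as below. This is a generic textbook golden section search, not the authors' code; the symmetric bracket around the current value, the fixed number of sweeps, and the function names are assumptions made for illustration. In the setting described above, `bracket` and `tol` would be made proportional to the scale size.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # about 0.618


def golden_search(f, lo, hi, tol):
    """Minimize a 1D function on [lo, hi], assuming it is unimodal there."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def coordinate_golden(cost, params, bracket, tol, sweeps=3):
    """Run one golden search per parameter in turn, for a few sweeps."""
    p = list(params)
    for _ in range(sweeps):
        for i in range(len(p)):
            def along_axis(v, i=i):
                q = list(p)
                q[i] = v
                return cost(q)
            p[i] = golden_search(along_axis, p[i] - bracket, p[i] + bracket, tol)
    return p
```

When the parameters are close to decoupled, as the text notes, each 1D search lands near the true optimum for its parameter and very few sweeps are needed; no gradients of the (apodized) cost function are required.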
Global search. To estimate the final transformation
sufficiently accurately, a brute-force search of the
transformation space is infeasible, even for rigid-body
transformations. However, at the lowest resolution
(8-mm scale) only the gross image features still exist
and so a coarse search of the cost function at this
resolution should reflect the major changes in rotation,
translation, and global scaling, allowing large misreg-
istrations to be avoided.
Speed remains an issue, even for coarse searches at
low resolution. Therefore, the search is restricted to the
rotation parameters, as these are the most difficult to
find and are the cause of many large misregistrations.
Furthermore, the search is divided into three stages:
1. a coarse search over the rotation parameters with
a full local optimization of translation and global scale
for each rotation tried;
2. a finer search over rotation parameters, but with
only a single cost function evaluation at each rotation
3. a full local optimization (rotation, translation,
and global scale) for each local minimum detected from
the previous stage.
The first of these stages is straightforward. Given a
set of rotations to try (by default we use 60° increments
in each of the Euler angles, leading to $6^3 = 216$ different
rotations), the local optimization routine is called
for the translation and global scale only. That is, the
rotation is left fixed, and the best translation and
global scale for this particular rotation is found. These
results are then stored for use in the later stages.
The second stage takes a larger set of rotations (by
default we use 18° increments, leading to $20^3 = 8000$
different rotations) but only evaluates the cost
function once for each rotation. This contrasts with the
previous stage where the cost function is typically evaluated
between 10 and 30 times during the local optimization.
However, in order for the evaluation to be a
reasonable estimate of the best cost function with this
rotation, the translation and global scale parameters
must be close to the optimal values. These parameter
values are supplied from the results of the previous
stage, with the translation parameters determined by
interpolating between the stored translation values.
Global scale is fixed at the median global scale value
over all the stored values. This is done differently from
the translation as the scale should not vary greatly
with rotation, whereas the translation is highly cou-
pled with the rotation values.
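The sizes of the two rotation grids used in the first two stages can be checked with a short sketch. The angular range of [-180°, 180°) for each Euler angle and the function name are assumptions for illustration; only the step sizes and the resulting counts come from the text.

```python
import itertools


def rotation_grid(step_deg):
    """All Euler-angle triples (in degrees) on a regular grid that
    covers [-180, 180) in each angle with the given step."""
    n = int(round(360 / step_deg))
    angles = [-180 + i * step_deg for i in range(n)]
    return list(itertools.product(angles, repeat=3))


coarse = rotation_grid(60)  # stage 1: full local optimization at each point
fine = rotation_grid(18)    # stage 2: one cost evaluation at each point
```

With 10 to 30 cost evaluations per coarse rotation, stage 1 costs on the order of 2000 to 6500 evaluations, which is comparable to the 8000 single evaluations of stage 2.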
Finally, the third stage applies a full local optimiza-
tion, allowing rotation to vary, at each local minimum
detected from the results of the previous stage. These
local minima are defined as rotations where the cost
value found is less than for any of the “neighboring”
rotation values. There are often several such local min-
ima, and rather than force the selection of the best one
at this stage, they are all optimized and passed on to
the higher resolution stages.
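Detecting local minima on the fine rotation grid can be sketched as follows. The text does not specify which rotations count as “neighboring”, so this sketch assumes the six axis-aligned grid neighbours with wrap-around (rotations being periodic); the dictionary layout and function name are also illustrative.

```python
import itertools


def grid_local_minima(cost, n):
    """Find local minima on an n x n x n grid of rotations.

    cost[(i, j, k)] holds the cost at grid index (i, j, k). An index is
    a local minimum if its cost is strictly lower than that of all six
    axis neighbours, with indices wrapping around at the grid ends.
    """
    minima = []
    for idx in itertools.product(range(n), repeat=3):
        is_min = True
        for axis in range(3):
            for step in (-1, 1):
                nb = list(idx)
                nb[axis] = (nb[axis] + step) % n
                if cost[tuple(nb)] <= cost[idx]:
                    is_min = False
        if is_min:
            minima.append(idx)
    return minima
```

Every index returned here would then seed a full local optimization (rotation, translation, and global scale), and all of the optimized results are kept for the next stage rather than only the single best.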
Although it is unlikely that the first stage in this
process will get very close to the correct rotation, the
second stage should get close enough for the local op-
timization in the last stage to give a good estimate.
Note that in most registration methods there is no
equivalent of this search and a single local optimiza-
tion is performed with the starting point being no ro-
tation, no translation, and unity scaling (the identity
transformation). As this method examines many more
possible starting transformations, one cannot do any
worse than these simple methods.
Multistart optimization with perturbations. Following
the previous search stage (at 8-mm scale) there
are usually several local minima selected as candidates
for initializing more detailed searches for the global
minimum. This stage (at 4-mm scale) performs a local
optimization for the best of these candidate transfor-
mations. In addition, it takes several perturbations of
the candidate transformations and performs local op-
timization of these perturbations. Finally, the single
best (minimum cost) solution is selected from among
these optimization results.
In practice, the three best candidates are taken from
the previous stage, together with 10 perturbations of
each candidate. Two perturbations are applied to each
rotation parameter, each being half the magnitude of
the fine search step size from the previous stage. As
well as these 6 rotational perturbations, 4 perturbations
in scale (±0.1, ±0.2) are also applied. The number
and size of the perturbations used are arbitrary,
with the values used here selected largely
from experience with the magnitude and type of typical
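The perturbation scheme can be sketched as below. A candidate is represented here as a tuple `(rx, ry, rz, scale)` with rotations in degrees; treating the two rotational perturbations per parameter as plus and minus half the fine step, and the scale perturbations as additive, are assumptions, as are the function names and the hypothetical `local_optimize` callable.

```python
def perturbations(candidate, fine_step_deg=18.0):
    """Return the 10 perturbed starting points for one candidate:
    6 rotational (each angle shifted by +/- half the fine search step)
    and 4 in scale (-0.2, -0.1, +0.1, +0.2)."""
    rx, ry, rz, scale = candidate
    half = 0.5 * fine_step_deg
    out = []
    for axis in range(3):
        for sign in (-1.0, 1.0):
            rot = [rx, ry, rz]
            rot[axis] += sign * half
            out.append((rot[0], rot[1], rot[2], scale))
    for ds in (-0.2, -0.1, 0.1, 0.2):
        out.append((rx, ry, rz, scale + ds))
    return out


def best_of(starts, local_optimize, cost):
    """Locally optimize every starting point and keep the lowest-cost result."""
    results = [local_optimize(s) for s in starts]
    return min(results, key=cost)
```

With three candidates this gives 3 × (1 + 10) = 33 local optimizations at the 4-mm scale, from which the single minimum-cost solution is selected.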
This approach of trying several candidate solutions
is effectively a multistart strategy similar to that used
in genetic algorithms and other global optimization
methods. Furthermore, the use of “local” perturbations
is similar to the way in which alternatives are generated
in simulated annealing. It has been found empirically
that the combination of these strategies, together
with the initial search, avoids getting trapped in local
minima to a much greater extent than a local optimization
method alone within a multiresolution framework.
Higher resolution stages. Following the 4-mm scale,
the single best candidate transformation is chosen, and
it is only this transformation which is worked with
from here on. At the 2-mm scale the skews and anisotropic
scalings start to become significant. Consequently,
these extra degrees of freedom (DOF) are progressively
introduced by calling the local optimization
method three times: first using only 7 DOF (rigid-body +
global scale), then with 9 DOF (rigid-body +
independent scalings), then with the full 12 DOF (rigid-body +
scales + skews).
Since the cost function evaluations take 8 times
longer at the 1-mm scale than at the 2-mm scale and
512 times longer than at the 8-mm scale, only a single
pass of the local optimization is done at the 1-mm scale.
The result of this single pass is returned as the registration
solution, T*.
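The factors of 8 and 512 can be checked with a short calculation, under the assumption that the time $t(s)$ for one cost function evaluation is proportional to the number of voxels in a volume of fixed field of view resampled to isotropic voxels of size $s$:

```latex
\frac{t(s_1)}{t(s_2)} = \left(\frac{s_2}{s_1}\right)^{3},
\qquad
\frac{t(1\,\mathrm{mm})}{t(2\,\mathrm{mm})} = 2^{3} = 8,
\qquad
\frac{t(1\,\mathrm{mm})}{t(8\,\mathrm{mm})} = 8^{3} = 512 .
```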
In broad terms, a motion correction algorithm must
take a time series of fMRI images and register each
image in the series to a reference image. This reference
image may be of a different modality (Biswal and
Hyde, 1997) but a more common approach is to select
one image from the time-series itself (usually the
first; cf. SPM (Friston et al., 1996)) and register the
remaining images to this template image.
If we make the reasonable assumption that there is
unlikely to be large motion from one image to the next
(usually 3 s or less between images), we can use the
result of one image’s registration as an initial guess for
the next image in the series. This is accomplished by
assuming an initial identity transformation between
the middle image $V_n$ in a time-series and the next
adjacent image $V_{n+1}$ and then finding the optimal
transformation $T_1$ by optimizing the cost function. The
resulting solution is then used as a starting point for
FIG. 6. MCFLIRT schedule.
the next optimization with the next image pair $V_n$, $V_{n+2}$
(see Fig. 6). This is only done at the lowest resolution,
as all higher resolutions use the transformations found
at the next lower resolution for the initial estimates.
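The chained initialization can be sketched on a toy problem. The example below is only an illustration under simplifying assumptions: the images are 1D signals, the transformation is a single integer shift, the cost is a mean squared difference, and the chain is run outward from the middle image in both directions. The names `shift_cost`, `local_search`, and `chain_estimates` are hypothetical.

```python
def shift_cost(ref, img, shift):
    """Mean squared difference between ref[i] and img[i + shift] over the overlap."""
    diffs = [(ref[i] - img[i + shift]) ** 2
             for i in range(len(ref)) if 0 <= i + shift < len(img)]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def local_search(ref, img, start):
    """Local descent over integer shifts, beginning at the supplied initial guess."""
    best, best_cost = start, shift_cost(ref, img, start)
    while True:
        cand = min((best - 1, best + 1), key=lambda s: shift_cost(ref, img, s))
        cand_cost = shift_cost(ref, img, cand)
        if cand_cost >= best_cost:
            return best
        best, best_cost = cand, cand_cost


def chain_estimates(series):
    """Register every image to the middle image, seeding each search
    with the result found for the previous (adjacent) image."""
    mid = len(series) // 2
    ref = series[mid]
    estimates = [0] * len(series)
    for direction in (1, -1):
        guess = 0                      # identity transformation for the first pair
        t = mid + direction
        while 0 <= t < len(series):
            guess = local_search(ref, series[t], guess)
            estimates[t] = guess
            t += direction
    return estimates
```

Because each search starts from the neighbouring image's solution, a purely local optimizer only has to cover the small motion between consecutive images rather than the full displacement from the reference.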
The final schedule carries out the following steps on the uncorrected data (optional stages are labeled as options):
● 8-mm optimization using the middle image as initial reference and then using each result to initialize the next pairwise optimization;
● 4-mm optimization using the middle image as reference and 8-mm stage results as initialization parameters;
● 4-mm optimization (lower tolerance) using the middle image as reference and 4-mm stage results as initialization parameters;
● mean registration option:
    Apply transformation parameters from high-tolerance 4-mm stage;
    Average corrected images to generate mean template;
    Carry out 8-, 4-, and 4-mm (high-tolerance) optimizations as before but against mean image as reference;
● sinc registration option:
    Carry out additional 4-mm (high-tolerance) optimization using sinc interpolation (instead of trilinear as used in previous stages);
● apply current transformation parameters to uncorrected data and save.
As the intensity values are of great interest after
motion correction, attention must be paid to not only
the estimation but also the application of the transfor-
mation. Interpolation probably has the largest impact
on the quality of the transformed data, with sinc inter-
polation methods often being used, although no abso-
lute consensus on the best method exists. However, the
loss of information outside the FOV, usually seen in
the end slices, can also be very detrimental to the final
statistical maps in these areas.
Our motion correction implementation has also been
designed to handle the potentially problematic issue of
end-slice interpolation. It is frequently the case that
under even small affine motion of the head, voxels in
the top and bottom slices can move either in or out of
the field of view (see Fig. 7). Other schemes approach
this by assuming that all affected voxels are either zero
(AIR) or can be completely excluded from further cal-
culations (SPM). This clearly impacts later analysis as
valuable spatial information may be lost.
We counter this situation by padding the end-slices when applying the estimated transformation (i.e., increasing the extent of each volume by two slices). This means that if data are to be interpolated from outside the FOV, they will take on “sensible” values (personal communication, Roger Woods, 1999).
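A minimal sketch of end-slice padding follows. It assumes that the two extra slices are copies of the first and last slices (the text only states that the padded values should be sensible) and represents a volume as a list of flattened slices; `pad_end_slices` and `sample_z` are hypothetical names.

```python
import math


def pad_end_slices(slices):
    """Return a copy with the first and last slices replicated (two extra slices)."""
    if not slices:
        raise ValueError("volume has no slices")
    return [list(slices[0])] + [list(s) for s in slices] + [list(slices[-1])]


def sample_z(slices, z, outside=0.0):
    """Linearly interpolate between slices at fractional slice index z.

    Slices beyond the stored range contribute the constant `outside`.
    """
    lower = math.floor(z)
    frac = z - lower
    width = len(slices[0])

    def get(k):
        return slices[k] if 0 <= k < len(slices) else [outside] * width

    return [(1.0 - frac) * a + frac * b
            for a, b in zip(get(lower), get(lower + 1))]
```

Sampling half a slice below the original volume gives values halved toward zero without padding, whereas with padding (and the slice index offset by one) the end-slice intensities are retained.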
This section presents several experiments that dem-
onstrate the robustness and accuracy of the proposed
registration and motion correction method. We begin
by first stating the implementational choices made as
these are often critical in creating a stable method that
performs well. Following this, we present the experi-
ments for registration which clearly demonstrate the
improved robustness, and the following sections dis-
cuss motion correction and atrophy estimation, demon-
strating the improved accuracy.
Implementation: FLIRT and MCFLIRT
The registration and motion correction methods described in the previous sections have been implemented in C++ and are called FLIRT (FMRIB’s² linear image registration tool) and MCFLIRT (motion correction FLIRT). In each case several implementation choices needed to be made to obtain a robust, working method. The more important choices are: (1) the use of center of mass as the center of transformation (also used for initial alignment); (2) the parameterization of the transformations as three Euler angles, three translations, three scales, and three skews; and (3) the number of intensity histogram bins set to 256 divided by the scale size (i.e., 256 for 1-mm scaling but only 32 for 8-mm scaling), since the number of voxels (samples) is small for large scalings and so fewer bins must be used in order to get reliable statistics (Izenman, 1991). Each of these choices is detailed more fully in Jenkinson and
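The bin-count rule in choice (3) amounts to a one-line calculation, sketched here with a hypothetical helper `histogram_bins`; the lower bound of one bin is an added safeguard and is not part of the description above.

```python
def histogram_bins(scale_mm, base_bins=256):
    """Number of intensity histogram bins: 256 divided by the scale size in mm."""
    if scale_mm <= 0:
        raise ValueError("scale must be positive")
    return max(1, int(base_bins / scale_mm))
```

For the four scales used in the multiresolution schedule (8, 4, 2, and 1 mm) this gives 32, 64, 128, and 256 bins, respectively.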
Robustness Assessment: Registration
For many registration problems in practice, there is no ground truth available with which to evaluate the registration. This makes the quantitative assessment of methods quite difficult. Therefore, to test the method quantitatively, a comparative consistency test was performed that does not require knowledge of the actual ground truth.

² The Oxford Centre for Functional MRI of the Brain.

FIG. 7. End-slice correction.
The consistency test is based on comparing registrations obtained using various different, but known, initial starting positions of a given image. If the registrations are consistent then the final registered image will be the same, regardless of the starting position. Consistency is a necessary, but not sufficient, condition that all correctly functioning registration methods must possess. This is essentially a measure of the robustness rather than the accuracy (West et al., 1997) of the registration method. Robustness is defined here as the ability to get close to the global minimum on all trials, whereas accuracy is the ability to precisely locate a (possibly local) minimum of the cost function. Ideally, a registration method should be both robust and accurate.
More specifically, the consistency test for an individual image $I$ involved taking the image and applying several predetermined affine transformations, $A_j$, to it (with appropriate cropping so that no “padding” of the images was required). All these images (both transformed and untransformed) were registered to a given reference image, $I_r$, giving transformations $T_j$. If the method was consistent the composite transformations $T_j \circ A_j$ should all be the same, as illustrated in Fig. 8.
The transformations are compared quantitatively using the RMS deviation between the composite registration $T_j \circ A_j$ and the registration from the untransformed case, $T_0$. This RMS deviation is calculated directly from the affine matrices (Jenkinson, 1999):

$$d_{\mathrm{RMS}} = \sqrt{\tfrac{1}{5} R^2 \, \mathrm{Tr}(M^T M) + t^T t}, \qquad (6)$$

where $d_{\mathrm{RMS}}$ is the RMS deviation in mm, $R$ is a radius specifying the volume of interest, and

$$\begin{pmatrix} M & t \\ 0 & 0 \end{pmatrix} = T_j \circ A_j \circ T_0^{-1} - I$$

is used to calculate the 3 × 3 matrix $M$ and the 3 × 1 vector $t$.

FIG. 8. Illustration of the consistency test.

FIG. 9. Example slices from one of the images used in the consistency study (after registration). The red lines represent edges from the standard image (the reference image) overlaid on the transformed initial image.

Comparison with existing methods. A comparison of FLIRT with several other registration packages was initially performed using the consistency test explained above. The other registration packages used were AIR (Woods et al., 1993), SPM (Friston et al., 1995), UMDS (Studholme et al., 1996), and MRITOTAL (Collins et al., 1994). These methods were chosen because the authors’ implementations were available, and so this constituted a fair test as opposed to a reimplementation of a method described in a paper, where often the lack of precise implementation details makes it difficult to produce a good working method.
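For reference, the RMS deviation of Eq. 6 can be evaluated directly from 4 × 4 affine matrices. The sketch below is a plain-Python illustration rather than the authors' implementation; it assumes matrices are stored as nested lists in homogeneous coordinates, and the helper names (`mat_mul`, `mat_inv`, `rms_deviation`) are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_inv(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rms_deviation(t_j, a_j, t_0, radius):
    """d_RMS = sqrt(R^2/5 * Tr(M^T M) + t^T t), with [M t; 0 0] = T_j A_j T_0^-1 - I."""
    d = mat_mul(mat_mul(t_j, a_j), mat_inv(t_0))
    for i in range(4):
        d[i][i] -= 1.0
    # Tr(M^T M) equals the sum of the squared entries of the 3 x 3 block M.
    trace = sum(d[i][j] ** 2 for i in range(3) for j in range(3))
    t_sq = sum(d[i][3] ** 2 for i in range(3))
    return math.sqrt(radius ** 2 / 5.0 * trace + t_sq)
```

A pure translation of (3, 4, 0) mm relative to $T_0$ gives $M = 0$ and hence a deviation of exactly 5 mm, independent of the radius $R$; the radius only weights the rotational, scaling, and skew components.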
The particular experiment that was performed was
intersubject and intermodal using 18 different images
as the floating images (like the one shown in Fig. 9), all
with the MNI 305 brain (Collins et al., 1994) as the
reference image. The 18 images were all 256 × 256 × 30, T2-weighted MR images with voxel dimensions of 0.93 by 0.93 by 5 mm, while the MNI 305 template is a 172 × 220 × 156, T1-weighted MR image with voxel dimensions of 1 by 1 by 1 mm.
The results of one such test, using six different rota-
tions about the anterior–posterior axis, are shown in
Fig. 10. It can be seen that only FLIRT and MRITO-
TAL performed consistently. This indicates that the
other methods (AIR, SPM, and UMDS) frequently get
trapped in local minima, i.e., are not as robust.
A further consistency test was then performed com-
paring only MRITOTAL and FLIRT. This test used
initial scalings rather than rotations. The reason that
this is important is that MRITOTAL uses a multireso-
lution local optimization method (gradient descent) but
relies on initial preprocessing to provide a good start-
ing position. This preprocessing is done by finding the
principal axes of both images and initially aligning
them. Consequently the initial alignment compensates
for rotations but does not give any information, and
hence correction, for scalings.
The results of the scaling consistency test are shown
in Fig. 11. It can be seen that, although generally
consistent, in three cases MRITOTAL produces regis-
trations that deviate by more than 20 mm (RMS) from
each other. In contrast, FLIRT was consistent (less
than 2 mm RMS) in all cases.
Accuracy Assessment: Motion Correction
This section details the comparative accuracy of the motion correction scheme (MCFLIRT) when tested against two of the most widely used schemes, SPM and AIR.

In order to attempt to establish a “gold standard” for registration accuracy, initial tests use the RMS measure (Eq. 6) in combination with synthetic data, where the exact value of the motion is known, in order to quantify the scheme’s ability to correct for subject motion. Later tests characterize the degree of correction by examining the residual variation (collapsed across time) in the corrected data. This measure is also used to test the scheme’s effectiveness on real data where we have no absolute measure of the subject’s movement.

FIG. 10. Results of the consistency study.
Simulated data. The artificial data enabling gold
standard comparisons were generated as follows: a
high-resolution EPI volume (2 × 2 × 2 mm) was duplicated
180 times and each volume was transformed
by an affine matrix corresponding to real motion esti-
mates taken from one of two studies where the subject
had been asked to move his or her head appreciably
during the scan. Three further groups of images were
generated using motion estimates from experiments
where the subject had been asked to remain as still as
possible. Within these five motion designs, three fur-
ther groups of data were created corresponding to au-
diovisual activation at 0, 2.5, and 5% of the overall
voxel intensities by modulating the intensity values
according to a mask derived from real fMRI data. Once
the activation (if any) had been applied and the vol-
umes transformed by the corresponding parameters,
the data were subsampled to 4 × 4 × 6 mm voxels and
appropriately cropped to avoid introducing any pad-
ding voxels. The use of a high-resolution template im-
age which is then subsampled should minimize the
effect of interpolation when applying such transforma-
tions to the data.
Within our correction scheme, there are a number of
stages which can be tuned to optimize the accuracy of
the correction. The remainder of this section aims to
find a robust set of parameters which give consistently
accurate results on all data presented. We begin by
examining the comparative accuracy of several cost
functions which can be used with our optimization
scheme. Later we proceed to examine the impact made
by the choice of interpolation scheme and registration template.
Cost functions. The test results shown in Fig. 12
show the relative accuracy of the available cost func-
tions within the MCFLIRT optimization framework
when applied to the problem of motion correction on
our synthetic data.
Although there is no clear leader over all cost func-
tions in terms of accuracy, we note that the most
accurate results are predominantly yielded by the nor-
malized correlation and correlation ratio cost func-
tions. This observation is reinforced when we examine
the number of data sets where a particular cost func-
tion is most accurate. This is summarized in Table 2.
Note that previous work (Freire and Mangin, 2001)
which had demonstrated the superiority of entropy-
based cost measures over alternatives in terms of mo-
tion correction without introducing further spurious
activations in the data has only compared mutual in-
formation metrics against least squares (SPM) and
Woods (AIR) measures.
The next stage of testing was to verify that these cost functions were in fact more accurate when smoothed (apodized) than unsmoothed (unapodized). The same RMS test measure and data sets were used as in the previous test and results are given in Fig. 13. Overall, the smoothed cost functions outperform their unsmoothed versions.

FIG. 11. Results of the scale consistency study.

FIG. 12. Median (over time) RMS (over space) error results for the MCFLIRT scheme applied to synthetic data exhibiting known motion of one of five designs and audiovisual activation at increasing intensities. Cost function notation corresponds to Table 1.
Interpolation scheme. To further improve the accu-
racy of the motion estimates, the next parameter we
experimented with was the choice of interpolation
scheme for the motion estimation. In addition to the
standard trilinear scheme, a windowed-sinc interpolation
(using a Hanning window of size 7 × 7 × 7) was
tried. While considerably slower than trilinear inter-
polation, the sinc approach is able to further refine
motion estimates after the initial trilinear stage has
converged on a solution, thus providing greater accu-
racy. The results in Fig. 14 show the greater degree of
accuracy achieved over using trilinear interpolation
alone. Note that on the third data set (cropped to allow
distinction between the other four sets), the improve-
ment was consistently over a value of 2.0.
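To make the interpolation kernel concrete, the following sketch evaluates a Hanning-windowed sinc interpolant in one dimension using seven taps. It is an illustration only: the window half-width, the normalization by the sum of weights, and the replication of edge samples are assumptions, and a 7 × 7 × 7 kernel would apply the same weights separably along each axis. The names `hann_sinc_weight` and `interpolate` are hypothetical.

```python
import math


def hann_sinc_weight(d, half_width):
    """Sinc kernel at offset d, tapered by a Hanning window of the given half-width."""
    if abs(d) >= half_width:
        return 0.0
    sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
    return sinc * 0.5 * (1.0 + math.cos(math.pi * d / half_width))


def interpolate(samples, x, taps=7):
    """Windowed-sinc estimate of a 1D signal at fractional position x."""
    half = taps // 2
    centre = int(round(x))
    total, weight_sum = 0.0, 0.0
    for k in range(centre - half, centre + half + 1):
        w = hann_sinc_weight(x - k, half + 1)
        idx = min(max(k, 0), len(samples) - 1)   # replicate edge samples
        total += w * samples[idx]
        weight_sum += w
    return total / weight_sum
```

At integer positions the kernel reduces to a single unit weight, so the original samples are reproduced; the additional cost relative to trilinear interpolation comes from the larger support (7 taps per axis instead of 2).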
Choice of template image. In an attempt to increase the accuracy of the scheme and as a final parameter investigation, a method using a mean image template was implemented. This scheme generates a mean image for the series by averaging all the volumes over time after the first three stages of trilinear interpolation-based motion correction have been carried out. In doing so we hope to be registering all volumes to a more generalized target which exhibits less overall variation from each volume in the series than the original target (middle) volume previously used. This new mean image is a robust target to which the original time series is then registered, again using three trilinear interpolation stages and an optional final sinc interpolation stage.

Because we are registering to a mean image, we no longer have gold standard values for the transformations found by the correction scheme. Therefore, to quantify the accuracy of the correction, a median absolute residual variation (MARV) score was created by initially demeaning each voxel time-series and then measuring the median value of the residual absolute values in this time-series. That is,

$$\mathrm{MARV}(x, y, z) = \sum_{t=1}^{N} \left| I_t(x, y, z) - I_{\mathrm{mean}}(x, y, z) \right| / N.$$
TABLE 2. Accuracy Counts for the Five Cost Functions Offered (column heading: No. of sets; surviving row label: Normalized mutual information).

FIG. 13. Median (over time) RMS (over space) error results (unsmoothed minus smoothed) for the MCFLIRT scheme applied to five synthetic data sets (A–E) exhibiting known motion at increasing intensities. A positive value indicates improved accuracy as a result of smoothing the cost function. Cost function notation corresponds to Table 1 and the results demonstrate the improvement in accuracy achieved by using the smoothed cost functions.

FIG. 14. Median (over time) RMS (over space) error results for the MCFLIRT scheme applied to synthetic data exhibiting known motion of one of five designs and audiovisual activation at increasing intensities. A positive value indicates improved accuracy as a result of incorporating the final sinc interpolation stage. Cost function notation corresponds to Table 1 and demonstrates the improvement in accuracy achieved by using smoothed cost functions and additional sinc interpolation when compared to the basic trilinear scheme reported in Fig. 13.
This produces a volume of MARV scores, one for each voxel, and the median of these values (over the volume) is then taken as a summary measure. This is effectively a measure characterizing the level of intervolume intensity variation (presumed to be due to subject motion) after retrospective motion correction has been applied. While this can only work for activation-free data (so that in perfect alignment the variance should be at minimum), it can give us a clear impression of the accuracy of the motion correction scheme. Because SPM rejects information outside a mask obtained from the data (end-slice effects), the corrected median images were masked according to the corrected SPM data so that the measure reflected a consistent comparison across the schemes. The results shown in Fig. 15 correspond to the MARV values generated after running MCFLIRT and SPM on the null-activation data set for both the low and the severe motion designs.
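The MARV summary measure can be sketched as follows. This illustration follows the printed formula, which divides the summed absolute residuals by N for each voxel, and then takes the median over the volume; masking is omitted, volumes are assumed to be flattened to lists of voxel intensities, and the names `marv_map` and `marv_summary` are hypothetical.

```python
import statistics


def marv_map(series):
    """Per-voxel score: average absolute deviation of each voxel's
    time-series from its own temporal mean.

    `series` is a list of N volumes, each a flat list of voxel intensities.
    """
    n = len(series)
    scores = []
    for v in range(len(series[0])):
        values = [volume[v] for volume in series]
        mean = sum(values) / n
        scores.append(sum(abs(x - mean) for x in values) / n)
    return scores


def marv_summary(series):
    """Median of the per-voxel scores over the whole volume."""
    return statistics.median(marv_map(series))
```

A lower summary value after correction indicates less residual intervolume intensity variation, which for activation-free data is attributed to better alignment.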
Results using the RMS measure (Table 3) revealed
that, although all three schemes provide subvoxel ac-
curacy, AIR 3.08 using least squares (which we found
to give better results than the standard AIR measure)
and windowed sinc interpolation was almost an order
of magnitude worse than basic three-stage trilinear
MCFLIRT. Accordingly, we decided not to compare it
further. However, we note that AIR was primarily de-
signed to solve several different registration problems
that arise in tomographic data sets (Woods, 1998)
rather than optimized for FMRI motion correction.
From these results we conclude that for some cases
(generally the low motion data), MCFLIRT with the
correlation ratio cost function produces significantly
smaller errors than SPM99, while in other cases (some
of the high motion data) both methods give similar
results. This can be seen by comparing the heights
(MARV values) of the SPM bars with the CR bars
(typically the best MCFLIRT cost function) where a 30
to 40% reduction can be seen in the first, second, and
Slightly surprisingly, we found that the use of a
mean image template gave no discernable improve-
ment in accuracy. We conclude that for artificial data
where the motion is purely rigid, there is no advantage
to using a (possibly blurred) average image over an
image from the original data. We would expect that the
mean template scheme could yield greater accuracy
where the data includes some physical motion-induced
artifacts and the choice of a reference image from the
original data set is not so obvious.
Null data study. Having established the accuracy
of MCFLIRT on artificial data, we ran both our scheme
and SPM99 on a number of real fMRI studies. In all
instances, the subjects had been exposed to no stimu-
lus (null data). The underlying assumption was that
after motion correction on a null data study, we would
expect the overall variation of the data to be lower than
before correction as subject motion-induced variability
had been minimized. Again, results were masked ac-
cording to the SPM data to give a fair comparison.
Results, given in Fig. 16, show considerable changes induced by SPM (both beneficial and detrimental) but only minimal changes induced by MCFLIRT. This is due to the fact that the amount of actual motion that occurred in these studies is very low, so that the changes in intensity in each voxel (as a result of motion correction) are also low; in fact, they are lower than the expected changes induced by physiological processes. Consequently, these test results are inconclusive since the test measure does not purely measure motion-induced changes but also physiological changes, which dominate in this case. The only way one could expect to obtain quantitative analysis for real data would be to incorporate some form of position measurement into the scanner, a facility not available to us.

FIG. 15. Median absolute residual variation (MARV) values for corrected data processed by different motion correction schemes: Uncorrected, MCFLIRT w. CR, MCFLIRT w. NC, MCFLIRT w. CR and mean, MCFLIRT w. NC and mean, SPM99 run at full quality, with sinc interpolation and interpolation error adjustment.

TABLE 3. RMS Deviation Values for Synthetic Null Data (columns: Uncorrected, AIR, SPM, MCFLIRT; row label: Sum of squared RMS error (mm)).

FIG. 16. Summary MARV statistics for three sets of real data, comparing MCFLIRT w. CR, and SPM99 correction using full quality, sinc interpolation, and interpolation error adjustment. The results show the range of percentage changes in residual variance as a result of the motion correction (i.e., comparing it to the uncorrected data).
Real activation study. As we have described above,
it is difficult to make accurate measurements pertaining
to the accuracy of a motion correction scheme when
presented with real data. With data exhibiting activa-
tion, examination of the time-series after correction
using an animation tool reveals no visible extreme
affine movement although some motion artifacts re-
main. In particular, we are able to show good localiza-
tion of activations which would not be possible without
motion correction first being carried out. The thresholded
statistical maps shown in Fig. 17 correspond to a
180-volume audiovisual experiment. Analysis was carried
out using FEAT (FMRIB’s easy analysis tool), using
an improved linear model (Woolrich et al., 2001). In
order to test the effectiveness of MCFLIRT on real
data, the subject was asked to move his or her head
during the experiment.
It can be seen, by comparing the activations from the
uncorrected and corrected data sets, that in the uncorrected
set a large number of false positives exist in both
visual and auditory results. While it can be argued that
both data sets are highly corrupted by this large mo-
tion (indeed, even the corrected data set still exhibits
some visible movement, albeit at a significantly
smaller scale than the uncorrected data), the MCF-
LIRT-corrected visual stimulus is well localized and
allows an otherwise corrupted set of experimental data
to yield potentially useful results.
This paper has examined the problem of optimiza-
tion for fully automatic registration of brain images. In
particular, the problem of avoiding local minima is
addressed in two ways. First, a general apodization
(smoothing) method was formulated for cost functions
in order to eliminate small discontinuities formed by
discontinuous changes of the number of voxels in the
overlapping field of view with changing transformation
parameters. Second, a novel hybrid global–local opti-
FIG. 17. Results of a FEAT analysis on motion-corrected audiovisual data.