Improved Optimization for the Robust and Accurate Linear Registration and Motion Correction of Brain Images

Mark Jenkinson,* Peter Bannister,*,† Michael Brady,† and Stephen Smith*

*Oxford Centre for Functional Magnetic Resonance Imaging of the Brain, John Radcliffe Hospital, Headington, Oxford OX3 9DU; and †Medical Vision Laboratory, Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, United Kingdom

Received September 19, 2001

Linear registration and motion correction are important components of structural and functional brain image analysis. Most modern methods optimize some intensity-based cost function to determine the best registration. To date, little attention has been focused on the optimization method itself, even though the success of most registration methods hinges on the quality of this optimization. This paper examines the optimization process in detail and demonstrates that the commonly used multiresolution local optimization methods can, and do, get trapped in local minima. To address this problem, two approaches are taken: (1) to apodize the cost function and (2) to employ a novel hybrid global–local optimization method. This new optimization method is specifically designed for registering whole brain images. It substantially reduces the likelihood of producing misregistrations due to being trapped by local minima. The increased robustness of the method, compared to other commonly used methods, is demonstrated by a consistency test. In addition, the accuracy of the registration is demonstrated by a series of experiments with motion correction. These motion correction experiments also investigate how the results are affected by different cost functions and interpolation methods.

Key Words: accuracy; affine transformation; global optimization; motion correction; multimodal registration; multiresolution search; robustness.

© 2002 Elsevier Science (USA)

INTRODUCTION

Geometric registration and motion correction are important stages in the analysis of functional brain imaging studies. Consequently, it is important that these stages perform robustly and accurately. Furthermore, for large imaging studies it is desirable that they be fully automated.

There has been a considerable amount of research into registration and motion correction of brain images, and many different methods have been proposed (Maintz and Viergever, 1998). Most methods in common usage are based on the mathematical framework of optimizing an intensity-based cost function. However, although much work has concentrated on how the choice of cost function affects registration performance, there has been far less examination of the effect of the optimization method. Moreover, when optimization methods are discussed, global methods are often ignored and local methods compared purely on the basis of speed (Maes et al., 1999).

One of the most common and serious problems for registration methods is the presence of local minima in the cost function. These cause local optimization methods to "get stuck" and hence to fail to find the desired global minimum. Most registration methods attempt to solve this problem by incorporating a local optimization strategy within a multiresolution framework. Such a framework, which typically involves starting with low-resolution images (containing only gross features) and working progressively through to higher resolutions, aims to avoid the local minima "traps." As we show later, this simple multiresolution approach is not always sufficient for avoiding local minima, and by using more sophisticated optimization methods, the chances of becoming "trapped" in these local minima can be substantially reduced.

Two types of local minima commonly occur for the cost functions used in image registration: large-scale basins and small-scale dips. The first type, the large-scale basin, is responsible for large misregistrations since the local minimum is often far from the global minimum. The second type, small-scale dips, can cause the optimization to get stuck at any stage and so are responsible for large misregistrations at low resolutions and small misregistrations at high resolutions.

We propose two methods for dealing with the local minima problem. These are cost function apodization, which reduces or eliminates small-scale dips, and a hybrid global–local optimization technique which utilizes prior knowledge about brain registration to create an optimization technique that combines the speed of local optimization with the robustness of global optimization.

NeuroImage 17, 825–841 (2002)
doi:10.1006/nimg.2002.1132
© 2002 Elsevier Science (USA). All rights reserved.


The following sections of this paper are background theory, methods (including both cost function apodization and the hybrid optimization method), results, and discussion. The results section contains a number of experiments on real, whole brain images which demonstrate the effectiveness of the registration in two different settings: (1) structural image registration (intermodal/intersubject) of an anatomical image to a standard template; and (2) functional image motion correction (intramodal/intrasubject), which registers each image in a time-series to a particular example image from that time-series. The first case is examined using a robustness study (as accuracy is hard to define for intersubject registration, and robustness is a more important issue in this context), while the second case is examined using an accuracy study (as, in this context, it is accuracy that is more important). In each case real brain image data are used. Comparisons with some commonly used methods are also included (in both cases) which demonstrate the superior robustness and accuracy which can be obtained using this approach.

MATERIALS

Registration

The registration problem studied here is to find the best geometric alignment of two (volumetric brain) images. Call the two images the reference (Y) and floating (X) images. More precisely, the registration problem seeks that transformation which, when applied to the floating image, maximizes the "similarity" between this transformed floating image and the reference image.

A standard, and common, way of formulating this as a mathematical problem is to construct a cost function which quantifies the dissimilarity between two images and then search for the transformation (T*) which gives the minimum cost. In mathematical notation this is

T* = arg min_{T ∈ S_T} C(Y, T(X)),    (1)

where S_T is the space of allowable transformations, C(I_1, I_2) is the cost function, and T(X) represents the image X after it has been transformed by the transformation T. In this paper we shall only consider linear registration, so that S_T is either the set of all affine transformations or some subset of this (such as the set of all rigid-body transformations).
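As a toy illustration of Eq. (1) (not the method of this paper), the following sketch minimizes a least-squares cost over a one-dimensional, translation-only transformation space with a brute-force search; the signals, grid, and function names are purely illustrative.

```python
# Toy version of Eq. (1): T* = arg min over t of C(Y, T(X)), where T is a
# 1-D translation and C is the least-squares cost from Table 1.
import numpy as np

def transform(x, t, grid):
    # "Apply" the translation t to the floating signal X by resampling it
    # at shifted grid positions with linear interpolation.
    return np.interp(grid - t, grid, x, left=0.0, right=0.0)

def cost_ls(y, x, t, grid):
    # C_LS(Y, T(X)) = sum (Y - T(X))^2
    return np.sum((y - transform(x, t, grid)) ** 2)

grid = np.arange(100, dtype=float)
y = np.exp(-0.5 * ((grid - 50.0) / 5.0) ** 2)   # reference: bump at 50
x = np.exp(-0.5 * ((grid - 43.0) / 5.0) ** 2)   # floating: same bump at 43

shifts = np.linspace(-20.0, 20.0, 401)          # discretized S_T
t_star = shifts[int(np.argmin([cost_ls(y, x, t, grid) for t in shifts]))]
```

Here the global minimum is found exhaustively; the rest of the paper is concerned with finding it efficiently in the 12-dimensional affine case, where exhaustive search is infeasible.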

Cost Function

Many different cost functions have been proposed for

image registration problems. Some use geometrically

defined features, found within the image, to quantify

the (dis)similarity, while others work directly with the

intensity values in the images. A large comparative

study of different registration methods (West et al.,

1997) indicated that intensity-based cost functions are

more accurate and reliable than the geometrically

based ones. Consequently, most recent registration

methods have used intensity-based cost functions, and

these are the ones which will be discussed in this

paper.

Intensity-based cost functions can be divided natu-

rally into two categories: those suitable for intramodal

problems and those suitable for intermodal problems.

In the former category the most commonly used cost

functions are least squares (LS) and normalized corre-

lation (NC). For the latter, and more difficult, category

the most commonly used functions are mutual infor-

mation (MI), normalized mutual information (NMI),

woods (W), and correlation ratio (CR). These functions

are defined mathematically in Table 1 (see (J enkinson

and Smith, 2001) for more information).
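To make the iso-set and joint-histogram definitions of Table 1 concrete, here is a small numpy sketch of the correlation ratio and mutual information costs; the bin counts, sample data, and function names are illustrative and not the implementation used in the paper.

```python
# Minimal numpy sketches of two cost functions from Table 1: correlation
# ratio (CR) and mutual information (MI). Lower cost = better match.
import numpy as np

def cost_cr(x, y, bins=32):
    # C_CR = (1/Var(Y)) * sum_k (n_k/N) Var(Y_k), where Y_k is the iso-set
    # of Y values whose paired X value falls in intensity bin k.
    k = np.digitize(x, np.linspace(x.min(), x.max(), bins + 1)[1:-1])
    total = 0.0
    for b in range(bins):
        yk = y[k == b]
        if yk.size > 1:
            total += (yk.size / y.size) * yk.var()
    return total / y.var()

def cost_mi(x, y, bins=32):
    # C_MI = H(X, Y) - H(X) - H(Y), estimated from a joint histogram.
    h, _, _ = np.histogram2d(x, y, bins=bins)
    p = h / h.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    def entropy(q):
        q = q[q > 0]
        return -np.sum(q * np.log(q))
    return entropy(p.ravel()) - entropy(px) - entropy(py)

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y_dep = 2 * x + 0.1 * rng.normal(size=5000)   # functionally related to x
y_ind = rng.normal(size=5000)                 # independent of x
```

For the functionally related pair both costs come out lower (better) than for the independent pair, as the table's minimum/maximum columns suggest.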

TABLE 1
Mathematical Definitions of the Most Commonly Used Intensity-Based Cost Functions: least squares (LS); normalized correlation (NC); Woods (W); correlation ratio (CR); mutual information (MI); and normalized mutual information (NMI)

Cost function | Definition | Minimum | Maximum
C_LS | Σ (Y − X)² | 0 | ∞
C_NC | Σ (X · Y) / √(Σ X² · Σ Y²) | −1 | 1
C_W | Σ_k (n_k / N) · √Var(Y_k) / μ(Y_k) | 0 | ∞
C_CR | (1 / Var(Y)) Σ_k (n_k / N) Var(Y_k) | 0 | 1
C_MI | H(X, Y) − H(X) − H(Y) | −∞ | 0
C_NMI | H(X, Y) / (H(X) + H(Y)) | 0 | 1

Note. The notation is as follows: quantities X and Y denote images, each represented as a set of intensities; μ(A) is the mean of set A; Var(A) is the variance of the set A; Y_k is the kth iso-set, defined as the set of intensities in image Y at positions where the intensity in X is in the kth intensity bin; n_k is the number of elements in the set Y_k, such that N = Σ_k n_k; H(X, Y) = −Σ_ij p_ij log p_ij is the standard entropy definition, where p_ij represents the probability estimated using the (i, j) joint histogram bin, and similarly for the marginals, H(X) and H(Y). Note that the sums in the first two rows are taken over all corresponding voxels.

Interpolation. In addition to a pair of images and a


particular transformation, the cost function requires that a method of interpolation be defined, that is, some method of calculating what the intensity is in the floating image at points in between the original voxel (or grid) locations. This is necessary in order to know the intensity at corresponding points in the images after the geometrical transformation has been applied to the floating image.

Interpolation methods that are commonly used are trilinear (also called linear or, in 2D, bilinear), nearest neighbor, sinc (of various kernel sizes and with or without various windowing functions; e.g., Blackman), spline, and Fourier. The choice of method has some impact on cost function smoothness, although all interpolation methods except nearest neighbor are continuous. However, the choice of method becomes most critical for motion correction, as the transformed image intensities are needed for later statistical analysis.

Optimization

Once a cost function has been chosen it is necessary to search for the transformation which will yield the minimum cost value. To do this, an optimization method is used which searches through the parameter space of allowable transformations. Note that rigid-body transformations are specified by 6 parameters (3 rotations and 3 translations) while affine transformations are specified by 12 parameters. Consequently, even for linear transformations, the optimization takes place in a high-dimensional space: ℝ¹².

While the problem specified in Eq. (1) is a global optimization, quite often local optimization methods are employed as they are simpler and faster.¹ However, this can result in the method returning a transformation that corresponds to a local minimum of the cost function, rather than the desired global minimum. Such cases often appear as misregistrations, of varying severity, and are a major cause of registration failure.

Unfortunately, there are very few global optimization methods that are suitable for a 3D brain image registration problem. This is because, in terms of operations, the cost function is expensive to evaluate, and most global optimization methods require a great many evaluations, leading to unacceptable execution times (e.g., days).

Multiresolution techniques. To both speed up the optimization process and avoid local minima, most currently used registration methods employ some form of multiresolution optimization. That is, a sequence of image pairs, at progressively larger spatial scales, is created from the initial pair of images: (I_r, I_f). The

images at larger scales are subsampled versions (often with preblurring) of the original high-resolution images and so contain fewer voxels, which means that evaluating the cost function requires less computation. In addition, as only gross features of the images remain at these large scales, it is hoped that there will be fewer local minima for the optimization to get stuck in.

Motion Correction

In functional brain imaging a series of brain images is acquired. The time elapsed between each acquisition is usually a few seconds or less. Due to the short acquisition times required, these images usually have poor resolution. Furthermore, as the imaging parameters are tuned to highlight physiological changes (e.g., blood oxygenation), the images often have poor anatomical contrast.

Extracting functional information from such a series of images is done by applying statistical time-series analysis, which assumes that the location of a given voxel within the brain does not change over time. However, there is usually some degree of subject motion within the scanner, especially when the scanning takes a long time or when clinical patients are involved. Therefore, in order to render the data fit for statistical analysis, this motion must be estimated and corrected for. This is the task of motion correction methods, and it is essentially a multiple-image registration task.

Normally, motion correction methods deal with the registration task by selecting a reference image from within the series and registering each image in turn to this fixed reference. As all images are of the same subject, using the same imaging parameters, it can be classified as an intrasubject, intramodal registration problem. Therefore, a rigid-body transformation space and intramodal cost function can be used. Furthermore, as the values in the corrected images are important for later statistical analysis, the choice of interpolation method for the transformation of the images is of particular importance (Hajnal et al., 1995a,b).

METHODS

Apodization of the Cost Functions

As seen in Fig. 1, the local behavior of the cost function shows small discontinuities as the transformation parameters are varied smoothly. This creates local minima traps for the optimization method. Since all interpolation methods are continuous (except nearest neighbor, which is consequently seldom used), the discontinuities are not due to the type of interpolation used. The cause of these discontinuities is the changing amount of overlap of the reference and floating image.

¹ A global method searches the entire range of possible parameters for the optimal cost function value, while a local method simply starts somewhere in parameter space and moves about locally trying to find an optimum, stopping when there is no better nearby set of parameters.


There are two different ways of treating values outside the field of view (FOV) of an image:

1. Treat all values outside the FOV as zero.
2. Do all calculations strictly in the overlapping region.

The first method is undesirable as it creates artificial intensity boundaries when the object is not wholly contained within the FOV. However, in the second method the number of points counted in the overlapping region varies, not just the expressions involving intensities. Therefore, in the second case both the numerator and the denominator of the cost functions (except least squares) will change discontinuously as the amount of overlap changes.

The discontinuities exist because the images are discrete sets of voxels. In particular, the reference image defines a fixed set of voxel locations over which the cost function is calculated. Then, for a given transformation, the floating image intensities at these locations are calculated using interpolation. A reference image voxel location is only counted when it is valid, that is, within the overlapping region such that it maps to a location inside the FOV of the floating image. When the edge of the FOV of the floating image crosses a reference voxel location, the location suddenly goes from being inside the overlapping region to outside, causing a discontinuous change in the number of valid locations, as shown in Fig. 2.

We aim to apodize the cost function by removing these discontinuities. To do this, our approach has been to introduce a geometric apodization that deweights the contributions of locations that are near the edge of the overlapping region. The weighting is chosen so that the contribution of such locations drops continuously until it reaches zero at the edge of the overlapping region. Any continuous weighting function could be used, but for simplicity and computational efficiency we choose one that is linear.

For instance, consider a 2D example of a reference location that maps to a point inside the overlapping region, where the distance from the nearest edges of the floating image FOV is d_X and d_Y units, as shown in Fig. 2. In each dimension, if this value is less than some threshold D, then the influence of that point is weighted by a weight w = d/D. In higher dimensions, the product of the weighting functions in each dimension is used. That is, w(d_X, d_Y, d_Z) = w(d_X) w(d_Y) w(d_Z).
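A minimal sketch of this geometric weighting, assuming the per-dimension edge distances d_X, d_Y, d_Z have already been computed (with nonpositive values meaning the location lies outside the overlap):

```python
import numpy as np

def weight_1d(d, D):
    # 0 outside the overlap (d <= 0), d/D within distance D of the FOV
    # edge, and 1 deeper inside the overlapping region.
    return float(np.clip(d / D, 0.0, 1.0))

def weight(d_x, d_y, d_z, D):
    # Product of per-dimension weights: w(dX, dY, dZ) = w(dX) w(dY) w(dZ)
    return weight_1d(d_x, D) * weight_1d(d_y, D) * weight_1d(d_z, D)
```

The weight is continuous in each distance, which is exactly the property needed to remove the overlap discontinuities.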

FIG. 1. Illustration of local discontinuities in cost function plots.

FIG. 2. FOV change.


The weighting is applied to all terms involving that location's intensity as well as to the number of locations in the region. For example, consider the nth moment of an iso-set:

M_n(Y_k) = (1 / N_k) Σ_{j : X_j ∈ I_k} (Y_j)^n    (2)

N_k = Σ_{j : X_j ∈ I_k} 1,    (3)

where M_n is the nth moment, j is a voxel index, X and Y represent the reference and floating images, respectively, and I_k denotes the kth intensity bin. With general weighting this becomes

M_n(Y_k) = (1 / N_k) Σ_{j : X_j ∈ I_k} w_j (Y_j)^n    (4)

N_k = Σ_{j : X_j ∈ I_k} w_j,    (5)

where w_j is the weight of the location j, which is 0 outside the overlapping region, and inside the overlapping region is d/D for d < D or 1 for d ≥ D.

This weighting scheme can be simply and efficiently applied to any of the non-entropy-based cost functions (i.e., LS, NC, W, and CR). It depends on one parameter, the threshold distance D, which can be varied to increase the amount of apodization. When D = 0 there is no apodization, while increasing D creates smoother and smoother cost functions, although the cost function will be continuous for any nonzero value of D. Also note that making D larger than the voxel spacing is permitted and just has a greater smoothing effect, as shown in Fig. 3.

Joint histogram apodization. A more general weighting scheme is required for apodizing the joint histogram required for the entropy-based cost functions. This is because the number of entries in each histogram bin becomes discontinuous as the intensity at a floating image location (calculated using interpolation) passes through the threshold value between intensity bins.

As it is the intensity passing through a threshold value that causes the discontinuities for the joint histogram, we propose a weighting function that is determined by the intensities and applied to every location. We choose, once again, a linear weighting function (as shown in Fig. 4), where w_k is the weight for bin k, I is

FIG. 3. Illustration of cost function apodization.

FIG. 4. Weighting function.


the intensity at the location under consideration, T_k is the intensity threshold between bins k−1 and k, and δ is the smoothing threshold (the equivalent of D in the preceding section). This weight is then applied to the accumulation of intensity within the joint histogram bin, as well as to the number of entries in the bin, which is no longer an integer. That is, for a point of intensity I, the updating equations for bin k are

N_k → N_k + w_k(I)
S_k → S_k + I · w_k(I),

where w_k(·) is the weighting function for the kth bin (see Fig. 4), N_k is the occupancy of bin k (a noninteger version of the number of elements), and S_k is the sum of intensities in the bin.

This approach is effectively fuzzy-binning, where each intensity bin no longer has sharp thresholds, but fuzzy membership functions. It also means that a given location can influence more than one bin entry. Because of the way the weighting function is calculated, though, each location contributes equally, since the sum of weights for all bin entries is equal to 1.
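One plausible form of this update can be sketched as follows, under the assumption (consistent with the description above, but not taken verbatim from Fig. 4) that the weight is shared linearly between the two bins adjacent to a threshold, reaching an even 0.5/0.5 split exactly at the threshold; all names are illustrative.

```python
import numpy as np

def fuzzy_update(N, S, I, edges, delta):
    # N: bin occupancies (floats), S: per-bin intensity sums.
    # edges: interior thresholds T_1..T_{K-1}; delta: smoothing threshold.
    K = len(N)
    k = int(np.searchsorted(edges, I))        # bin containing intensity I
    w = np.zeros(K)
    w[k] = 1.0
    # Near a threshold, share the weight linearly with the adjacent bin;
    # the weights for a single location always sum to 1.
    if k < K - 1 and edges[k] - I < delta:
        w_hi = 0.5 * (1.0 - (edges[k] - I) / delta)
        w[k] -= w_hi
        w[k + 1] += w_hi
    elif k > 0 and I - edges[k - 1] < delta:
        w_lo = 0.5 * (1.0 - (I - edges[k - 1]) / delta)
        w[k] -= w_lo
        w[k - 1] += w_lo
    # Updating equations: N_k -> N_k + w_k(I), S_k -> S_k + I * w_k(I)
    return N + w, S + I * w
```

An intensity exactly at a threshold contributes 0.5 to each adjacent bin, and the contribution varies continuously as I crosses the threshold, which is what removes the histogram discontinuities.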

As changing overlap will still create discontinuities, both the geometrical weighting and the fuzzy-binning must be applied to have a continuous joint histogram. Moreover, the parameter δ will give a continuous joint histogram for any value greater than zero, although the value should not exceed the intensity bin width. The smoothing capacity of δ is shown in Fig. 5 for the mutual information cost function. Results for the normalized mutual information cost function are very similar. Note that, in general, a value of δ = 0.5 together with D equal to the resolution scale (e.g., 8 mm) gives a desirably smooth cost function.

Note that the partial volume interpolation introduced by Maes (Maes et al., 1997) also creates continuous joint histogram estimates if used in conjunction with the geometrical weighting (applied to the reference locations instead). However, as the name suggests, PVI is more than just an apodization scheme; in fact, it functions as an interpolation method too. Therefore, different interpolation methods cannot be used in conjunction with PVI, whereas for fuzzy-binning the interpolation method can be freely chosen. Furthermore, the fuzzy-binning scheme provides an adjustable parameter, δ, which controls the amount of smoothing of the cost function, allowing for different degrees of smoothing as desired.

Finally, it can be seen that fuzzy-binning can be made fully symmetric with respect to the two images (see (Cachier and Rey, 2000) for a discussion of symmetry in general registration cost functions). That is, both floating and reference intensities can have fuzzy-binning applied to them. However, there is an inherent asymmetry in the way that interpolation is applied only to the floating image. Therefore, although such a symmetric approach appears initially attractive, the simpler and faster approach of only using fuzzy bins for the floating image was adopted in practice.

FIG. 5. Illustration of cost function apodization.

A Global–Local Hybrid Optimization Method

Of the many different approaches to global optimization, we have investigated two strategies and combined them with a simple but fast local optimization method to produce a hybrid optimization method. The two strategies are searching and multistart optimization.

Our hybrid optimization method (also described in (Jenkinson and Smith, 2001)) is specifically designed for the problem at hand, using prior knowledge about the transformation parameters and typical data size (FOV, voxel size, etc.) to help make the method efficient. The method cannot guarantee that the global solution is found, but then neither can any other global optimization method given a finite amount of time. Generally, only statistical "guarantees" are given, and these often require excessive run-times in order to be met. In contrast, our method is designed to give a reliable estimate of the global minimum given some time restriction (in our case, less than 1 h on a moderately powered standard workstation; e.g., registering two 1 × 1 × 1 mm images typically takes 15 min on a 500-MHz Pentium III).

The method still uses a local optimization method with a multiresolution framework, and these are described in the next two sections, followed by descriptions of the global search and multistart optimization strategies employed.

Multiresolution. Currently, four different scales are used in our method: 8, 4, 2, and 1 mm. At each scale, the two images are resampled, after initial preblurring (using a Gaussian with FWHM equal to the ratio of the final and initial voxel sizes), so that they have isotropic voxels of size equal to the scale size. Note that an exception to this occurs if the scale is smaller than the data resolution, in which case the data are resampled to isotropic voxels of the scale closest to the data resolution.
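The preblur-and-subsample step can be sketched as follows; the use of scipy's routines and the exact FWHM-to-sigma handling here are illustrative simplifications, not the paper's implementation.

```python
# Build one level of the multiresolution pyramid: Gaussian preblur with
# FWHM equal to the (final / initial) voxel-size ratio, then resample to
# isotropic voxels at the requested scale.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def pyramid_level(img, voxel_mm, scale_mm):
    ratio = scale_mm / voxel_mm                   # final / initial voxel size
    if ratio <= 1.0:
        return img                                # scale finer than the data
    sigma = ratio / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> sigma
    blurred = gaussian_filter(img, sigma)
    return zoom(blurred, 1.0 / ratio, order=1)    # trilinear resampling

img = np.random.default_rng(1).random((64, 64, 64))   # pretend 1-mm voxels
levels = {s: pyramid_level(img, 1.0, s) for s in (8, 4, 2, 1)}
```

Each coarser level has 8 times fewer voxels than the next, which is what makes the low-resolution cost evaluations cheap.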

Furthermore, skew and anisotropic scaling changes are much less prominent than rotational, translational, and global scaling changes, and so their effects are difficult to estimate reliably at low resolutions. Consequently, only similarity transformations (rigid-body + global scaling) are estimated at the 8- and 4-mm scales.

Local optimization. The choice of local optimization method used here is not critical, except that it must be efficient. Furthermore, since it will be used in a multiresolution framework, the low-resolution stages do not need to find highly accurate transformations. Therefore, the initial parameter bracketing and the parameter tolerances (the size of uncertainty on the optimized parameter values) are both made proportional to the scale size. This avoids many unnecessary cost function evaluations at low resolutions.

We initially chose Powell's method (Press et al., 1995) as our local optimization method, as it was efficient and did not require gradients to be calculated, which are especially difficult to obtain given the apodizations applied to the cost functions. However, we discovered that a set of N 1D golden searches (Press et al., 1995) gave equally good results, which can be reasonably expected if the parameterization is close to decoupled.
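The idea of replacing Powell's method with N independent 1-D golden-section searches can be sketched like this (a hypothetical helper built on scipy's golden-section routine; in the real method the brackets and tolerances would be scaled with the resolution level):

```python
# Per-parameter 1-D golden-section line searches, swept over all N axes.
import numpy as np
from scipy.optimize import minimize_scalar

def golden_1d_sweep(cost, params, bracket_size=1.0, sweeps=3):
    p = np.array(params, dtype=float)
    for _ in range(sweeps):                  # repeat sweeps over the axes
        for i in range(p.size):
            def axis_cost(v, i=i):
                q = p.copy()
                q[i] = v                     # vary one parameter at a time
                return cost(q)
            res = minimize_scalar(axis_cost, method="golden",
                                  bracket=(p[i] - bracket_size,
                                           p[i] + bracket_size))
            p[i] = res.x
    return p

# Works well when parameters are close to decoupled, e.g. a separable bowl:
p_opt = golden_1d_sweep(lambda q: np.sum((q - np.array([2.0, -1.0])) ** 2),
                        [0.0, 0.0])
```

For strongly coupled parameters this axis-by-axis scheme converges slowly, which is why near-decoupled parameterization matters.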

Global search. To estimate the final transformation sufficiently accurately, a brute-force search of the transformation space is infeasible, even for rigid-body transformations. However, at the lowest resolution (8-mm scale) only the gross image features still exist, and so a coarse search of the cost function at this resolution should reflect the major changes in rotation, translation, and global scaling, allowing large misregistrations to be avoided.

Speed remains an issue, even for coarse searches at low resolution. Therefore, the search is restricted to the rotation parameters, as these are the most difficult to find and are the cause of many large misregistrations.

Furthermore, the search is divided into three stages:

1. a coarse search over the rotation parameters, with a full local optimization of translation and global scale for each rotation tried;
2. a finer search over rotation parameters, but with only a single cost function evaluation at each rotation (for efficiency);
3. a full local optimization (rotation, translation, and global scale) for each local minimum detected from the previous stage.

The first of these stages is straightforward. Given a set of rotations to try (by default we use 60° increments in each of the Euler angles, leading to 6³ = 216 different rotations), the local optimization routine is called for the translation and global scale only. That is, the rotation is left fixed, and the best translation and global scale for this particular rotation is found. These results are then stored for use in the later stages.

The second stage takes a larger set of rotations (by default we use 18° increments, leading to 20³ = 8000 different rotations) but only evaluates the cost function once for each rotation. This contrasts with the previous stage, where the cost function is typically evaluated between 10 and 30 times during the local optimization. However, in order for the evaluation to be a reasonable estimate of the best cost function with this rotation, the translation and global scale parameters must be close to the optimal values. These parameter values are supplied from the results of the previous stage, with the translation parameters determined by interpolating between the stored translation values. Global scale is fixed at the median global scale value over all the stored values. This is done differently from


the translation, as the scale should not vary greatly with rotation, whereas the translation is highly coupled with the rotation values.

Finally, the third stage applies a full local optimization, allowing rotation to vary, at each local minimum detected from the results of the previous stage. These local minima are defined as rotations where the cost value found is less than for any of the "neighboring" rotation values. There are often several such local minima, and rather than force the selection of the best one at this stage, they are all optimized and passed on to the higher resolution stages.

Although it is unlikely that the first stage in this process will get very close to the correct rotation, the second stage should get close enough for the local optimization in the last stage to give a good estimate. Note that in most registration methods there is no equivalent of this search, and a single local optimization is performed with the starting point being no rotation, no translation, and unity scaling (the identity transformation). As this method examines many more possible starting transformations, it cannot do any worse than these simpler methods.
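The coarse and fine rotation grids described above can be enumerated directly; this sketch only builds the Euler-angle grids (the per-rotation translation/scale optimizations and cost evaluations are omitted):

```python
import itertools
import numpy as np

# Stage 1: 60-degree increments in each Euler angle -> 6^3 = 216 rotations,
# each given a full local optimization of translation and global scale.
coarse = list(itertools.product(np.deg2rad(np.arange(0, 360, 60)), repeat=3))

# Stage 2: 18-degree increments -> 20^3 = 8000 rotations, each scored with
# a single cost evaluation seeded by the interpolated stage-1 results.
fine = list(itertools.product(np.deg2rad(np.arange(0, 360, 18)), repeat=3))
```

The asymmetry between the stages (full optimization on 216 points, single evaluations on 8000) is what keeps the search affordable at the 8-mm scale.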

Multistart optimization with perturbations. Following the previous search stage (at 8-mm scale) there are usually several local minima selected as candidates for initializing more detailed searches for the global minimum. This stage (at 4-mm scale) performs a local optimization for the best of these candidate transformations. In addition, it takes several perturbations of the candidate transformations and performs local optimization of these perturbations. Finally, the single best (minimum cost) solution is selected from among these optimization results.

In practice, the three best candidates are taken from the previous stage, together with 10 perturbations of each candidate. Two perturbations are applied to each rotation parameter, each being half the magnitude of the fine search step size from the previous stage. As well as these 6 rotational perturbations, 4 perturbations in scale (±0.1, ±0.2) are also applied. The number and size of the perturbations are arbitrary, with the values used here selected largely from experience with the magnitude and type of typical misregistrations.

This approach of trying several candidate solutions is effectively a multistart strategy similar to that used in genetic algorithms and other global optimization methods. Furthermore, the use of "local" perturbations is similar to the way in which alternatives are generated in simulated annealing. It has been found empirically that the combination of these strategies, together with the initial search, avoids getting trapped in local minima to a much greater extent than a local optimization method alone within a multiresolution framework.
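The perturbation set can be sketched as follows; the parameter layout (3 Euler angles plus one global scale) is an illustrative simplification of the full transformation parameterization, and the 9° rotation step is half the 18° fine-search increment.

```python
import numpy as np

def perturbations(candidate, rot_step_deg=9.0):
    # 6 rotational perturbations: +/- half the fine search step on each of
    # the 3 rotation parameters.
    out = []
    for i in range(3):
        for sign in (+1, -1):
            p = candidate.copy()
            p[i] += sign * np.deg2rad(rot_step_deg)
            out.append(p)
    # 4 global-scale perturbations: +/-0.1 and +/-0.2.
    for ds in (+0.1, -0.1, +0.2, -0.2):
        p = candidate.copy()
        p[3] += ds
        out.append(p)
    return out                                # 10 perturbations in total

cands = perturbations(np.array([0.0, 0.0, 0.0, 1.0]))
```

Each of the three best candidates plus its 10 perturbations would then be locally optimized, giving 33 starts at the 4-mm scale.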

Higher resolution stages.

thesinglebest candidatetransformation is chosen, and

it is only this transformation which is worked with

from here on. At the 2-mm scale the skews and aniso-

tropic scalings start to become significant. Conse-

quently, these extra degrees of freedom (DOF) are pro-

gressively introduced by calling the local optimization

method three times: first using only 7 DOF (rigid-

body ? global scale), then with 9 DOF (rigid-body ?

independent scalings), then with thefull 12 DOF (rigid-

body ? scales ? skews).

Since the cost function evaluations take 8 times longer at the 1-mm scale than at the 2-mm scale and 512 times longer than at the 8-mm scale, only a single pass of the local optimization is done at the 1-mm scale. The result of this single pass is returned as the registration solution, T*.


Motion Correction

In broad terms, a motion correction algorithm must

take a time series of fMRI images and register each

image in the series to a reference image. This reference

image may be of a different modality (Biswal and

Hyde, 1997) but a more common approach is to select

one image from the time-series itself (usually the

first—cf. SPM (Friston et al., 1996)) and register the

remaining images to this template image.

If we make the reasonable assumption that there is

unlikely to be large motion from one image to the next

(usually 3 s or less between images), we can use the

result of one image’s registration as an initial guess for

the next image in the series. This is accomplished by

assuming an initial identity transformation between the middle image Vn in a time-series and the next adjacent image Vn+1 and then finding the optimal transformation T1 by optimizing the cost function. The

resulting solution is then used as a starting point for

FIG. 6. MCFLIRT schedule.

832
JENKINSON ET AL.
Page 9

the next optimization with the next image pair Vn, Vn+2

(see Fig. 6). This is only done at the lowest resolution,

as all higher resolutions use the transformations found

at the next lower resolution for the initial estimates.
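A minimal sketch of this seeding strategy at the lowest resolution, with a hypothetical `register` function standing in for one cost-function optimization:

```python
# Sketch of the lowest-resolution pass: the middle volume is the
# reference, and each neighbouring registration is seeded with the
# previous solution.  `register` is a hypothetical stand-in for one
# cost-function optimization; `identity` is the identity transform.

def initial_pass(volumes, register, identity):
    n = len(volumes) // 2
    transforms = {n: identity}
    # walk outwards from the middle in both directions
    for idx in list(range(n + 1, len(volumes))) + list(range(n - 1, -1, -1)):
        prev = idx - 1 if idx > n else idx + 1
        transforms[idx] = register(volumes[idx], volumes[n],
                                   init=transforms[prev])
    return [transforms[i] for i in range(len(volumes))]
```

Because adjacent volumes are close in time, each solution is a good starting estimate for its neighbour, so the local optimization rarely strays far.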

The final schedule carries out the following steps on

the uncorrected data (optional stages are shown in

italics):

● 8-mm optimization using the middle image as ini-

tial reference and then using each result to initialize

the next pairwise optimization;

● 4-mm optimization using the middle image as ref-

erence and 8-mm stage results as initialization param-

eters;

● 4-mm optimization (lower tolerance) using the

middle image as reference and 4-mm stage results as

initialization parameters;

● mean registration option:

Apply transformation parameters from high-toler-

ance 4-mm stage;

Average corrected images to generate mean tem-

plate image;

Carry out 8-, 4-, and 4-mm (high-tolerance) opti-

mizations as before but against mean image as refer-

ence;

● sinc interpolation option:

Carry out additional 4-mm (high-tolerance) opti-

mization using sinc interpolation (instead of trilinear

as used in previous stages);

● apply current transformation parameters to uncorrected data and save.
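The schedule above can be sketched as a driver function; all stage functions, flags, and keywords here are hypothetical stand-ins, not MCFLIRT's actual interface:

```python
# Sketch of the full MCFLIRT schedule described above.  Every stage
# function, flag, and keyword here is a hypothetical stand-in, not the
# actual MCFLIRT interface.

def run_schedule(series, optimize, apply_xfm, mean_of,
                 use_mean_template=False, use_sinc=False):
    ref = series[len(series) // 2]                 # middle volume as reference
    xfms = optimize(series, ref, scale=8)          # seeded pairwise 8-mm pass
    xfms = optimize(series, ref, scale=4, init=xfms)
    xfms = optimize(series, ref, scale=4, init=xfms, tol="tight")
    if use_mean_template:
        # average the corrected volumes and redo the stages against the mean
        mean_ref = mean_of(apply_xfm(series, xfms))
        for scale, tol in ((8, None), (4, None), (4, "tight")):
            xfms = optimize(series, mean_ref, scale=scale, init=xfms, tol=tol)
    if use_sinc:
        # optional final pass with sinc instead of trilinear interpolation
        xfms = optimize(series, ref, scale=4, init=xfms, tol="tight",
                        interp="sinc")
    return apply_xfm(series, xfms)                 # resample the original data
```

The point of the structure is that every stage refines the transformations produced by the previous one, and the optional stages slot in without changing the mandatory pipeline.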

End Slices

As the intensity values are of great interest after

motion correction, attention must be paid to not only

the estimation but also the application of the transfor-

mation. Interpolation probably has the largest impact

on the quality of the transformed data, with sinc inter-

polation methods often being used, although no abso-

lute consensus on the best method exists. However, the

loss of information outside the FOV, usually seen in

the end slices, can also be very detrimental to the final

statistical maps in these areas.

Our motion correction implementation has also been designed to handle the potentially problematic issue of

end-slice interpolation. It is frequently the case that

under even small affine motion of the head, voxels in

the top and bottom slices can move either in or out of

the field of view (see Fig. 7). Other schemes approach

this by assuming that all affected voxels are either zero

(AIR) or can be completely excluded from further cal-

culations (SPM). This clearly impacts later analysis as

valuable spatial information may be lost.

We counter this situation by padding the end-slices

when applying the estimated transformation (i.e., increasing the extent of each volume by two slices). This

means that if data are to be interpolated from outside

the FOV, it will take on “sensible” values (personal

communication, Roger Woods, 1999).
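A minimal sketch of the padding step, assuming edge replication as one "sensible" choice of fill (the paper does not specify the exact values used):

```python
import numpy as np

# Sketch of end-slice padding: extend the volume by two slices at each
# end before resampling, so samples falling just outside the FOV take
# "sensible" values.  Edge replication is an assumption here; the paper
# does not specify the exact padding values.

def pad_end_slices(vol, n=2):
    # vol: 3-D array with the slice axis last
    return np.pad(vol, ((0, 0), (0, 0), (n, n)), mode="edge")
```

Interpolation near the top and bottom slices then draws on these padded values instead of zeros, so the end slices are neither zeroed nor discarded.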

RESULTS

This section presents several experiments that dem-

onstrate the robustness and accuracy of the proposed

registration and motion correction method. We begin

by first stating the implementational choices made as

these are often critical in creating a stable method that

performs well. Following this, we present the experi-

ments for registration which clearly demonstrate the

improved robustness, and the following sections dis-

cuss motion correction and atrophy estimation, demon-

strating the improved accuracy.

Implementation: FLIRT and MCFLIRT

The registration and motion correction methods de-

scribed in the previous sections have been imple-

mented in C++ and are called FLIRT (FMRIB's² linear

image registration tool) and MCFLIRT (motion correc-

tion FLIRT). In each case several implementation

choices needed to be made to obtain a robust, working

method. The more important choices are: (1) the use of

center of mass as the center of transformation (also

used for initial alignment); (2) the parameterization of

the transformations as three Euler angles, three trans-

lations, three scales, and three skews; and (3) the number of intensity histogram bins set to 256 divided by the scale size (i.e., 256 for 1-mm scaling but only 32 for 8-mm scaling) since the number of voxels (samples) is small for large scalings and so fewer bins must be used in order to get reliable statistics (Izenman, 1991). Each of these choices is detailed more fully in Jenkinson and Smith (2001).
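The bin-count rule in choice (3) is simple enough to state directly:

```python
# Bin-count rule from choice (3): 256 divided by the scale size in mm,
# so coarser scales with fewer voxel samples use fewer histogram bins.

def n_histogram_bins(scale_mm):
    # e.g. 256 bins at the 1-mm scale but only 32 at the 8-mm scale
    return 256 // scale_mm
```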

Robustness Assessment: Registration

Consistency test. For many registration problems in practice, there is no ground truth available with

² The Oxford Centre for Functional MRI of the Brain.

FIG. 7. End-slice correction.


which to evaluate the registration. This makes the

quantitative assessment of methods quite difficult.

Therefore, to test the method quantitatively, a compar-

ative consistency test was performed that does not

require knowledge of the actual ground truth.

The consistency test is based on comparing registra-

tions obtained using various different, but known, ini-

tial starting positions of a given image. If the registra-

tions are consistent then the final registered image will

be the same, regardless of the starting position. Con-

sistency is a necessary, but not sufficient condition that

all correctly functioning registration methods must

possess. This is essentially a measure of the robustness

rather than the accuracy (West et al., 1997) of the

registration method. Robustness is defined here as the

ability to get close to the global minimum on all trials,

whereas accuracy is the ability to precisely locate a

(possibly local) minimum of the cost function. Ideally, a

registration method should be both robust and accu-

rate.

More specifically, the consistency test for an individ-

ual image I involved taking the image and applying

several predetermined affine transformations, Aj, to it

(with appropriate cropping so that no “padding” of the

images was required). All these images (both trans-

formed and untransformed) were registered to a given

reference image, Ir, giving transformations Tj. If the

method was consistent the composite transformations

Tj ∘ Aj should all be the same, as illustrated in Fig. 8.

The transformations are compared quantitatively using the RMS deviation between the composite registration Tj ∘ Aj and the registration from the untransformed case T0. This RMS deviation is calculated directly from the affine matrices (Jenkinson, 1999). That is,

d_{RMS} = \sqrt{ \tfrac{1}{5} R^2 \, \mathrm{Tr}(M^{T} M) + t^{T} t } ,  (6)

where dRMS is the RMS deviation in mm, R is a radius specifying the volume of interest, and the difference transform (Tj ∘ Aj) T0⁻¹ − I = [M t; 0 0] is used to calculate the 3 × 3 matrix M and the 3 × 1 vector t.
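Equation (6) can be computed directly from two 4 × 4 affine matrices; a minimal NumPy sketch (the default radius R here is an illustrative assumption, not a value taken from the paper):

```python
import numpy as np

# Sketch of Eq. (6): RMS deviation between two affine transformations,
# computed from their 4x4 matrices.  The default radius R is an
# illustrative assumption, not a value taken from the paper.

def rms_deviation(T_a, T_b, R=80.0):
    # difference transform: D = T_a T_b^{-1} - I
    D = T_a @ np.linalg.inv(T_b) - np.eye(4)
    M = D[:3, :3]                 # 3x3 matrix part
    t = D[:3, 3]                  # 3x1 translation part
    return float(np.sqrt(R ** 2 / 5.0 * np.trace(M.T @ M) + t @ t))
```

For a pure translation the measure reduces to the translation magnitude, which gives a quick sanity check on any implementation.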

Comparison with existing methods. A comparison of FLIRT with several other registration packages was

initially performed using the consistency test ex-

plained above. The other registration packages used

were AIR (Woods et al., 1993), SPM (Friston et al.,

1995), UMDS (Studholme et al., 1996), and MRI-

TOTAL (Collins et al., 1994). These methods were cho-

sen because the authors’ implementations were avail-


FIG. 8. Illustration of the consistency test.

FIG. 9. Example slices from one of the images used in the consistency study (after registration). The red lines represent edges from the standard image (the reference image) overlaid on the transformed initial image.


able, and so this constituted a fair test as opposed to a

reimplementation of a method described in a paper,

where often the lack of precise implementation details

makes it difficult to produce a good working method.

The particular experiment that was performed was

intersubject and intermodal using 18 different images

as the floating images (like the one shown in Fig. 9), all

with the MNI 305 brain (Collins et al., 1994) as the

reference image. The 18 images were all 256 × 256 × 30, T2-weighted MR images with voxel dimensions of 0.93 by 0.93 by 5 mm, while the MNI 305 template is a 172 × 220 × 156, T1-weighted MR image with voxel dimensions of 1 by 1 by 1 mm.

The results of one such test, using six different rota-

tions about the anterior–posterior axis, are shown in

Fig. 10. It can be seen that only FLIRT and MRITO-

TAL performed consistently. This indicates that the

other methods (AIR, SPM, and UMDS) frequently get

trapped in local minima, i.e., are not as robust.

A further consistency test was then performed com-

paring only MRITOTAL and FLIRT. This test used

initial scalings rather than rotations. The reason that

this is important is that MRITOTAL uses a multireso-

lution local optimization method (gradient descent) but

relies on initial preprocessing to provide a good start-

ing position. This preprocessing is done by finding the

principal axes of both images and initially aligning

them. Consequently the initial alignment compensates

for rotations but does not give any information, and

hence correction, for scalings.

The results of the scaling consistency test are shown

in Fig. 11. It can be seen that, although generally

consistent, in three cases MRITOTAL produces regis-

trations that deviate by more than 20 mm (RMS) from

each other. In contrast, FLIRT was consistent (less

than 2 mm RMS) in all cases.

Accuracy Assessment: Motion Correction

This section details the comparative accuracy of the

motion correction scheme (MCFLIRT) when tested

against two of the most widely used schemes, SPM and

AIR.

In order to attempt to establish a “gold standard” for

registration accuracy, initial tests use the RMS mea-

sure (Eq. 6) in combination with synthetic data, where

the exact value of the motion is known, in order to

quantify the scheme’s ability to correct for subject mo-

FIG. 10. Results of the consistency study.


tion. Later tests characterize the degree of correction

by examining the residual variation (collapsed across

time) in the corrected data. This measure is also used

to test the scheme's effectiveness on real data where we

have no absolute measure of the subject’s movement.

Simulated data. The artificial data enabling gold

standard comparisons were generated as follows: a

high-resolution EPI volume (2 × 2 × 2 mm) was duplicated 180 times and each volume was transformed

by an affine matrix corresponding to real motion esti-

mates taken from one of two studies where the subject

had been asked to move his or her head appreciably

during the scan. Three further groups of images were

generated using motion estimates from experiments

where the subject had been asked to remain as still as

possible. Within these five motion designs, three fur-

ther groups of data were created corresponding to au-

diovisual activation at 0, 2.5, and 5% of the overall

voxel intensities by modulating the intensity values

according to a mask derived from real fMRI data. Once

the activation (if any) had been applied and the vol-

umes transformed by the corresponding parameters,

the data were subsampled to 4 × 4 × 6 mm voxels and

appropriately cropped to avoid introducing any pad-

ding voxels. The use of a high-resolution template im-

age which is then subsampled should minimize the

effect of interpolation when applying such transforma-

tions to the data.
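A schematic of this data-generation recipe, where simple decimation stands in for the subsampling and `apply_affine` is a hypothetical resampler:

```python
import numpy as np

# Schematic of the synthetic-data recipe: duplicate a high-resolution
# volume, modulate with an activation mask, apply a known motion
# transform per volume, then subsample.  `apply_affine` is a
# hypothetical resampler; plain decimation stands in for subsampling.

def make_synthetic_series(vol, motions, act_mask=None, act_level=0.0,
                          apply_affine=None, factor=(2, 2, 3)):
    series = []
    for T in motions:                              # one known transform each
        v = vol.copy()
        if act_mask is not None:
            v = v * (1.0 + act_level * act_mask)   # e.g. 0, 2.5, or 5%
        if apply_affine is not None:
            v = apply_affine(v, T)
        # subsample 2x2x2 mm -> 4x4x6 mm (decimation is an assumption)
        series.append(v[::factor[0], ::factor[1], ::factor[2]])
    return series
```

Because the motion transforms are known exactly, the RMS measure of Eq. (6) can be evaluated against them for every corrected volume.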

Within our correction scheme, there are a number of

stages which can be tuned to optimize the accuracy of

the correction. The remainder of this section aims to

find a robust set of parameters which give consistently

accurate results on all data presented. We begin by

examining the comparative accuracy of several cost

functions which can be used with our optimization

scheme. Later we proceed to examine the impact made

by the choice of interpolation scheme and registration

schedule.

Cost functions. The test results shown in Fig. 12

show the relative accuracy of the available cost func-

tions within the MCFLIRT optimization framework

when applied to the problem of motion correction on

our synthetic data.

Although there is no clear leader over all cost func-

tions in terms of accuracy, we note that the most

accurate results are predominantly yielded by the nor-

malized correlation and correlation ratio cost func-

tions. This observation is reinforced when we examine

the number of data sets where a particular cost func-

tion is most accurate. This is summarized in Table 2.

Note that previous work (Freire and Mangin, 2001), which demonstrated the superiority of entropy-based cost measures over alternatives in terms of motion correction without introducing further spurious activations in the data, only compared mutual information metrics against the least squares (SPM) and Woods (AIR) measures.

The next stage of testing was to verify that these cost

functions were in fact more accurate when smoothed

(apodized) than unsmoothed (unapodized). The same

FIG. 11. Results of the scale consistency study.

FIG. 12. Median (over time) RMS (over space) error results for the MCFLIRT scheme applied to synthetic data exhibiting known motion of one of five designs and audiovisual activation at increasing intensities. Cost function notation corresponds to Table 1.


RMS test measure and data sets were used as in the

previous test and results are given in Fig. 13.

Overall, the smoothed cost functions outperform

their unsmoothed versions.

Interpolation scheme. To further improve the accu-

racy of the motion estimates, the next parameter we

experimented with was the choice of interpolation

scheme for the motion estimation. In addition to the

standard trilinear scheme, a windowed-sinc interpola-

tion (using a Hanning window of size 7 × 7 × 7) was

tried. While considerably slower than trilinear inter-

polation, the sinc approach is able to further refine

motion estimates after the initial trilinear stage has

converged on a solution, thus providing greater accu-

racy. The results in Fig. 14 show the greater degree of

accuracy achieved over using trilinear interpolation

alone. Note that on the third data set (cropped to allow

distinction between the other four sets), the improve-

ment was consistently over a value of 2.0.
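For reference, a 1-D Hanning-windowed sinc weight with half-width 3 (7 taps per axis, matching the 7 × 7 × 7 window above) might be written as:

```python
import math

# Sketch of a 1-D Hanning-windowed sinc weight with half-width 3
# (7 taps per axis, matching the 7 x 7 x 7 window mentioned above).

def hanning_sinc_weight(d, half_width=3):
    # d: signed distance (in voxels) from the interpolation point
    if abs(d) >= half_width:
        return 0.0
    sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
    hann = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
    return sinc * hann
```

The separable 3-D kernel is the product of three such 1-D weights; the Hanning window tapers the sinc smoothly to zero at the window edge, which is why the 7-tap kernel avoids the ringing of a truncated sinc.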

Choice of template image. In an attempt to increase the accuracy of the scheme and as a final parameter investigation, a method using a mean image

template was implemented. This scheme generates a

mean image for the series by averaging all the volumes

over time after the first three stages of trilinear inter-

polation-based motion correction have been carried

out. In doing so we hope to be registering all volumes to

a more generalized target which exhibits less overall

variation from each volume in the series than the orig-

inal target (middle) volume previously used. This new

mean image is a robust target to which the original

time series is then registered, again using three trilin-

ear interpolation stages and an optional final sinc in-

terpolation stage.

Because we are registering to a mean image, we no

longer have gold standard values for the transforma-

tions found by the correction scheme. Therefore, to

quantify the accuracy of the correction, a median ab-

solute residual variation (MARV) score was created by

initially demeaning each voxel time-series and then

measuring the median value of the residual absolute

values in this time-series. That is,

MARV(x, y, z) = \frac{1}{N} \sum_{t=1}^{N} \left| I_t(x, y, z) - I_{mean}(x, y, z) \right| .
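The MARV computation, following the displayed formula (a mean absolute residual per voxel) together with the volume-wise median summary described below, can be sketched as:

```python
import numpy as np

# Sketch of the MARV score: per-voxel mean absolute residual after
# demeaning each voxel time-series (following the displayed formula),
# then the median over the volume as a summary measure.

def marv_map(series):
    data = np.asarray(series, dtype=float)    # 4-D, time axis first
    resid = np.abs(data - data.mean(axis=0))  # |I_t - I_mean| per voxel
    return resid.mean(axis=0)                 # sum over t divided by N

def marv_summary(series):
    return float(np.median(marv_map(series)))
```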

TABLE 2
Accuracy Counts for the Five Cost Functions Offered by MCFLIRT

Cost                             No. of sets      No. of sets
                                 most accurate    second most accurate
Normalized correlation           8                5
Correlation ratio                2                7
Mutual information               2                1
Normalized mutual information    0                2
Least squares                    3                0

FIG. 13. Median (over time) RMS (over space) error results (unsmoothed minus smoothed) for the MCFLIRT scheme applied to five synthetic data sets (A–E) exhibiting known motion at increasing intensities. A positive value indicates improved accuracy as a result of smoothing the cost function. Cost function notation corresponds to Table 1 and the results demonstrate the improvement in accuracy achieved by using the smoothed cost functions.

FIG. 14. Median (over time) RMS (over space) error results for the MCFLIRT scheme applied to synthetic data exhibiting known motion of one of five designs and audiovisual activation at increasing intensities. A positive value indicates improved accuracy as a result of incorporating the final sinc interpolation stage. Cost function notation corresponds to Table 1 and demonstrates the improvement in accuracy achieved by using smoothed cost functions and additional sinc interpolation when compared to the basic trilinear scheme reported in Fig. 13.


This produces a volume of MARV scores for each

voxel and the median of these values (over the volume)

is then taken as a summary measure. This is effec-

tively a measure characterizing the level of intervol-

ume intensity variation (presumed to be due to subject

motion) after retrospective motion correction has been

applied. While this can only work for activation-free

data (so that in perfect alignment the variance should

be at a minimum), it can give us a clear impression of the

accuracy of the motion correction scheme. Because

SPM rejects information outside a mask obtained from

the data (end-slice effects), the corrected median im-

ages were masked according to the corrected SPM data

so that the measure reflected a consistent comparison

across the schemes. The results shown in Fig. 15 cor-

respond to the MARV values generated after running

MCFLIRT and SPM on the null-activation data set for

both the low and the severe motion designs.

Results using the RMS measure (Table 3) revealed

that, although all three schemes provide subvoxel ac-

curacy, AIR 3.08 using least squares (which we found

to give better results than the standard AIR measure)

and windowed sinc interpolation was almost an order

of magnitude worse than basic three-stage trilinear

MCFLIRT. Accordingly, we decided not to compare it

further. However, we note that AIR was primarily de-

signed to solve several different registration problems

that arise in tomographic data sets (Woods, 1998)

rather than optimized for FMRI motion correction.

From these results we conclude that for some cases

(generally the low motion data), MCFLIRT with the

correlation ratio cost function produces significantly

smaller errors than SPM99, while in other cases (some

of the high motion data) both methods give similar

results. This can be seen by comparing the heights

(MARV values) of the SPM bars with the CR bars

(typically the best MCFLIRT cost function) where a 30

to 40% reduction can be seen in the first, second, and

fourth cases.

Slightly surprisingly, we found that the use of a

mean image template gave no discernible improve-

ment in accuracy. We conclude that for artificial data

where the motion is purely rigid, there is no advantage

to using a (possibly blurred) average image over an

image from the original data. We would expect that the

mean template scheme could yield greater accuracy

where the data includes some physical motion-induced

artifacts and the choice of a reference image from the

original data set is not so obvious.

Null data study. Having established the accuracy

of MCFLIRT on artificial data, we ran both our scheme

and SPM99 on a number of real fMRI studies. In all

instances, the subjects had been exposed to no stimu-

lus (null data). The underlying assumption was that

after motion correction on a null data study, we would

expect the overall variation of the data to be lower than before correction as subject motion-induced variability

had been minimized. Again, results were masked ac-

cording to the SPM data to give a fair comparison.

Results, given in Fig. 16, show considerable changes

induced by SPM (both beneficial and detrimental) but

only minimal changes induced by MCFLIRT. This is

due to the fact that the amount of actual motion that

occurred in these studies is very low, so that the

FIG. 15. Median absolute residual variation (MARV) values for corrected data processed by different motion correction schemes: Uncorrected, MCFLIRT w. CR, MCFLIRT w. NC, MCFLIRT w. CR and mean, MCFLIRT w. NC and mean, SPM99 run at full quality, with sinc interpolation and interpolation error adjustment.

TABLE 3
RMS Deviation Values for Synthetic Null Data

                          Uncorrected    AIR         SPM       MCFLIRT
Sum of squared
  intensity errors        936.5866       406.8876    1.6405    1.5171
RMS error (mm)            2.3360         1.7570      0.1064    0.1102

FIG. 16. Summary MARV statistics for three sets of real data, comparing MCFLIRT w. CR, and SPM99 correction using full quality, sinc interpolation, and interpolation error adjustment. The results show the range of percentage changes in residual variance as a result of the motion correction (i.e., comparing it to the uncorrected variance).


changes in intensity in each voxel (as a result of motion

correction) are also low—in fact, lower than the ex-

pected changes induced by physiological processes.

Consequently, these test results are inconclusive since

the test measure does not purely measure motion-in-

duced changes but also physiological changes which

dominate in this case. The only way one could expect to

obtain quantitative analysis for real data would be to

incorporate some form of position measurement into

the scanner—a facility not available to us.

Real activation study. As we have described above,

it is difficult to make accurate measurements pertain-

ing to the accuracy of a motion correction scheme when

presented with real data. With data exhibiting activa-

tion, examination of the time-series after correction

using an animation tool reveals no visible extreme

affine movement although some motion artifacts re-

main. In particular, we are able to show good localiza-

tion of activations which would not be possible without

motion correction first being carried out. The thresh-

olded statistical maps shown in Fig. 17 correspond to a

180 volume audiovisual experiment. Analysis was car-

ried out using FEAT, FMRIB's easy analysis tool, using

an improved linear model (Woolrich et al., 2001). In

order to test the effectiveness of MCFLIRT on real

data, the subject was asked to move his or her head

during the experiment.

It can be seen, by comparing the activations from the

uncorrected and corrected data sets, that in the uncor-

rected set a large number of false positives exist in both

visual and auditory results. While it can be argued that

both data sets are highly corrupted by this large mo-

tion (indeed, even the corrected data set still exhibits

some visible movement, albeit at a significantly

smaller scale than the uncorrected data), the MCF-

LIRT-corrected visual stimulus is well localized and

allows an otherwise corrupted set of experimental data

to yield potentially useful results.

DISCUSSION

This paper has examined the problem of optimiza-

tion for fully automatic registration of brain images. In

particular, the problem of avoiding local minima is

addressed in two ways. First, a general apodization

(smoothing) method was formulated for cost functions

in order to eliminate small discontinuities formed by

discontinuous changes of the number of voxels in the

overlapping field of view with changing transformation

parameters. Second, a novel hybrid global–local opti-

FIG. 17. Results of a FEAT analysis on motion-corrected audiovisual data.
