Iterative Algorithms in Tomography
Charles Byrne (Charles_Byrne@uml.edu),
Department of Mathematical Sciences,
University of Massachusetts Lowell, Lowell, MA 01854
October 17, 2005
Abstract
The fundamental mathematical problem in tomographic image reconstruction is the solution, often approximate, of large systems of linear equations, which we denote here as Ax = b. The unknown entries of the vector x often represent intensity levels, of beam attenuation in transmission tomography, or of radionuclide concentration in emission tomography, and so are naturally nonnegative. The entries of the vector b are typically counts of detected photons, the entries of the matrix A are lengths or probabilities, and so these quantities are also nonnegative. The size of these systems, typically thousands of equations and thousands of unknowns, precludes the use of Gaussian elimination and necessitates iterative methods of solution. We survey a variety of such methods and present some open questions concerning their behavior.

The step sizes for iterative algorithms such as the Landweber method involve parameters that depend on the largest eigenvalue λmax of the matrix A†A. Because of the size of A, calculating A†A, let alone finding its largest eigenvalue, is out of the question. Easily calculated upper bounds for λmax are available that are particularly useful when A is sparse, that is, when most of the entries of A are zero, which is typically the case in tomography. These bounds become tighter as the size of A increases.
1 Introduction
Image reconstruction from tomographic data is a fairly recent, and increasingly important, area of applied numerical linear algebra, particularly for medical diagnosis [36, 39, 47, 58, 59, 67, 68]. Fundamentally, the problem is to solve, at least approximately, a large system of linear equations, Ax = b. The vector x is large because it is usually a vectorization of a discrete approximation of a function of two or three continuous spatial variables. The size of the system necessitates the use of iterative solution methods [52]. Because the entries of x usually represent intensity levels, of beam attenuation in transmission tomography, and of radionuclide concentration in emission tomography, we require x to be nonnegative; the physics of the situation may impose additional constraints on the entries of x. In practice, we often have prior knowledge about the function represented, in discrete form, by the vector x, and we may wish to include this knowledge in the reconstruction. In tomography the entries of A and b are also nonnegative. Iterative algorithms tailored to find solutions to these special, constrained problems may outperform general iterative solution methods [57]. To be medically useful in the clinic, the algorithms need to produce acceptable reconstructions early in the iterative process.
Exact solutions of Ax = b may not exist, so we need appropriate measures of distance between vectors to obtain suitable approximate solutions. The entries of the vector b are data obtained by measurements, and so are noisy. Consequently, exact solutions of Ax = b, even when available, may be too noisy to be useful. Bayesian or penalized optimization algorithms are used to obtain reconstructions displaying the desired smoothness [31, 35, 37, 38, 53, 54].
Certain iterative algorithms require that we select a parameter that governs the size of the steps taken at each iteration. For the Landweber and projected Landweber methods [1], this parameter is dependent on the largest eigenvalue, λmax, of the matrix A†A. Because the system is large, calculating A†A, let alone computing λmax, is impractical. If we overestimate λmax, the step lengths become too small and the algorithm is too slow to be practical; tight upper bounds for λmax that can be obtained from A itself help to accelerate these algorithms. Upper bounds exist that are particularly useful for the common case in which A is sparse, that is, most of its entries are zero [13]. These upper bounds are shown to become tighter as the size of the system increases [18].
Our purpose here is to discuss various algorithms that are employed in tomo-
graphic image reconstruction, and to present several open questions concerning these
algorithms.
2 Tomography
These days, the term tomography is used by lay people and practitioners alike to describe any sort of scan, from ultrasound to magnetic resonance. It has apparently lost its association with the idea of slicing, as in the expression three-dimensional tomography. In this paper we focus on two important modalities, transmission tomography and emission tomography. An x-ray CAT scan is an example of the first; a positron-emission tomography (PET) scan is an example of the second. Although there is some flexibility in the mathematical description of the image reconstruction problem posed by these methods, we shall concentrate here on the algebraic formulation of the problem. In this formulation, the problem is to solve, at least approximately, a large system of linear equations, Ax = b. What the entries of the matrix A and the vectors x and b represent will vary from one modality to another; for our purposes, the main point is simply that all of these entries are nonnegative.
In both modalities the vector x that we seek is a vectorization, that is, a one-dimensional encoding, of an unknown two- or three-dimensional discrete function. It is this transition from higher dimensions to a single dimension that causes x to be large. The quantity x_j, the j-th entry of the vector x, represents the value of the function at the pixel or voxel corresponding to the index j. The quantity b_i, the i-th entry of the vector b, is measured data: the discrete line integral of x along the i-th line segment, in the transmission case, and the photon count at the i-th detector in the emission case. The entries of the matrix A describe the relationship that holds between the various pixels and the various detectors, that is, they describe the scanning process whereby the information about the unknown function is translated into measured data. In the transmission case, the entries of A describe the geometric relationship between the patient and the scanner, as well as the paths taken by the beams. In the emission case, the entries of A are the probabilities of a photon being detected at the various detectors, given that it was emitted at a particular pixel. In both cases, there is a certain amount of simplification and guesswork that goes into the choice of these entries. In the emission case, the probabilities depend, in part, on the attenuation encountered as the photons pass from within the body to the exterior, and so will depend on the anatomy of the particular patient being scanned.
2.1 Transmission Tomography
When an x-ray beam travels along a line segment through the body it becomes progressively weakened by the material it encounters. By comparing the initial strength of the beam as it enters the body with its final strength as it exits the body, we can estimate the integral of the attenuation function along that line segment. The data in transmission tomography are these line integrals, corresponding to thousands of lines along which the beams have been sent. The image reconstruction problem is to create a discrete approximation of the attenuation function. The inherently three-dimensional problem is usually solved one two-dimensional plane, or slice, at a time, hence the name tomography [39].
The beam attenuation at a given point in the body will depend on the material present at that point; estimating and imaging the attenuation as a function of spatial location will give us a picture of the material within the body. A bone fracture will show up as a place where significant attenuation should be present, but is not.
The attenuation function is discretized, in the two-dimensional case, by imagining the body to consist of finitely many squares, or pixels, within which the function has a constant, but unknown, value. This value at the j-th pixel is denoted x_j. In the three-dimensional formulation, the body is viewed as consisting of finitely many cubes, or voxels. The beam is sent through the body along various lines and both initial and final beam strength is measured. From that data we can calculate a discrete line integral along each line. For i = 1, ..., I we denote by L_i the i-th line segment through the body and by b_i its associated line integral. Denote by A_ij the length of the intersection of the j-th pixel with L_i; therefore, A_ij is nonnegative. Most of the pixels do not intersect line L_i, so A is quite sparse. Then the data value b_i can be described, at least approximately, as

$$ b_i = \sum_{j=1}^J A_{ij} x_j. \qquad (2.1) $$
Both I, the number of lines, and J, the number of pixels or voxels, are quite large,
although they certainly need not be equal, and are typically unrelated.
The matrix A is large and rectangular. The system Ax = b may or may not have exact solutions. We are always free to select J, the number of pixels, as large as we wish, limited only by computation costs. We may also have some choice as to the number I of lines, but within the constraints posed by the scanning machine and the desired duration and dosage of the scan. When the system is underdetermined (J > I), there may be infinitely many exact solutions; in such cases we usually impose constraints and prior knowledge to select an appropriate solution. As we mentioned earlier, noise in the data, as well as error in our model of the physics of the scanning procedure, may make an exact solution undesirable anyway. When the system is overdetermined (J < I), we may seek a least-squares approximate solution, or some other approximate solution. We may have prior knowledge about the physics of the materials present in the body that can provide us with upper bounds for x_j, as well as information about body shape and structure that may tell us where x_j = 0. Incorporating such information in the reconstruction algorithms can often lead to improved images [57].
2.2 Emission Tomography
In single-photon emission tomography (SPECT) and positron emission tomography
(PET) the patient is injected with, or inhales, a chemical to which a radioactive
substance has been attached [68]. The chemical is designed to become concentrated
in the particular region of the body under study. Once there, the radioactivity results
in photons that travel through the body and, at least some of the time, are detected
by the scanner. The function of interest is the actual concentration of the radioactive
material at each spatial location within the region of interest. Learning what the
concentrations are will tell us about the functioning of the body at the various spatial
locations. Tumors may take up the chemical (and its radioactive passenger) more
avidly than normal tissue, or less avidly, perhaps. Malfunctioning portions of the
brain may not receive the normal amount of the chemical and will, therefore, exhibit
an abnormal amount of radioactivity.
As in the transmission tomography case, this nonnegative function is discretized and represented as the vector x. The quantity b_i, the i-th entry of the vector b, is the photon count at the i-th detector; in coincidence-detection PET a detection is actually a nearly simultaneous detection of a photon at two different detectors. The entry A_ij of the matrix A is the probability that a photon emitted at the j-th pixel or voxel will be detected at the i-th detector.

In the emission tomography case it is common to take a statistical view [51, 50, 62, 64, 67], in which the quantity x_j is the expected number of emissions at the j-th pixel during the scanning time, so that the expected count at the i-th detector is

$$ E(b_i) = \sum_{j=1}^J A_{ij} x_j. \qquad (2.2) $$

The system of equations Ax = b is obtained by replacing the expected count, E(b_i), with the actual count, b_i; obviously, an exact solution of the system is not needed in this case. As in the transmission case, we seek an approximate, and nonnegative, solution of Ax = b, where, once again, all the entries of the system are nonnegative.
3 Iterative Reconstruction
We turn now to several iterative algorithms for solving the system Ax = b. Some of these algorithms apply only to nonnegative systems, in which the entries of the matrix and the vectors are nonnegative, while others apply even to complex-valued systems. We shall use complex notation whenever permitted.
When the (possibly complex) I by J matrix A is large, finding exact or approximate solutions of the system of linear equations Ax = b is usually accomplished using iterative algorithms. When the system is overdetermined we can obtain a least-squares approximate solution, which is any vector x = x_LS that minimizes the squared Euclidean distance between Ax and b; that is, x_LS minimizes

$$ \|Ax - b\|^2 = \sum_{i=1}^I |(Ax)_i - b_i|^2, \qquad (3.1) $$

where

$$ (Ax)_i = \sum_{j=1}^J A_{ij} x_j, \qquad (3.2) $$

for each i.
3.1 The Landweber Algorithm
The Landweber algorithm [49, 1], with the iterative step

$$ x^{k+1} = x^k + \gamma A^\dagger (b - Ax^k), \qquad (3.3) $$

converges to the least-squares solution closest to the starting vector x^0, provided that 0 < γ < 2/λmax, where λmax is the largest eigenvalue of the nonnegative-definite matrix A†A. Loosely speaking, the larger γ is, the faster the convergence. However, precisely because A is large, calculating the matrix A†A, not to mention finding its largest eigenvalue, can be prohibitively expensive. The matrix A is said to be sparse if most of its entries are zero. In [13] upper bounds for λmax were obtained in terms of the degree of sparseness of the matrix A. Later in this paper we investigate the tightness of these bounds.
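As a concrete illustration, the following NumPy sketch runs the Landweber iteration (3.3) on a small randomly generated test problem; the matrix, data, and iteration count are invented for the example, and the step size is taken from a directly computed λmax, which is feasible only at this toy scale.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J = 100, 80
A = rng.random((I, J))
b = A @ rng.random(J)                      # consistent data, for illustration

# For a toy problem we can compute lambda_max directly; for realistic sizes
# one would use the upper bounds of Section 7 instead.
lam_max = np.linalg.norm(A, 2) ** 2        # largest eigenvalue of A^T A
gamma = 1.0 / lam_max                      # any 0 < gamma < 2/lambda_max works

x = np.zeros(J)
for k in range(1000):
    x = x + gamma * A.T @ (b - A @ x)      # Landweber step (3.3)

print(np.linalg.norm(A @ x - b))           # residual shrinks toward zero
```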
3.2 The Projected Landweber Algorithm
When we require a nonnegative approximate solution x for the real system Ax = b we can use a modified version of the Landweber algorithm, called the projected Landweber algorithm [1], having the iterative step

$$ x^{k+1} = \left( x^k + \gamma A^T (b - Ax^k) \right)_+, \qquad (3.4) $$

where, for any real vector a, we denote by (a)_+ the nonnegative vector whose entries are those of a, for those that are nonnegative, and are zero otherwise. The projected Landweber algorithm converges to a vector that minimizes ||Ax − b|| over all nonnegative vectors x, for the same values of γ.
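In code, the only change from the Landweber sketch above is a projection onto the nonnegative orthant at each step; again, the test data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J = 100, 80
A = rng.random((I, J))
b = A @ rng.random(J)                      # data generated from a nonnegative x

gamma = 1.0 / np.linalg.norm(A, 2) ** 2    # 0 < gamma < 2/lambda_max
x = np.zeros(J)
for k in range(1000):
    # Landweber step followed by clipping negatives to zero, as in (3.4)
    x = np.maximum(x + gamma * A.T @ (b - A @ x), 0.0)
```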
Both the Landweber and projected Landweber algorithms are special cases of the CQ algorithm [13], which, in turn, is a special case of a much more general iterative fixed-point algorithm, the Krasnoselskii/Mann (KM) method; a convergence proof for the KM method is given in [14].
3.3 The Algebraic Reconstruction Technique
The algebraic reconstruction technique (ART) [36] applies to any system Ax = b of linear equations. For each index value i let B_i be the subset of J-dimensional vectors given by

$$ B_i = \{ x \,|\, (Ax)_i = b_i \}. \qquad (3.5) $$

Given any vector z, the vector in B_i closest to z, in the sense of the Euclidean distance, has the entries

$$ x_j = z_j + A_{ij} (b_i - (Az)_i) \Big/ \sum_{m=1}^J |A_{im}|^2. \qquad (3.6) $$

The ART is the following: begin with an arbitrary vector x^0; for each nonnegative integer k, having found x^k, let i = k (mod I) + 1 and let x^{k+1} be the vector in B_i closest to x^k. We can use Equation (3.6) to write

$$ x_j^{k+1} = x_j^k + A_{ij} (b_i - (Ax^k)_i) \Big/ \sum_{m=1}^J |A_{im}|^2. \qquad (3.7) $$
When the system Ax = b has exact solutions the ART converges to the solution closest to x^0. How fast the algorithm converges will depend on the ordering of the equations and on whether or not we use relaxation. Relaxed ART has the iterative step

$$ x_j^{k+1} = x_j^k + \gamma A_{ij} (b_i - (Ax^k)_i) \Big/ \sum_{m=1}^J |A_{im}|^2, \qquad (3.8) $$

where γ ∈ (0, 2). In selecting the equation ordering, the important thing is to avoid particularly bad orderings, in which the hyperplanes B_i and B_{i+1} are nearly parallel.
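Here is a minimal sketch of one full cycle of relaxed ART, Equation (3.8), in NumPy; the test system and the relaxation parameter are invented for the example.

```python
import numpy as np

def art_sweep(A, b, x, gamma=1.0):
    """One pass through all equations; each step is the relaxed ART update (3.8)."""
    for i in range(A.shape[0]):
        ai = A[i]
        x = x + gamma * ai * (b[i] - ai @ x) / (ai @ ai)
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((60, 40))
b = A @ rng.standard_normal(40)            # consistent system
x = np.zeros(40)
for _ in range(50):
    x = art_sweep(A, b, x)
print(np.linalg.norm(A @ x - b))           # converges to the solution nearest x^0
```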
When there are no exact solutions, the ART does not converge to a single vector; for each fixed i the subsequence {x^{nI+i}, n = 0, 1, ...} converges to a vector z^i, and the collection {z^i | i = 1, ..., I} is called the limit cycle [66, 32, 16]. The limit cycle will vary with the ordering of the equations, and contains more than one vector unless an exact solution exists. There are several open questions about the limit cycle.
Open Question 1: For a fixed ordering, does the limit cycle depend on the initial vector x^0? If so, how?
Open Question 2: If there is a unique least-squares solution, where is it, in relation to the vectors of the limit cycle? Can it be calculated easily from the vectors of the limit cycle?
There is a partial answer to the second question. In [7] (see also [16]) it was shown that if the system Ax = b has no exact solution, and if I = J + 1, then the vectors of the limit cycle lie on a sphere in J-dimensional space having the least-squares solution at its center. This is not generally true, however.
Open Question 3: In both the consistent and inconsistent cases, the sequence {x^k} of ART iterates is bounded [66, 32, 7, 16]. The proof is easy in the consistent case. Is there an easy proof for the inconsistent case?
Dax [29] has demonstrated interesting connections between the ART, applied to Ax = b, and the Gauss-Seidel method, applied to the system AA†z = b.
3.4 Nonnegatively Constrained ART
If we are seeking a nonnegative solution for the real system Ax = b, we can modify the ART by replacing the x^{k+1} given by Equation (3.7) with (x^{k+1})_+. This version of ART will converge to a nonnegative solution, whenever one exists, but will produce a limit cycle otherwise.
3.5 The Multiplicative ART (MART)
Closely related to the ART is the multiplicative ART (MART) [36]. The MART, which can be applied only to nonnegative systems, also uses one equation only at each step of the iteration. The MART begins with a positive vector x^0. Having found x^k for nonnegative integer k, we let i = k (mod I) + 1 and define x^{k+1} by

$$ x_j^{k+1} = x_j^k \left( \frac{b_i}{(Ax^k)_i} \right)^{m_i^{-1} A_{ij}}, \qquad (3.9) $$

where m_i = max{A_ij | j = 1, 2, ..., J}. When Ax = b has nonnegative solutions, MART converges to such a solution. As with ART, the speed of convergence is greatly affected by the ordering of the equations, converging most slowly when consecutive equations correspond to nearly parallel hyperplanes.
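The following sketch implements one MART pass, Equation (3.9), for a small nonnegative test system; as elsewhere, the data are placeholders and all quantities are assumed positive.

```python
import numpy as np

def mart_cycle(A, b, x):
    """One pass of MART: each step uses a single equation, as in (3.9)."""
    for i in range(A.shape[0]):
        mi = A[i].max()                            # m_i = max_j A_ij
        x = x * (b[i] / (A[i] @ x)) ** (A[i] / mi)
    return x

rng = np.random.default_rng(3)
A = rng.random((30, 20))
b = A @ rng.random(20)                             # nonnegative solutions exist
x = np.ones(20)                                    # positive starting vector
for _ in range(100):
    x = mart_cycle(A, b, x)
print(np.linalg.norm(A @ x - b))
```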
Open Question 4: When there are no nonnegative solutions, MART does not
converge to a single vector, but, like ART, is always observed to produce a limit cycle
of vectors. Unlike ART, there is no proof of the existence of a limit cycle for MART.
3.6 The Simultaneous MART (SMART)
There is a simultaneous version of MART, called the SMART [21, 28, 63]. As with MART, the SMART begins with a positive vector x^0. Having calculated x^k, we calculate x^{k+1} using

$$ \log x_j^{k+1} = \log x_j^k + s_j^{-1} \sum_{i=1}^I A_{ij} \log \frac{b_i}{(Ax^k)_i}, \qquad (3.10) $$

where s_j = Σ_{i=1}^I A_ij > 0.
When Ax = b has no nonnegative solutions, the SMART converges to an approximate solution in the sense of cross-entropy, or Kullback-Leibler distance [3, 16]. For positive numbers u and v, the Kullback-Leibler distance [48] from u to v is

$$ KL(u, v) = u \log \frac{u}{v} + v - u. \qquad (3.11) $$

We also define KL(0, 0) = 0, KL(0, v) = v and KL(u, 0) = +∞. The KL distance is extended to nonnegative vectors component-wise, so that for nonnegative vectors x and z we have

$$ KL(x, z) = \sum_{j=1}^J KL(x_j, z_j). \qquad (3.12) $$

Clearly, KL(x, z) ≥ 0, and KL(x, z) = 0 if and only if x = z.
When there are nonnegative solutions of Ax = b, both MART and SMART converge to the nonnegative solution minimizing the Kullback-Leibler distance KL(x, x^0); if x^0 is the vector whose entries are all one, then the solution minimizes the Shannon entropy, SE(x), given by

$$ SE(x) = \sum_{j=1}^J x_j \log x_j - x_j. \qquad (3.13) $$

One advantage that SMART has over MART is that, if the nonnegative system Ax = b has no nonnegative solutions, the SMART converges to the nonnegative minimizer of the function KL(Ax, b) for which KL(x, x^0) is minimized. One disadvantage of SMART, compared to MART, is that it is slow.
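A compact sketch of the SMART iteration (3.10), written in the equivalent multiplicative form; the test system is a placeholder, and every quantity is assumed positive.

```python
import numpy as np

def smart_step(A, b, x, s):
    """One SMART iteration (3.10): exponentiate a weighted sum of log ratios."""
    log_ratio = np.log(b / (A @ x))               # log(b_i / (Ax)_i)
    return x * np.exp((A.T @ log_ratio) / s)

rng = np.random.default_rng(4)
A = rng.random((30, 20))
b = A @ rng.random(20)
s = A.sum(axis=0)                                 # column sums s_j > 0
x = np.ones(20)                                   # positive starting vector
for _ in range(200):
    x = smart_step(A, b, x, s)
```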
3.7 Expectation Maximization Maximum Likelihood (EMML)
For nonnegative systems Ax = b in which the column sums of A and the entries of b are positive, the expectation maximization maximum likelihood (EMML) method produces a nonnegative solution of Ax = b, whenever one exists [3, 4, 15, 25, 55, 64, 50, 67, 51]. If not, the EMML converges to a nonnegative approximate solution that minimizes the function KL(b, Ax) [3, 5, 15, 25, 67]. The EMML begins with a positive vector x^0. The iterative step of the EMML method is

$$ x_j^{k+1} = s_j^{-1} x_j^k \sum_{i=1}^I A_{ij} \frac{b_i}{(Ax^k)_i}, \qquad (3.14) $$

for s_j = Σ_{i=1}^I A_ij > 0.
The EMML algorithm can also be viewed as a method for maximizing the likelihood function, when the data b_i are instances of independent Poisson random variables with mean value (Ax)_i; here the entries of x are the parameters to be estimated.
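A sketch of the EMML step (3.14) on simulated Poisson counts; the matrix, the true image, and the count scale are invented for the example.

```python
import numpy as np

def emml_step(A, b, x, s):
    """One EMML iteration (3.14)."""
    return (x / s) * (A.T @ (b / (A @ x)))

rng = np.random.default_rng(5)
A = rng.random((30, 20))
x_true = 100.0 * rng.random(20)
b = rng.poisson(A @ x_true).astype(float)   # counts: Poisson with mean (Ax)_i
s = A.sum(axis=0)                           # column sums s_j > 0
x = np.ones(20)
for _ in range(500):
    x = emml_step(A, b, x, s)
# x now approximates a nonnegative minimizer of KL(b, Ax)
```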
An open question about the EMML algorithm is the following:
Open Question 5: How does the EMML limit depend on the starting vector x^0? In particular, when there are nonnegative exact solutions of Ax = b, which one does the EMML produce, and how does it depend on x^0?
3.8 The Rescaled Block-Iterative EMML (RBI-EMML)
One drawback to the use of the EMML in practice is that it is slow; this is typical behavior for simultaneous algorithms, which use all the equations at each step of the iteration. The ordered-subset version of the EMML (OSEM) [44] often produces images of similar quality in a fraction of the time. The OSEM is a block-iterative method, in the sense that only some of the equations are used at each step of the iteration. Unfortunately, the OSEM usually fails to converge, even when there are exact nonnegative solutions of the system Ax = b. The rescaled block-iterative EMML (RBI-EMML) is a corrected version of OSEM that does converge whenever there are nonnegative solutions [6, 8, 15].
We begin by selecting subsets S_n, n = 1, ..., N, whose union is the set of equation indices {i = 1, ..., I}; the S_n need not be disjoint. Having found iterate x^k, set n = k (mod N) + 1; the OSEM iterative step is then

$$ x_j^{k+1} = s_{nj}^{-1} x_j^k \sum_{i \in S_n} A_{ij} \frac{b_i}{(Ax^k)_i}, \qquad (3.15) $$

for s_nj = Σ_{i∈S_n} A_ij > 0. Notice that the OSEM iterative step mimics that of EMML, except that each summation is over only i in the current subset, S_n. It has been shown that the OSEM converges to a nonnegative solution of Ax = b, when such exact solutions exist, provided that the sums s_nj are independent of n, for each j; this is the so-called subset-balanced condition and is quite restrictive. Without this condition, the OSEM can produce a limit cycle, even when there are nonnegative exact solutions of Ax = b, and when there are no such solutions, the vectors of its limit cycle are typically farther apart than the level of noise in the data would seem to indicate. The problem with OSEM is that there should be a second term on the right side of Equation (3.15).
The RBI-EMML algorithm has the following iterative step:

$$ x_j^{k+1} = x_j^k \left( 1 - m_n^{-1} s_j^{-1} s_{nj} \right) + x_j^k m_n^{-1} s_j^{-1} \sum_{i \in S_n} A_{ij} \frac{b_i}{(Ax^k)_i}, \qquad (3.16) $$

where

$$ m_n = \max \{ s_{nj} / s_j \,|\, j = 1, ..., J \}. \qquad (3.17) $$
For any choice of subsets S_n, and any starting vector x^0 > 0, the RBI-EMML converges to a nonnegative solution whenever one exists. If subset-balance holds, then the RBI-EMML reduces to the OSEM method. The acceleration, compared to the EMML, is roughly on the order of N, the number of subsets. As with the ART, the composition of the subsets, as well as their ordering, can affect the rate of convergence.
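A sketch of one RBI-EMML iteration, Equations (3.16)–(3.17); the block partition and the test data are placeholders.

```python
import numpy as np

def rbi_emml_step(A, b, x, s, Sn):
    """One RBI-EMML iteration (3.16) using the equations indexed by Sn."""
    snj = A[Sn].sum(axis=0)                    # s_nj, column sums over the block
    mn = (snj / s).max()                       # m_n, Equation (3.17)
    back = A[Sn].T @ (b[Sn] / (A[Sn] @ x))     # sum over i in S_n of A_ij b_i/(Ax)_i
    return x * (1.0 - snj / (mn * s)) + x * back / (mn * s)

rng = np.random.default_rng(6)
A = rng.random((40, 25))
b = A @ rng.random(25)                         # nonnegative solutions exist
s = A.sum(axis=0)
blocks = np.array_split(np.arange(40), 4)      # subsets S_1, ..., S_N
x = np.ones(25)
for k in range(400):
    x = rbi_emml_step(A, b, x, s, blocks[k % 4])
```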
As with the EMML, there are several open questions.
Open Question 6: When there are nonnegative solutions of Ax = b, how does the solution given by the RBI-EMML depend on the starting vector x^0 and on the choice and ordering of the subsets?

Open Question 7: When there are no nonnegative solutions of Ax = b, does the RBI-EMML produce a limit cycle? This is always observed in actual calculations, but no proof is known.

Open Question 8: When there are no nonnegative solutions of Ax = b, how do the vectors of the RBI-EMML limit cycle relate to the approximate solution given by EMML?
3.9 The Rescaled Block-Iterative SMART (RBI-SMART)
The SMART algorithm also has a rescaled block-iterative version, the RBI-SMART [21, 6, 8, 15]. The iterative step of the RBI-SMART is

$$ x_j^{k+1} = x_j^k \exp \Big( m_n^{-1} s_j^{-1} \sum_{i \in S_n} A_{ij} \log \frac{b_i}{(Ax^k)_i} \Big). \qquad (3.18) $$

When Ax = b has nonnegative solutions, the RBI-SMART converges to the same limit as MART and SMART, for all choices of subsets S_n.
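The corresponding sketch for the RBI-SMART step (3.18); again the data and blocks are invented for the example.

```python
import numpy as np

def rbi_smart_step(A, b, x, s, Sn):
    """One RBI-SMART iteration (3.18) on the block Sn."""
    mn = (A[Sn].sum(axis=0) / s).max()             # m_n as in (3.17)
    log_ratio = np.log(b[Sn] / (A[Sn] @ x))
    return x * np.exp((A[Sn].T @ log_ratio) / (mn * s))

rng = np.random.default_rng(7)
A = rng.random((40, 25))
b = A @ rng.random(25)
s = A.sum(axis=0)
blocks = np.array_split(np.arange(40), 4)
x = np.ones(25)
for k in range(400):
    x = rbi_smart_step(A, b, x, s, blocks[k % 4])
```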
Open Question 9: When Ax = b has no nonnegative solutions, the RBI-SMART is always observed to produce a limit cycle, but no proof of this is known.
4 Feedback in Block-Iterative Reconstruction
When the nonnegative system Ax = b has no nonnegative exact solutions, block-iterative methods such as MART, RBI-SMART, and RBI-EMML have always been observed to exhibit subsequential convergence to a limit cycle, although no proof of this is known. These algorithms approach their limit cycles much sooner than their simultaneous versions, SMART and EMML, approach their limits.

Open Question 10: Can we use the vectors of the limit cycle for MART or RBI-SMART (RBI-EMML) to calculate easily the limit of SMART (EMML)?

In this section we present a partial answer to this question, using a feedback method. More detail concerning the feedback method is in [17]. We assume throughout this section that the limit cycles always exist.
We assume that, for each fixed n = 1, 2, ..., N, the subsequence {x^{mN+n}, m = 0, 1, ...} converges to a vector z^n, and the collection {z^n | n = 1, ..., N} is called the limit cycle; for convenience, we also define z^0 = z^N. The main property of the limit cycle is the following: if we restart the algorithm at z^0, the next iterate is z^1, followed by z^2, ..., z^N again. The limit cycle will vary with the algorithm, with N, with the choice of subsets S_n, and with the ordering of the equations, and will contain more than one vector unless an exact nonnegative solution exists.
For each n, and for each i in the subset S_n, let c_i = (Az^{n−1})_i. The vector c with entries c_i will now be viewed as new data, replacing the vector b, and the algorithm restarted at the original x^0. This is the feedback step. Once again, a limit cycle will be produced, another vector of new data will be generated, feedback will take place again, and the process will continue. What are we obtaining by this succession of feedback steps?
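The loop below sketches this feedback procedure around the RBI-SMART step; the problem sizes, the number of sweeps taken to approximate each limit cycle, and the number of feedback rounds are all invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
I, J, N = 20, 8, 4                       # overdetermined: expect no exact solution
A = rng.random((I, J))
b = rng.random(I) + 0.5
s = A.sum(axis=0)
blocks = np.array_split(np.arange(I), N)

def rbi_smart_cycle(data, x0, sweeps=200):
    """Iterate RBI-SMART, then return the last N iterates z^1, ..., z^N,
    which approximate the (observed) limit cycle."""
    x, cycle = x0.copy(), []
    for k in range(sweeps * N):
        Sn = blocks[k % N]
        mn = (A[Sn].sum(axis=0) / s).max()
        x = x * np.exp((A[Sn].T @ np.log(data[Sn] / (A[Sn] @ x))) / (mn * s))
        if k >= (sweeps - 1) * N:
            cycle.append(x.copy())
    return cycle

x0 = np.ones(J)
data = b.copy()
for _ in range(20):                      # repeated feedback rounds
    cycle = rbi_smart_cycle(data, x0)
    new_data = data.copy()
    for n, Sn in enumerate(blocks):      # feedback: c_i = (A z^{n-1})_i, i in S_n
        new_data[Sn] = A[Sn] @ cycle[n - 1]   # cycle[-1] = z^N plays the role of z^0
    data = new_data
```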
This feedback approach was considered originally in [7], where it was also applied to the ART. For the ART case it was shown there that the systems Ax = b and Ax = c have the same least-squares solutions, which suggests the possibility that the limit cycles generated by feedback might converge to the least-squares solution of the original system, Ax = b. Results along these lines were presented in [7]. The success with ART prompted us to ask the same questions about feedback applied to other block-iterative algorithms; some partial results were obtained [7].
Open Question 11: When feedback is applied to the RBI-SMART algorithm, do the limit cycles obtained converge to a nonnegative minimizer of the function

$$ \sum_{n=1}^N m_n^{-1} \sum_{i \in S_n} KL((Ax)_i, b_i)? $$

If J > I, how should the feedback step deal with the zero entries in the vectors z^n?
Open Question 12: When feedback is applied to the RBI-EMML algorithm, do the limit cycles obtained converge to a nonnegative minimizer of the function

$$ \sum_{n=1}^N m_n^{-1} \sum_{i \in S_n} KL(b_i, (Ax)_i)? $$

If J > I, how should the feedback step deal with the zero entries in the vectors z^n?
5 Iterative Regularization in ART
It is often the case that the entries of the vector b in the system Ax = b come from measurements, so are usually noisy. If the entries of b are noisy but the system Ax = b remains consistent (which can easily happen in the underdetermined case, with J > I), the ART begun at x^0 = 0 converges to the solution having minimum norm, but this norm can be quite large. The resulting solution is probably useless. Instead of solving Ax = b, we regularize by minimizing, for example, the function

$$ \|Ax - b\|^2 + \epsilon^2 \|x\|^2, \qquad (5.1) $$

for some small ε > 0. The solution to this problem is the vector x for which

$$ (A^\dagger A + \epsilon^2 I) x = A^\dagger b. \qquad (5.2) $$

However, we do not want to have to calculate A†A, particularly when the matrix A is large.
We discuss two methods for using ART to obtain regularized solutions of Ax = b. The first one is presented in [16], while the second one is due to Eggermont, Herman, and Lent [33]. For notational convenience, we consider only real systems.

In our first method we use ART to solve the system of equations given in matrix form by

$$ \begin{bmatrix} A^T & \epsilon I \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = 0. $$

We begin with u^0 = b and v^0 = 0. The lower component of the limit vector is then v^∞ = −εx̂, where x̂ minimizes the function in line (5.1).
The method of Eggermont et al. is similar. In their method we use ART to solve the system of equations given in matrix form by

$$ \begin{bmatrix} A & \epsilon I \end{bmatrix} \begin{bmatrix} x \\ v \end{bmatrix} = b. $$

We begin at x^0 = 0 and v^0 = 0. Then the limit vector has for its upper component x^∞ = x̂, as before. Also, εv^∞ = b − Ax̂.
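A toy check of the second (Eggermont et al.) construction: run ART on the augmented system [A εI][x; v] = b and compare the upper component of the limit with the Tikhonov solution of (5.2). The problem sizes, ε, and sweep count are invented, and (5.2) is solved directly here only because the example is small.

```python
import numpy as np

rng = np.random.default_rng(9)
I, J, eps = 15, 40, 0.1                       # underdetermined toy problem
A = rng.standard_normal((I, J))
b = A @ rng.standard_normal(J)

# Direct Tikhonov solution of (5.2), feasible only at this scale:
x_hat = np.linalg.solve(A.T @ A + eps**2 * np.eye(J), A.T @ b)

def art(M, d, z, sweeps):
    for _ in range(sweeps):
        for i in range(M.shape[0]):
            mi = M[i]
            z = z + mi * (d[i] - mi @ z) / (mi @ mi)
    return z

M = np.hstack([A, eps * np.eye(I)])           # the augmented system [A  eps*I]
z = art(M, b, np.zeros(J + I), sweeps=2000)
print(np.linalg.norm(z[:J] - x_hat))          # upper component approaches x_hat
```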
Complicating our analysis for the case in which Ax = b has no nonnegative solutions is the behavior of approximate solutions when nonnegativity is imposed, which is the subject of the next section.
6 Approximate Solutions and the Nonnegativity Constraint
For the real system Ax = b, consider the nonnegatively constrained least-squares problem of minimizing the function ||Ax − b||, subject to the constraints x_j ≥ 0 for all j; this is a nonnegatively constrained least-squares approximate solution. As noted previously, we can solve this problem using a slight modification of the ART. Although there may be multiple solutions x̂, we know, at least, that Ax̂ is the same for all solutions.
According to the Karush-Kuhn-Tucker theorem [60], the vector Ax̂ must satisfy the condition

$$ \sum_{i=1}^I A_{ij} ((A\hat{x})_i - b_i) = 0 \qquad (6.1) $$

for all j for which x̂_j > 0 for some solution x̂. Let S be the set of all indices j for which there exists a solution x̂ with x̂_j > 0. Then Equation (6.1) must hold for all j in S. Let Q be the matrix obtained from A by deleting those columns whose index j is not in S. Then Q^T(Ax̂ − b) = 0. If Q has full rank and the cardinality of S is greater than or equal to I, then Q^T is one-to-one and Ax̂ = b. We have proven the following result.
Theorem 6.1 Suppose that A and every matrix Q obtained from A by deleting columns has full rank. Suppose there is no nonnegative solution of the system of equations Ax = b. Then there is a subset S of the set {j = 1, 2, ..., J} with cardinality at most I − 1 such that, if x̂ is any minimizer of ||Ax − b|| subject to x ≥ 0, then x̂_j = 0 for j not in S. Therefore, x̂ is unique.
When x̂ is a vectorized two-dimensional image and J > I, the presence of at most I − 1 positive pixels makes the resulting image resemble stars in the sky; for that reason this theorem and the related result for the EMML algorithm ([3]) are sometimes called night sky theorems. The zero-valued pixels typically appear scattered throughout the image. This behavior occurs with all the algorithms discussed so far that impose nonnegativity, whenever the real system Ax = b has no nonnegative solutions.
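This effect is easy to observe numerically. The sketch below, on invented data, solves the nonnegatively constrained least-squares problem with SciPy's nnls routine (used here simply as a convenient solver, not as one of the iterative methods of this paper) and counts the positive entries.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(10)
I, J = 10, 100                        # J > I
A = np.abs(rng.standard_normal((I, J)))
b = rng.standard_normal(I)            # generic data; Ax = b with x >= 0 typically infeasible

x_hat, rnorm = nnls(A, b)             # minimizes ||Ax - b|| subject to x >= 0
print(np.count_nonzero(x_hat))        # typically at most I - 1 = 9 positive entries
```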
This result leads to the following open question:

Open Question 13: How does the set S defined above vary with the choice of algorithm, with the choice of x^0 for a given algorithm, and with the choice of subsets in the block-iterative algorithms?
We return now to an issue that arose in the discussion of the Landweber and projected Landweber algorithms, namely, obtaining a good upper bound for λmax, the maximum eigenvalue of A†A.
7 An Upper Bound for the Maximum Eigenvalue of A†A
The upper bounds for λmax we present here apply to any matrix A, but will be particularly helpful when A is sparse.
7.1 The Normalized Case
We assume now that the matrix A has been normalized so that each of its rows has Euclidean length one. Denote by s_j the number of nonzero entries in the j-th column of A, and let s be the maximum of the s_j. Our first result is the following [13]:

Theorem 7.1 For normalized A, λmax, the largest eigenvalue of the matrix A†A, does not exceed s.
Proof: For notational simplicity, we consider only the case of real matrices and vectors. Let A^T A v = cv for some nonzero vector v. We show that c ≤ s. We have A A^T A v = cAv, and so w^T A A^T w = v^T A^T A A^T A v = c v^T A^T A v = c w^T w, for w = Av. Then, with e_ij = 1 if A_ij ≠ 0 and e_ij = 0 otherwise, we have

$$ \Big( \sum_{i=1}^I A_{ij} w_i \Big)^2 = \Big( \sum_{i=1}^I A_{ij} e_{ij} w_i \Big)^2 \le \Big( \sum_{i=1}^I A_{ij}^2 w_i^2 \Big) \Big( \sum_{i=1}^I e_{ij}^2 \Big) = \Big( \sum_{i=1}^I A_{ij}^2 w_i^2 \Big) s_j \le \Big( \sum_{i=1}^I A_{ij}^2 w_i^2 \Big) s. $$

Therefore,

$$ w^T A A^T w = \sum_{j=1}^J \Big( \sum_{i=1}^I A_{ij} w_i \Big)^2 \le \sum_{j=1}^J \Big( \sum_{i=1}^I A_{ij}^2 w_i^2 \Big) s, $$

and

$$ w^T A A^T w = c \sum_{i=1}^I w_i^2 = c \sum_{i=1}^I w_i^2 \Big( \sum_{j=1}^J A_{ij}^2 \Big) = c \sum_{i=1}^I \sum_{j=1}^J w_i^2 A_{ij}^2. $$

The result follows immediately.
When A is normalized, the trace of AA^T, that is, the sum of its diagonal entries, is I, the number of rows. Since the trace is also the sum of the eigenvalues of both AA^T and A^T A, we have λmax ≤ I. When A is sparse, s is much smaller than I, and so provides a much tighter upper bound for λmax.
7.2 The General Case
A similar upper bound for λmax is given for the case in which A is not normalized.

Theorem 7.2 For each i = 1, ..., I, let ν_i = Σ_{j=1}^J |A_ij|² > 0. For each j = 1, ..., J, let σ_j = Σ_{i=1}^I e_ij ν_i, where e_ij = 1 if A_ij ≠ 0 and e_ij = 0 otherwise. Let σ denote the maximum of the σ_j. Then the eigenvalues of the matrix A†A do not exceed σ.

The proof of Theorem 7.2 is similar to that of Theorem 7.1; the details are in [13].
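The following sketch checks both bounds numerically on a random sparse matrix (sizes and density invented); the true λmax is computed by dense SVD, which is feasible only at toy scale.

```python
import numpy as np

rng = np.random.default_rng(11)
I, J = 500, 400
A = rng.random((I, J)) * (rng.random((I, J)) < 0.05)   # sparse, nonnegative

# Theorem 7.2 (general case): sigma_j = sum_i e_ij nu_i, nu_i = squared row norm
nu = (A ** 2).sum(axis=1)
sigma = ((A != 0).T.astype(float) @ nu).max()

lam_max = np.linalg.norm(A, 2) ** 2                    # true largest eigenvalue of A^T A
print(lam_max, "<=", sigma)

# Theorem 7.1 (normalized case): after row normalization, lambda_max <= s
An = A / np.linalg.norm(A, axis=1, keepdims=True)      # assumes no all-zero rows
s = (An != 0).sum(axis=0).max()                        # max nonzero count per column
print(np.linalg.norm(An, 2) ** 2, "<=", s)
```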
7.3 Upper Bounds for ε-Sparse Matrices

If A is not sparse, but most of its entries have magnitude not exceeding ε > 0, we say that A is ε-sparse. We can extend the results for the sparse case to the ε-sparse case. Given a matrix A, define the entries of the matrix B to be B_ij = A_ij if |A_ij| > ε, and B_ij = 0 otherwise. Let C = A − B; then |C_ij| ≤ ε for all i and j. If A is ε-sparse, then B is sparse. The 2-norm of the matrix A, written ||A||, is defined to be the square root of the largest eigenvalue of the matrix A†A, that is, ||A|| = √λmax. From Theorem 7.2 we know that ||B|| ≤ √σ. The trace of the matrix C†C does not exceed IJε². Therefore

$$ \sqrt{\lambda_{max}} = \|A\| = \|B + C\| \le \|B\| + \|C\| \le \sqrt{\sigma} + \epsilon\sqrt{IJ}, \qquad (7.1) $$

so that

$$ \lambda_{max} \le \sigma + 2\epsilon\sqrt{\sigma I J} + IJ\epsilon^2. \qquad (7.2) $$
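A quick numerical check of the bound (7.2) on an invented ε-sparse matrix: split A into its large-entry part B and small-entry part C, bound ||B||² by σ via Theorem 7.2, and compare.

```python
import numpy as np

rng = np.random.default_rng(12)
I, J, eps = 400, 300, 0.01
A = eps * rng.random((I, J))                 # small background entries everywhere
mask = rng.random((I, J)) < 0.05
A[mask] += rng.random(mask.sum())            # a few large entries: A is eps-sparse

B = np.where(np.abs(A) > eps, A, 0.0)        # B keeps only the large entries
nu = (B ** 2).sum(axis=1)
sigma = ((B != 0).T.astype(float) @ nu).max()   # Theorem 7.2 applied to B

bound = sigma + 2 * eps * np.sqrt(sigma * I * J) + I * J * eps**2   # (7.2)
lam_max = np.linalg.norm(A, 2) ** 2
print(lam_max, "<=", bound)
```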
Simulation studies have shown that these upper bounds become tighter as the size of the matrix A increases. In hundreds of runs, with I and J in the hundreds, we found that the relative error of the upper bound was around one percent [18].
8 From General Systems to Nonnegative Systems
The EMML and SMART algorithms require that the matrix involved have nonnegative entries. Here, we show how to convert general linear systems to equivalent systems having this desired form.

Suppose that Hc = d is an arbitrary (real) system of linear equations, with the matrix H = [H_ij]. Rescaling the equations if necessary, we may assume that for each j the column sum Σ_i H_ij is nonzero; note that if a particular rescaling of one equation to make the first column sum nonzero causes another column sum to become zero, we simply choose a different rescaling. Since there are finitely many columns to worry about, we can always succeed in making all the column sums nonzero. Now redefine H and c as follows: replace H_kj with G_kj = H_kj / Σ_i H_ij and c_j with g_j = c_j Σ_i H_ij; the product Hc is equal to Gg, and the new matrix G has column sums equal to one. The system Gg = d still holds, but now we know that Σ_i d_i = d_+ = Σ_j g_j = g_+. Let U be the matrix whose entries are all one, and let t ≥ 0 be large enough so that B = G + tU has all nonnegative entries. Then Bg = Gg + (t g_+)1, where 1 is the vector whose entries are all one. So, the new system of equations to solve is Bg = d + (t d_+)1 = y.
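The conversion is mechanical; here is a sketch on a random toy system (dimensions invented). Note that this construction makes the matrix nonnegative; whether the new right-hand side and solution are nonnegative is a separate matter.

```python
import numpy as np

rng = np.random.default_rng(13)
I, J = 8, 5
H = rng.standard_normal((I, J))
c = rng.standard_normal(J)
d = H @ c                                  # the original system H c = d

col = H.sum(axis=0)
assert np.all(col != 0)                    # rescale equations first if some sum is zero
G = H / col                                # G_kj = H_kj / sum_i H_ij: column sums one
g = c * col                                # g_j = c_j sum_i H_ij, so G g = H c = d

t = max(0.0, -G.min())                     # smallest shift making B = G + tU nonnegative
B = G + t
y = d + t * d.sum()                        # y = d + (t d_+) 1, using g_+ = d_+
print(np.allclose(B @ g, y), (B >= 0).all())   # equivalent nonnegative system B g = y
```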
References
[1] Bertero, M., and Boccacci, P. (1998) Introduction to Inverse Problems in Imaging. Bristol, UK: Institute of Physics Publishing.

[2] Browne, J. and De Pierro, A. (1996) "A row-action alternative to the EM algorithm for maximizing likelihoods in emission tomography." IEEE Trans. Med. Imag. 15, pp. 687–699.

[3] Byrne, C. (1993) "Iterative image reconstruction algorithms based on cross-entropy minimization." IEEE Transactions on Image Processing IP-2, pp. 96–103.

[4] Byrne, C. (1995) "Erratum and addendum to 'Iterative image reconstruction algorithms based on cross-entropy minimization'." IEEE Transactions on Image Processing IP-4, pp. 225–226.

[5] Byrne, C. (1996) "Iterative reconstruction algorithms based on cross-entropy minimization." in Image Models (and their Speech Model Cousins), S.E. Levinson and L. Shepp, editors, IMA Volumes in Mathematics and its Applications, Volume 80, pp. 1–11. New York: Springer-Verlag.

[6] Byrne, C. (1996) "Block-iterative methods for image reconstruction from projections." IEEE Transactions on Image Processing IP-5, pp. 792–794.

[7] Byrne, C. (1997) "Convergent block-iterative algorithms for image reconstruction from inconsistent data." IEEE Transactions on Image Processing IP-6, pp. 1296–1304.

[8] Byrne, C. (1998) "Accelerating the EMML algorithm and related iterative algorithms by rescaled block-iterative (RBI) methods." IEEE Transactions on Image Processing IP-7, pp. 100–109.

[9] Byrne, C. (1999) "Iterative projection onto convex sets using multiple Bregman distances." Inverse Problems 15, pp. 1295–1313.

[10] Byrne, C. (2000) "Block-iterative interior point optimization methods for image reconstruction from limited data." Inverse Problems 16, pp. 1405–1419.

[11] Byrne, C. (2001) "Bregman-Legendre multidistance projection algorithms for convex feasibility and optimization." in Inherently Parallel Algorithms in Feasibility and Optimization and their Applications, Butnariu, D., Censor, Y., and Reich, S., editors, pp. 87–100. Amsterdam: Elsevier Publ.

[12] Byrne, C. (2001) "Likelihood maximization for list-mode emission tomographic image reconstruction." IEEE Transactions on Medical Imaging 20(10), pp. 1084–1092.

[13] Byrne, C. (2002) "Iterative oblique projection onto convex sets and the split feasibility problem." Inverse Problems 18, pp. 441–453.

[14] Byrne, C. (2004) "A unified treatment of some iterative algorithms in signal processing and image reconstruction." Inverse Problems 20, pp. 103–120.

[15] Byrne, C. (2005) "Choosing parameters in block-iterative or ordered-subset reconstruction algorithms." IEEE Transactions on Image Processing 14(3), pp. 321–327.

[16] Byrne, C. (2005) Signal Processing: A Mathematical Approach. Wellesley, MA: AK Peters.

[17] Byrne, C. (2005) "Feedback in Iterative Algorithms," unpublished lecture notes.

[18] Byrne, C., and Ward, S. (2005) "Estimating the Largest Singular Value of a Sparse Matrix," in preparation.

[19] Censor, Y. (1981) "Row-action methods for huge and sparse systems and their applications." SIAM Review 23, pp. 444–464.

[20] Censor, Y., Eggermont, P.P.B., and Gordon, D. (1983) "Strong underrelaxation in Kaczmarz's method for inconsistent systems." Numerische Mathematik 41, pp. 83–92.

[21] Censor, Y. and Segman, J. (1987) "On block-iterative maximization." J. of Information and Optimization Sciences 8, pp. 275–291.

[22] Censor, Y. and Zenios, S.A. (1997) Parallel Optimization: Theory, Algorithms and Applications. New York: Oxford University Press.

[23] Chang, J.-H., Anderson, J.M.M., and Votaw, J.R. (2004) "Regularized image reconstruction algorithms for positron emission tomography." IEEE Transactions on Medical Imaging 23(9), pp. 1165–1175.

[24] Cimmino, G. (1938) "Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari." La Ricerca Scientifica XVI, Series II, Anno IX 1, pp. 326–333.

[25] Csiszár, I. and Tusnády, G. (1984) "Information geometry and alternating minimization procedures." Statistics and Decisions Supp. 1, pp. 205–237.

[26] Csiszár, I. (1989) "A geometric interpretation of Darroch and Ratcliff's generalized iterative scaling." The Annals of Statistics 17(3), pp. 1409–1413.

[27] Csiszár, I. (1991) "Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems." The Annals of Statistics 19(4), pp. 2032–2066.

[28] Darroch, J. and Ratcliff, D. (1972) "Generalized iterative scaling for log-linear models." Annals of Mathematical Statistics 43, pp. 1470–1480.

[29] Dax, A. (1990) "The convergence of linear stationary iterative processes for solving singular unstructured systems of linear equations." SIAM Review 32, pp. 611–635.

[30] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) "Maximum likelihood from incomplete data via the EM algorithm." Journal of the Royal Statistical Society, Series B 37, pp. 1–38.

[31] De Pierro, A. (1995) "A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography." IEEE Transactions on Medical Imaging 14, pp. 132–137.

[32] De Pierro, A. and Iusem, A. (1990) "On the asymptotic behavior of some alternate smoothing series expansion iterative methods." Linear Algebra and its Applications 130, pp. 3–24.

[33] Eggermont, P.P.B., Herman, G.T., and Lent, A. (1981) "Iterative algorithms for large partitioned linear systems, with applications to image reconstruction." Linear Algebra and its Applications 40, pp. 37–67.

[34] Fessler, J., Ficaro, E., Clinthorne, N., and Lange, K. (1997) "Grouped-coordinate ascent algorithms for penalized-likelihood transmission image reconstruction." IEEE Transactions on Medical Imaging 16(2), pp. 166–175.

[35] Geman, S., and Geman, D. (1984) "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images." IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6, pp. 721–741.

[36] Gordon, R., Bender, R., and Herman, G.T. (1970) "Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and x-ray photography." J. Theoret. Biol. 29, pp. 471–481.

[37] Green, P. (1990) "Bayesian reconstructions from emission tomography data using a modified EM algorithm." IEEE Transactions on Medical Imaging 9, pp. 84–93.

[38] Hebert, T. and Leahy, R. (1989) "A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors." IEEE Transactions on Medical Imaging 8, pp. 194–202.

[39] Herman, G.T. (ed.) (1979) Image Reconstruction from Projections, Topics in Applied Physics, Vol. 32. Berlin: Springer-Verlag.

[40] Herman, G.T., and Natterer, F. (eds.) Mathematical Aspects of Computerized Tomography, Lecture Notes in Medical Informatics, Vol. 8. Berlin: Springer-Verlag.

[41] Herman, G.T., Censor, Y., Gordon, D., and Lewitt, R. (1985) Comment (on the paper [67]), Journal of the American Statistical Association 80, pp. 22–25.

[42] Herman, G.T. and Meyer, L. (1993) "Algebraic reconstruction techniques can be made computationally efficient." IEEE Transactions on Medical Imaging 12, pp. 600–609.

[43] Holte, S., Schmidlin, P., Linden, A., Rosenqvist, G. and Eriksson, L. (1990) "Iterative image reconstruction for positron emission tomography: a study of convergence and quantitation problems." IEEE Transactions on Nuclear Science 37, pp. 629–635.

[44] Hudson, H.M. and Larkin, R.S. (1994) "Accelerated image reconstruction using ordered subsets of projection data." IEEE Transactions on Medical Imaging 13, pp. 601–609.

[45] Hutton, B., Kyme, A., Lau, Y., Skerrett, D., and Fulton, R. (2002) "A hybrid 3-D reconstruction/registration algorithm for correction of head motion in emission tomography." IEEE Transactions on Nuclear Science 49(1), pp. 188–194.

[46] Kaczmarz, S. (1937) "Angenäherte Auflösung von Systemen linearer Gleichungen." Bulletin de l'Academie Polonaise des Sciences et Lettres A35, pp. 355–357.

[47] Kak, A., and Slaney, M. (2001) Principles of Computerized Tomographic Imaging. Philadelphia, PA: SIAM.

[48] Kullback, S. and Leibler, R. (1951) "On information and sufficiency." Annals of Mathematical Statistics 22, pp. 79–86.

[49] Landweber, L. (1951) "An iterative formula for Fredholm integral equations of the first kind." Amer. J. of Math. 73, pp. 615–624.

[50] Lange, K. and Carson, R. (1984) "EM reconstruction algorithms for emission and transmission tomography." Journal of Computer Assisted Tomography 8, pp. 306–316.

[51] Lange, K., Bahn, M. and Little, R. (1987) "A theoretical study of some maximum likelihood algorithms for emission and transmission tomography." IEEE Trans. Med. Imag. MI-6(2), pp. 106–114.

[52] Leahy, R. and Byrne, C. (2000) "Guest editorial: Recent development in iterative image reconstruction for PET and SPECT." IEEE Trans. Med. Imag. 19, pp. 257–260.

[53] Leahy, R., Hebert, T., and Lee, R. (1989) "Applications of Markov random field models in medical imaging." in Proceedings of the Conference on Information Processing in Medical Imaging, Lawrence Berkeley Laboratory, Berkeley, CA.

[54] Levitan, E. and Herman, G. (1987) "A maximum a posteriori probability expectation maximization algorithm for image reconstruction in emission tomography." IEEE Transactions on Medical Imaging 6, pp. 185–192.

[55] McLachlan, G.J. and Krishnan, T. (1997) The EM Algorithm and Extensions. New York: John Wiley and Sons, Inc.

[56] Meidunas, E. (2001) Re-scaled Block Iterative Expectation Maximization Maximum Likelihood (RBI-EMML) Abundance Estimation and Sub-pixel Material Identification in Hyperspectral Imagery, MS thesis, Department of Electrical Engineering, University of Massachusetts Lowell.

[57] Narayanan, M., Byrne, C. and King, M. (2001) "An interior point iterative maximum-likelihood reconstruction algorithm incorporating upper and lower bounds with application to SPECT transmission imaging." IEEE Transactions on Medical Imaging TMI-20(4), pp. 342–353.

[58] Natterer, F. (1986) Mathematics of Computed Tomography. New York: John Wiley and Sons, Inc.

[59] Natterer, F., and Wübbeling, F. (2001) Mathematical Methods in Image Reconstruction. Philadelphia, PA: SIAM.

[60] Peressini, A., Sullivan, F., and Uhl, J. (1988) The Mathematics of Nonlinear Programming. Berlin: Springer-Verlag.

[61] Pretorius, P., King, M., Pan, T-S., deVries, D., Glick, S., and Byrne, C. (1998) "Reducing the influence of the partial volume effect on SPECT activity quantitation with 3D modelling of spatial resolution in iterative reconstruction." Phys. Med. Biol. 43, pp. 407–420.

[62] Rockmore, A., and Macovski, A. (1976) "A maximum likelihood approach to emission image reconstruction from projections." IEEE Transactions on Nuclear Science NS-23, pp. 1428–1432.

[63] Schmidlin, P. (1972) "Iterative separation of sections in tomographic scintigrams." Nucl. Med. 15(1).

[64] Shepp, L., and Vardi, Y. (1982) "Maximum likelihood reconstruction for emission tomography." IEEE Transactions on Medical Imaging MI-1, pp. 113–122.

[65] Soares, E., Byrne, C., Glick, S., Appledorn, R., and King, M. (1993) "Implementation and evaluation of an analytic solution to the photon attenuation and nonstationary resolution reconstruction problem in SPECT." IEEE Transactions on Nuclear Science 40(4), pp. 1231–1237.

[66] Tanabe, K. (1971) "Projection method for solving a singular system of linear equations and its applications." Numer. Math. 17, pp. 203–214.

[67] Vardi, Y., Shepp, L.A. and Kaufman, L. (1985) "A statistical model for positron emission tomography." Journal of the American Statistical Association 80, pp. 8–20.

[68] Wernick, M. and Aarsvold, J., editors (2004) Emission Tomography: The Fundamentals of PET and SPECT. San Diego: Elsevier Academic Press.