Annu. Rev. Astron. Astrophys. 2005. 43:139–94
doi: 10.1146/annurev.astro.43.112904.104850
Copyright © 2005 by Annual Reviews. All rights reserved
First published online as a Review in Advance on June 16, 2005
DIGITAL IMAGE RECONSTRUCTION:
Deblurring and Denoising
R.C. Puetter,1,4 T.R. Gosnell,2,4 and Amos Yahil3,4
1Center for Astrophysics and Space Sciences, University of California, San Diego,
La Jolla, CA 92093
2Los Alamos National Laboratory, Los Alamos, NM 87545
3Department of Physics and Astronomy, Stony Brook University, Stony Brook, NY 11794
4Pixon LLC, Stony Brook, NY 11790; email: Rick.Puetter@pixon.com,
Tim.Gosnell@pixon.com, Amos.Yahil@pixon.com
Key Words
image processing, image restoration, maximum entropy, Pixon, regularization, wavelets
■ Abstract Digital image reconstruction is a robust means by which the underlying images hidden in blurry and noisy data can be revealed. The main challenge is sensitivity to measurement noise in the input data, which can be magnified strongly, resulting in large artifacts in the reconstructed image. The cure is to restrict the permitted images. This review summarizes image reconstruction methods in current use. Progressively more sophisticated image restrictions have been developed, including (a) filtering the input data, (b) regularization by global penalty functions, and (c) spatially adaptive methods that impose a variable degree of restriction across the image. The most reliable reconstruction is the most conservative one, which seeks the simplest underlying image consistent with the input data. Simplicity is context-dependent, but for most imaging applications, the simplest reconstructed image is the smoothest one. Imposing the maximum, spatially adaptive smoothing permitted by the data results in the best image reconstruction.
1. INTRODUCTION
Digital image processing of the type discussed in this review has been developed
extensively and now routinely provides high-quality, robust reconstructions of
blurry and noisy data collected by a wide variety of sensors. The field exists because
it is impossible to build imaging instruments that produce arbitrarily sharp pictures
uncorrupted by measurement noise. It is, however, possible mathematically to
reconstruct the underlying image from the nonideal data obtained from real-world
instruments, so that information present but hidden in the data is revealed with less
blur and noise. The improvement from raw input data to reconstructed image can
be quite dramatic.
Annu. Rev. Astro. Astrophys. 2005.43:139-194. Downloaded from arjournals.annualreviews.org
by University of California - San Diego on 09/03/05. For personal use only.
Our choice of nomenclature is deliberate. Throughout this review, “data” refers
to any measured quantity, from which an unknown “image” is estimated through
the process of image reconstruction.1 The term image denotes either the estimated
solution or the true underlying image that gives rise to the observed data. The
discussion usually makes clear which context applies; in cases of possible ambi-
guity we use “image model” to denote the estimated solution. Note that the data
and the image need not be similar and may even have different dimensionality,
e.g., tomographic reconstructions seek to determine a 3D image from projected
2D data.
Image reconstruction is difficult because substantial fluctuations in the image
may be strongly blurred, yielding only minor variations in the measured data.
This causes two major, related problems for image reconstruction. First, noise
fluctuations may be mistaken for real signal. Overinterpretation of data is always
problematic, but image reconstruction magnifies the effect to yield large image
artifacts. The high wave numbers (spatial frequencies) of the image model are par-
ticularly susceptible to these artifacts, because they are suppressed more strongly
by the blur and are therefore less noticeable in the data. In addition, it may be
impossible to discriminate between competing image models if the differences in
the data models obtained from them by blurring are well within the measurement
noise. For example, two closely spaced point sources might be statistically indis-
tinguishable from a single, unresolved point source. A definitive resolution of the
image ambiguity can then only come with additional input data.
Image reconstruction tackles both these difficulties by making additional as-
sumptions about the image. These assumptions may appeal to other knowledge
about the imaged object, or there may be accepted procedures to favor a “reason-
able” or a “conservative” image over a “less reasonable” or an “implausible” one.
The key to stable image reconstruction is to restrict the permissible image mod-
els, either by disallowing unwanted solutions altogether, or by making it much
less likely that they are selected by the reconstruction. Almost all modern im-
age reconstructions restrict image models in one way or another. They differ
only in what they restrict and how they enforce the restriction. The trick is not
to throw out the baby with the bath water. The more restrictive the image re-
construction, the greater its stability, but also the more likely it is to eliminate
correct solutions. The goal is therefore to describe the allowed solutions in a
sufficiently general way that accounts for all possible images that may be encoun-
tered, and at the same time to be as strict as possible in selecting the preferred
images.
1Historically, the problem of deblurring and denoising of imaging data was termed image
restoration, a subtopic within a larger computational problem known as image reconstruc-
tion. Most contemporary workers now use the latter, more general term, and we adopt this
terminology in this review. Also, some authors use “image” for what we call “data” and
“object” for what we call “image.” Readers familiar with that terminology need to make a
mental translation when reading this review.
There are strong disagreements over how image restriction should be accomplished.
After decades of development, the literature on the subject is still unusually
editorial and even contentious in tone. At times it sounds as though image reconstruction
is an art, a matter of taste and subjective preference, instead of an objective science.
We take a different view. For us, the goal of image reconstruction is to come as
close as possible to the true underlying image, pure and simple. And there are
objective criteria by which the success of image reconstruction can be measured.
First, it must be internally self-consistent. An image model predicts a data model,
and the residuals—the differences between the data and the data model—should
be statistically consistent with our understanding of the measurement noise. If we
see structure in the residuals, or if their statistical distribution is inconsistent with
noise statistics, there is something wrong with the image model. A successful fit may
still leave image ambiguity that cannot be statistically resolved by the available data. The image
reconstruction can then only be validated externally by additional measurements,
preferably by independent investigators. Simulations are also useful, because the
true image used to create the simulated data is known and can be compared with
the reconstructed image.
There are several excellent reviews of image reconstruction and numerical methods
by other authors. These include: Calvetti, Reichel & Zhang (1999) on iterative
methods; Hansen (1994) on regularization methods; Molina et al. (2001) and
Starck, Pantin & Murtagh (2002) on image reconstruction in astronomy; Narayan
& Nityananda (1986) on the maximum-entropy method; O’Sullivan, Blahut &
Snyder (1998) on an information-theoretic view; Press et al. (2002) on the inverse
problem and statistical and numerical methods in general; and van Kempen et al.
(1997) on confocal microscopy. There are also a number of important regular con-
ferences on image processing, notably those sponsored by the International Soci-
ety for Optical Engineering (http://www.spie.org), the Optical Society of America
(http://www.osa.org), and the Computer Society of the Institute of Electrical and
Electronics Engineers (http://www.computer.org).
Our review begins with a discussion of the mathematical preliminaries in
Section 2. Our account of image reconstruction methods then proceeds from the
simple to the more elaborate. This path roughly follows the historical develop-
ment, because the simple methods were used first, and more sophisticated ones
were developed only when the simpler methods proved inadequate.
Simplest are the noniterative methods discussed in Section 3. They provide
explicit, closed-form inverse operations by which data are converted to image
models in one step. These methods include Fourier and small-kernel deconvolu-
tions, possibly coupled with Wiener filtering, wavelet denoising, or quick Pixon
smoothing. We show in two examples that they suffer from noise amplifica-
tion to one degree or another, although good filtering can greatly reduce its
severity.
The limitations of noniterative methods motivated the development of iterative
methods that fit an image model to the data by using statistical tests to determine
how well the model fits the data. Section 4 launches the statistical discussion by
introducing the concepts of merit function, maximum likelihood, goodness of fit,
and error estimates.
Fitting methods fall into two broad categories, parametric and nonparametric.
Section 5 is devoted to parametric methods, which are suitable for problems in
which the image can be modeled by explicit, known source functions with a few
adjustable parameters. “Clean” is an example of a parametric method used in radio
astronomy. We also include a brief discussion of parametric error estimates.
Section 6 introduces simple nonparametric iterative schemes including the van
Cittert, Landweber, Richardson-Lucy, and conjugate-gradient methods. These
nonparametric methods replace the small number of source functions with a large
number of unknown image values defined on a grid, thereby allowing a much
larger pool of image models. But image freedom also results in image instability,
which requires the introduction of image restriction. The two simplest forms of
image restriction discussed in Section 6 are the early termination of the maximum-
likelihood fit, before it reaches convergence, and the imposition of the requirement
that the image be nonnegative. The combination of the two restrictions is surpris-
ingly powerful in attenuating noise amplification and even increasing resolution,
but some reconstruction artifacts remain.
To go beyond the general methods of Section 6 requires additional restrictions
whose task is to smooth the image model and suppress the artifacts. Section 7 dis-
cusses the methods of linear (Tikhonov) regularization, total variation, and max-
imum entropy, which impose global image preference functions. These methods
were originally motivated by two differing philosophies but ended up equivalent
to each other. Both state image preference using a global function of the image
and then optimize the preference function subject to data constraints.
Global image restriction can significantly improve image reconstruction, but,
because the preference function is global, the result is often to underfit the data
in some parts of the image and to overfit in other parts. Section 8 presents spa-
tially adaptive methods of image restriction, including spatially variable entropy,
wavelets, Markov random fields and Gibbs priors, atomic priors and massive
inference, and the full Pixon method. Section 8 ends with several examples of both
simulated and real data, which serve to illustrate the theoretical points made throughout
the review.
We end with a summary in Section 9. Let us state our conclusion outright. The
future, in our view, lies with the added flexibility enabled by spatially adaptive
image restriction coupled with strong rules on how the image restriction is to be
applied. On the one hand, the larger pool of permitted image models prevents
underfitting the data. On the other hand, the stricter image selection avoids over-
fitting the data. Done correctly, we see the spatially adaptive methods providing
the ultimate image reconstructions.
The topics presented in this review are by no means exhaustive of the field,
but limited space prevents us from discussing several other interesting areas of
image reconstruction. A major omission is superresolution, a term used both for
subdiffraction resolution (Hunt 1995, Bertero & Boccacci 2003) and subpixel
resolution. Astronomers might be familiar with the “drizzle” technique developed
at the Space Telescope Science Institute (Fruchter & Hook 2002) to obtain subpixel
resolution using multiple, dithered data frames. There are also other approaches
(Borman & Stevenson 1998; Elad & Feuer 1999; Park, Park & Kang 2003).
Other areas of image reconstruction left out include: (a) tomography (Natterer
1999); (b) vector quantization, a technique widely used in image compression and
classification (Gersho & Gray 1992, Cosman et al. 1993, Hunt 1995, Sheppard
et al. 2000); (c) the method of projection onto convex sets (Biemond, Lagendijk
& Mersereau 1990); (d) the related methods of singular-value decomposition,
principal components analysis, and independent components analysis (Raykov
& Marcoulides 2000; Hyvärinen, Karhunen & Oja 2001; Press et al. 2002); and
(e) artificial neural networks, used mainly in image classification but also useful
in image reconstruction (Dávila & Hunt 2000; Egmont-Petersen, de Ridder &
Handels 2002).
We also confine ourselves to physical blur—instrumental and/or atmospheric—
that spreads the image over more than one pixel. Loss of resolution due to the finite
pixel size can in principle be overcome by better optics (stronger magnification) or a
finer focal-plane array. In practice, the user is usually limited by the optics and focal
plane array at hand. The main recourse then is to take multiple, dithered frames
and use some of the superresolution techniques referenced above. Ironically, if the
physical blur extends over more than one pixel, one can also recover some subpixel
resolution by requiring that the image be everywhere nonnegative (Section 2.4).
This technique does not work if the physical blur is less than one pixel wide.
Another area that we omit is the initial data reduction that needs to be per-
formed before image reconstruction can begin. This includes nonuniformity
corrections, elimination of bad pixels, background subtraction where appropriate, and
the determination of the type and level of statistical noise in the data. Major astro-
nomical observatories often have online manuals providing instructions for these
operations, e.g., the direct imaging manual of Kitt Peak National Observatory
(http://www.noao.edu/kpno/manuals/dim).
2. MATHEMATICAL PRELIMINARIES
2.1. Blur
The image formed in the focal plane is blurred by the imaging instrument and
the atmosphere. It can be expressed as an integral over the true image, denoted
symbolically by ⊗:

M(x) = P ⊗ I = ∫ P(x, y) I(y) dy.    (1)

For a 2D image, the integration is over the 2D y-space upon which the image
is defined. In general, the imaging problem may be defined in a space with an
arbitrary number of dimensions, and the dimensionalities of x and y need not even
be the same (e.g., in tomography). Wavelength and/or time might provide additional
dimensions. There may also be multiple focal planes, or multiple exposures in the
same focal plane, perhaps with some dithering.
The kernel of the integral, P(x, y), is called the point-spread function. It is the
probability that a photon originating at position y in the image plane ends up at
position x in the focal plane. Another way of looking at the point-spread function
is as the image formed in the focal plane by a point source of unit flux at position
y, hence, the name point-spread function.
2.2. Convolution
In general, the point-spread function varies independently with respect to both x
and y, because the blur of a point source may depend on its location in the image.
Optical systems that suffer strong geometric aberrations behave in this way. Often,
however, the point-spread function can be accurately written as a function only of
the displacement x − y, independent of location within the field of view. In that
case, Equation 1 becomes a convolution integral:

M(x) = P ∗ I = ∫ P(x − y) I(y) dy.    (2)

When necessary, we use the symbol ∗ to specify a convolution operation to make
clear the distinction with the more general integral operation ⊗. Most of the image
reconstruction methods described in this review, however, are general and are not
restricted to convolving point-spread functions.
Convolutions have the added benefit that they translate into simple algebraic
products in the Fourier space of wave vectors k (e.g., Press et al. 2002):

M̃(k) = P̃(k) Ĩ(k).    (3)
For a convolving point-spread function there is thus a direct k-by-k correspondence
between the true image and the blurred image. The reason for this simplicity is that
the Fourier spectral functions exp(ik · x) are eigenfunctions of the convolution
operator, i.e., convolving them with the point-spread function returns the input
function multiplied by the eigenvalue P̃(k).
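This eigenfunction property is easy to check numerically. The sketch below is our own illustrative example (a Gaussian point-spread function on a periodic 1D grid, not taken from the review): circularly convolving a discrete Fourier mode with the PSF returns the same mode multiplied by P̃(k).

```python
import numpy as np

# Illustrative 1D setup: Gaussian PSF on a periodic grid of n samples.
n = 64
x = np.arange(n)
psf = np.exp(-0.5 * ((x - n // 2) / 2.0) ** 2)
psf /= psf.sum()                       # unit total flux
psf_c = np.fft.ifftshift(psf)          # PSF centered at index 0

m = 5                                  # mode number; k = 2*pi*m/n
mode = np.exp(2j * np.pi * m * x / n)  # Fourier spectral function exp(ik*x)

# Circular convolution of the mode with the PSF, done as a direct sum
conv = np.array([sum(psf_c[(i - j) % n] * mode[j] for j in range(n))
                 for i in range(n)])

# The eigenvalue is the DFT of the (centered) PSF at that mode
eigenvalue = np.fft.fft(psf_c)[m]      # P~(k)
assert np.allclose(conv, eigenvalue * mode)
```

The assertion confirms that the convolution merely rescales the mode, which is exactly why convolution becomes a k-by-k product in Fourier space.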
The separation of the blur into decoupled operations on each of the Fourier
components of the image points to a simple method of image reconstruction.
Equation 3 may be solved for the image in Fourier space:
Ĩ(k) = M̃(k) / P̃(k),    (4)
and the image is the inverse Fourier transform of Ĩ(k). In practice, this method is
limited by noise (Section 3.1). We bring it up here as a way to think about image
reconstruction and, in particular, about sampling and aliasing (Sections 2.3 and 2.4).
Even in the more general case in which image blur is not a precise convolution, it is
still useful to conceptualize image reconstruction in terms of Fourier components,
because the coupling between different Fourier components is often limited to a
small range of k.
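Equation 4 and its noise sensitivity can be made concrete with a small sketch. The 1D example below is our own (an assumed Gaussian point-spread function and two point sources); it deconvolves both noise-free and slightly noisy data by direct Fourier division, showing how even modest noise is enormously amplified at high k where P̃(k) is tiny.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
x = np.arange(n)
psf = np.exp(-0.5 * ((x - n // 2) / 1.5) ** 2)   # illustrative Gaussian PSF
psf /= psf.sum()
psf_c = np.fft.ifftshift(psf)                    # PSF centered at index 0

image = np.zeros(n)
image[40] = 100.0                                # two point sources
image[80] = 60.0

# Blurred data (circular convolution), with and without Gaussian noise
data_clean = np.real(np.fft.ifft(np.fft.fft(psf_c) * np.fft.fft(image)))
data_noisy = data_clean + rng.normal(0.0, 0.1, n)

def fourier_deconvolve(d):
    # Equation 4: I~(k) = M~(k) / P~(k), then inverse transform
    return np.real(np.fft.ifft(np.fft.fft(d) / np.fft.fft(psf_c)))

recon_clean = fourier_deconvolve(data_clean)   # recovers the image well
recon_noisy = fourier_deconvolve(data_noisy)   # dominated by amplified noise
```

Dividing by P̃(k) where it is nearly zero turns a 0.1-level noise into reconstruction errors orders of magnitude larger than the sources themselves, which is precisely the instability that filtering and regularization (Sections 3 and 7) are designed to tame.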
2.3. Data Sampling
Imaging detectors do not actually measure the continuous blurred image. In modern
digital detectors, the photons are collected and counted in a finite number of pixels
with nonzero widths placed at discrete positions in the focal plane. They typically
form an array of adjacent pixels. The finite pixel width results in further blur,
turning the point-spread function into a point-response function. Assuming that
all the pixels have the same response, the point-response function is a convolution
of the pixel-responsivity function S and the point-spread function:
H(x, y) = S ∗ P.    (5)
The point-response function is actually only evaluated at the discrete positions of
the pixels, xi, typically taken to be at the centers of the pixels. The data expected
to be collected in pixel i—in the absence of noise (Section 2.7)—is then
Mi = ∫ H(xi, y) I(y) dy = (H ⊗ I)i.    (6)

We also refer to Mi as the data model when the image under discussion is an image
model.
Image reconstruction actually only requires knowledge of the point-response
function, which relates the image to the expected data. There is never any need to
determine the point-spread function, because the continuous blurred image is not
measured directly. But it is necessary to determine the point-response function with
sufficient accuracy (Section 2.9). Approximating it by the point-spread function is
often inadequate.
2.4. How to Overcome Aliasing Due to Discrete Sampling
Another benefit of the Fourier representation is its characterization of sampling
and aliasing. The sampling theorem tells us precisely what can and cannot be determined
from a discretely sampled function (e.g., Press et al. 2002). Specifically, the
sampled discrete values completely determine the continuous function, provided
that the function is bandwidth-limited within the Nyquist cutoffs:
−1/(2∆) = −kc ≤ k ≤ kc = 1/(2∆).    (7)
(The Nyquist cutoffs are expressed in vector form, because the grid spacing ∆
need not be the same in different directions.) By the same token, if the continuous
function is not bandwidth-limited and has significant Fourier components beyond
the Nyquist cutoffs, then these components are aliased within the Nyquist cutoffs
and cannot be distinguished from the Fourier components whose wave vectors
really lie within the Nyquist cutoffs. This leaves an inherent ambiguity regarding
the nature of any continuous function that is sampled discretely, thereby limiting
resolution.
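The folding described above can be demonstrated in a few lines. In this sketch (our own illustrative example, with grid spacing ∆ = 1), a cosine with frequency beyond the Nyquist cutoff kc = 1/(2∆) is sampled on the grid and compared with its alias folded back inside the cutoff: the two are identical at every sample point.

```python
import numpy as np

delta = 1.0                      # grid spacing
x = np.arange(32) * delta        # discrete sample positions
f_high = 0.8                     # cycles per sample, beyond k_c = 0.5
f_alias = f_high - 1.0 / delta   # folded frequency: -0.2 cycles per sample

high = np.cos(2 * np.pi * f_high * x)
alias = np.cos(2 * np.pi * f_alias * x)
assert np.allclose(high, alias)  # indistinguishable on the sampled grid
```

No analysis of the samples alone can tell the two frequencies apart; only prior knowledge about the underlying function can break the degeneracy.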
The point-spread function is bandwidth-limited and therefore so is the blurred
image. The bandwidth limit may be a strict cutoff, as in the case of diffraction, or
a gradual one, as for atmospheric seeing (in which case the effective bandwidth
limit depends on the signal-to-noise ratio). In any event, the blurred image is as
bandwidth-limited as the point-spread function. It is therefore completely specified
by any discrete sampling whose Nyquist cutoffs encompass the bandwidth limit
of the point-spread function. The true image, however, is what it is and need
not be bandwidth-limited. It may well contain components of interest beyond the
bandwidth limit of the point-spread function.
The previous discussion about convolution (Section 2.2) suggests that the reconstructed
image would be as bandwidth-limited as the data and little could be done
to recover the high-k components of the image. This rash conclusion is incorrect.
We usually have additional information about the image, which we can utilize. Al-
most all images must be nonnegative. (There are some important exceptions, e.g.,
complex images, for which positivity has no meaning.) Sometimes we also know
something about the shapes of the sources, e.g., they may all be stellar point sources.
Taking advantage of the additional information, we can determine the high-k im-
age structure beyond the bandwidth limit of the data (Biraud 1969). For example,
we can tell that a nonnegative image is concentrated toward one corner of the pixel
because the point-spread function preferentially spreads flux to pixels around that
corner. (Autoguiders take advantage of this feature to prevent image drift.)
We can see how this works for a nonnegative image by considering the Fourier
reconstruction of an image blurred by a convolving point-spread function (Section
2.2). The reconstructed Fourier image Ĩ(k) is determined by Equation 4 within
the bandwidth limit of the data. But the image obtained from it by the inverse
Fourier transformation may be partly negative. To remove the negative image
values, we must extrapolate Ĩ(k) beyond the bandwidth limit. We are free to do
so, because the data do not constrain the high-k image components. Of course, we
cannot extrapolate too far, or else aliasing will again cause ambiguity. But we can
establish some k limit on the image by assuming that the image has no Fourier
components beyond that limit. The point is that the image bandwidth limit needs
to be higher than the data bandwidth limit, and the reconstructed image therefore
has higher resolution than the data. The increased resolution is a direct result of
the requirement of nonnegativity, without which we cannot extrapolate beyond the
bandwidth limit of the data. In the above example of subpixel structure, we can
deduce the concentration of the image toward the corner of the pixel because the
image is restricted to be nonnegative. If it could also be negative, the same data
could result from any number of images, as the sampling theorem tells us. Subpixel
resolution is enabled by nonnegativity. On the other hand, if we also know that the
image consists of point sources, we can constrain the high-k components of the
image even further and obtain yet higher resolution.
Note that the pixel-responsivity function is not bandwidth-limited in and of
itself because it has sharp edges. If the point-spread function is narrower than a
pixel, the data are not Nyquist sampled. The reconstructed image is then subject to
additional aliasing, and it may not be possible to increase resolution beyond that
set by the pixelation of the data. If the point-spread function is much narrower than
the pixel, a point source could be anywhere inside the pixel. We cannot determine
its position with greater accuracy, because the blur does not spill enough of its flux
into neighboring pixels.
2.5. Background Subtraction
The use of nonnegativity as a major constraint in image reconstruction points to the
importance of background subtraction. Requiring a background-subtracted image
to be nonnegative is much more restrictive, because when an image sits on top of a
significant background, negative fluctuations in the image can be absorbed by the
background. The user is therefore well advised to subtract the background before
commencing image reconstruction, if at all possible. Astronomical images usually
lend themselves to background subtraction because a significant area of the image
is filled exclusively by the background sky. It may be more difficult to subtract the
background in a terrestrial image.
The best way to subtract background is by chopping and nodding, alternating
between the target and a nearby blank field and recording only difference measurements.
But the imaging instrument must be designed to do so. (See the descriptions
of such devices at major astronomical observatories, e.g., on the thermal-region
camera spectrograph of the Gemini South telescope, http://www.gemini.edu/sciops
/instruments/miri/T-ReCSChopNod.html.) Absent such capability, the background
can only be subtracted after the data are taken. Several methods have been pro-
posed to subtract background (Bijaoui 1980; Infante 1987; Beard, McGillivray &
Thanisch 1990; Almoznino, Loinger & Brosch 1993; Bertin & Arnouts 1996). The
background may be a constant or a slowly varying function of position. It should
in any event not vary significantly on scales over which the image is known to vary,
or else the background subtraction may modify image structures. There are also
several ways to clip sources when estimating the background (see the discussion
by Bertin & Arnouts 1996).
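One common clipping scheme, in the spirit of those cited above, is iterative sigma clipping: repeatedly discard pixels that deviate strongly from the current background estimate, so that bright sources do not bias the mean. The function below is a generic sketch of that idea, not the algorithm of any one of the cited papers, and assumes a flat background.

```python
import numpy as np

def clipped_background(data, nsigma=3.0, niter=5):
    """Estimate a flat background by iterative sigma clipping."""
    pixels = np.asarray(data, dtype=float).ravel()
    for _ in range(niter):
        mu, sigma = pixels.mean(), pixels.std()
        keep = np.abs(pixels - mu) < nsigma * sigma
        if keep.all():
            break
        pixels = pixels[keep]      # discard pixels dominated by sources
    return pixels.mean()

# Simulated frame: flat sky of 100 counts with noise, plus a bright source
rng = np.random.default_rng(1)
frame = rng.normal(100.0, 5.0, (64, 64))
frame[30:33, 30:33] += 500.0
bg = clipped_background(frame)     # close to the true sky level of 100
```

A plain mean would be pulled upward by the source; the clipped estimate stays near the true sky level. Real pipelines additionally fit a slowly varying background surface rather than a single constant.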
2.6. Image Discretization
In general, an image can either be specified parametrically by known source func-
tions (Section 5) or it can be represented nonparametrically on a discrete grid
(Section 6), in which case the integral in Equation 1 is converted to a sum, yielding
a set of linear equations:
Mi = ∑j Hij Ij,    (8)

in matrix notation M = H I. Here M is a vector containing n expected data values, I
is a vector of m image values representing a discrete version of the image, and H is
an n × m matrix representation of the point-response function. Note that each of
what we here term vectors is in reality often a multidimensional array. In a typical
2D case, n = nx × ny, m = mx × my, and the point-response function is an
(nx × ny) × (mx × my) array.
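Equation 8 can be made concrete in 1D. The sketch below (our own illustrative example, with matching data and image grids so n = m, and an assumed 5-pixel Gaussian kernel) builds the point-response matrix H explicitly and forms the expected data as the matrix-vector product M = H I.

```python
import numpy as np

n = m = 32
# Illustrative 5-pixel Gaussian point-response kernel, unit total flux
kernel = np.exp(-0.5 * (np.arange(-2, 3) / 1.0) ** 2)
kernel /= kernel.sum()

# Build the n x m matrix H: row i holds the response of data pixel i
# to each image value j (edges truncated for simplicity)
H = np.zeros((n, m))
for i in range(n):
    for k, h in zip(range(-2, 3), kernel):
        j = i + k
        if 0 <= j < m:
            H[i, j] = h

I = np.zeros(m)
I[10] = 1.0                # a single point source of unit flux
M = H @ I                  # expected (noise-free) data, Equation 8

back = H.T @ M             # back projection (H transpose, not H^-1)
```

The expected data reproduce the kernel centered on the source, and flux is conserved because each row of H sums to one away from the edges.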
Note also for future reference that one often needs the transpose of the point-
response function HT, which is the m × n matrix obtained from H by transposing
its rows and columns. HT is the point-response function for an optical system
in which the roles of the image plane and the focal plane are reversed (known
in tomography as back projection). It is not to be confused with the inverse of
the point-response function H−1, an operator that exists only for square matrices,
n = m. When applied to the expected data, H−1 provides the image that gave rise
to the expected data through the original optical system at hand.
The discussion of sampling and aliasing (Section 2.4) shows that the image must,
under some circumstances, be determined with better resolution than the data to
ensure that it is nonnegative. This requires the image grid to be finer than the data
grid, so the image is Nyquist sampled within the requisite image bandwidth. In that
case, the number of image values is not equal to the number of data points, and the
point-response function is not a square matrix. Conversely, if the image is known
to be more bandwidth-limited than the data, or if the signal-to-noise ratio is low,
so the high-k components are uncertain, we may choose a coarser discretization
of the image than the data.
In the case of a convolution, the discretization is greatly simplified by using
Equation 2 to give:
Mi = ∑j Hi−j Ij.    (9)

Note that Equation 9 assumes that the data and image grids have the same spacing,
and the point-response function takes a different form than in Equation 8. Like the
expected data and the image, H is also an n-point vector, not an n × n matrix.
(Recall that n refers to the total number of grid points, e.g., for a 2D array n =
nx × ny.) Discrete convolutions of the form of Equation 9 have a discrete Fourier
analog of Equation 3 and can be computed efficiently by fast Fourier transform
techniques (e.g., Press et al. 2002). When the image grid needs to be more finely
spaced than the data, the convolution is performed on the image grid and then
sampled at the positions of the pixels on the coarser grid.
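The equivalence between the direct sum of Equation 9 and its fast-Fourier-transform evaluation can be checked numerically. The sketch below (our own example, with periodic boundary conditions as the discrete Fourier transform implies, and an assumed Gaussian kernel) computes the convolution both ways and confirms they agree.

```python
import numpy as np

n = 64
rng = np.random.default_rng(2)
I = rng.random(n)                       # an arbitrary nonnegative image
H = np.exp(-0.5 * ((np.arange(n) - n // 2) / 2.0) ** 2)
H /= H.sum()
H = np.fft.ifftshift(H)                 # center kernel at index 0

# Direct sum: M_i = sum_j H_{i-j} I_j, with indices wrapped periodically
M_direct = np.array([sum(H[(i - j) % n] * I[j] for j in range(n))
                     for i in range(n)])

# Same result in O(n log n) via the FFT (discrete analog of Equation 3)
M_fft = np.real(np.fft.ifft(np.fft.fft(H) * np.fft.fft(I)))
assert np.allclose(M_direct, M_fft)
```

For nonperiodic data one would pad the arrays before transforming, but the algebraic identity is the same.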
2.7. Noise
A major additional factor limiting image reconstruction is noise due to measure-
ment errors. The measured data actually consist of the expected data Mi plus
measurement errors:
Di = Mi + Ni = (H ⊗ I)i + Ni = ∫ H(xi, y) I(y) dy + Ni.    (10)
The discrete form of Equation 10 is obtained in analogy with Equation 8.
Measurement errors fall into two categories. Systematic errors are recurring
errors caused by erroneous measurement processes or failure to take into account
physical effects that modify the measurements. In addition, there are random,
irreproducible errors that vary from one measurement to the next. Because we do
not know and cannot predict what a random error will be in any given measurement,
we can at best deal with random errors statistically, assuming that they are random
realizations of some parent statistical distribution. In imaging, the most commonly
encountered parent statistical distributions are the Gaussian, or normal, distribution
and the Poisson distribution (Section 4.2).
To be explicit, consider a trial solution to Equation 10, Î(y), and compute the
residuals
Ri = Di − Mi = Di − ∫ H(xi, y) Î(y) dy.    (11)
The image model is an acceptable solution of the inverse problem if the residuals are
consistent with the parent statistical distribution of the noise. The data model is then
our estimate of the reproducible signal in the measurements, and the residuals are
our estimate of the irreproducible noise. There is something wrong with the image
model if the residuals show systematic structure or if their statistical distribution
differs significantly from the parent statistical distribution. Examples would be if
its mean is not zero, or if the distribution is skewed, too broad, or too narrow. After
the fit is completed, it is therefore imperative to apply diagnostic tests to rule out
problems with the fit. Some of the most useful diagnostic tools are goodness of fit,
analysis of the statistical distribution of the residuals and their spatial correlations,
and parameter error estimation (Sections 4 and 5).
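Some of these diagnostic tests are easy to sketch. The helper below (a hypothetical name, with independent Gaussian noise assumed) checks the residual mean, the χ² per data point, and the lag-1 correlation of neighboring residuals:

```python
def residual_diagnostics(residuals, sigma):
    """Simple diagnostic tests on the residuals R_i, assuming independent
    Gaussian noise with known standard deviations sigma_i.  Checks the
    normalized mean (should be ~0), chi^2 per data point (should be ~1),
    and the lag-1 correlation of neighboring residuals (systematic
    structure shows up as a value well away from zero)."""
    n = len(residuals)
    z = [r / s for r, s in zip(residuals, sigma)]  # normalized residuals
    mean = sum(z) / n
    chi2_per_dof = sum(v * v for v in z) / n
    num = sum(z[i] * z[i + 1] for i in range(n - 1))
    den = sum(v * v for v in z)
    lag1 = num / den if den else 0.0
    return {"mean": mean, "chi2_per_dof": chi2_per_dof, "lag1_corr": lag1}
```

For example, residuals that alternate between +σ and −σ have zero mean and χ²/n = 1 yet a strongly negative lag-1 correlation, flagging structure that the first two tests miss.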
2.8. Instability of Image Reconstruction
Image reconstruction is unfortunately an ill-posed problem. Mathematicians con-
sider a problem to be well posed if its solution (a) exists, (b) is unique, and (c)
is continuous under infinitesimal changes of the input. The problem is ill posed
if it violates any of the three conditions. The concept goes back to Hadamard
(1902, 1923). Scientists and engineers are usually less concerned with existence
and uniqueness and worry more about the stability of the solution.
In image reconstruction, the main challenge is to prevent measurement errors
in the input data from being amplified to unacceptable artifacts in the recon-
structed image. Stated as a discrete set of linear equations, the ill-posed nature
of image reconstruction can be quantified by the condition number of the point-
response-function matrix. The condition number of a square matrix is defined as
the ratio between its largest and smallest (in magnitude) eigenvalues (e.g., Press
et al. 2002).² A singular matrix has an infinite condition number and no unique
solution. An ill-posed problem has a large condition number, and the solution is
sensitive to small changes in the input data.
² If the number of data and image points is not equal, then the point-response function is not
square, and its condition number is not strictly defined. We can then use the square root of
the condition number of HᵀH, where Hᵀ is the transpose of H.
How large is the condition number? A realistic point-response function can blur
the image over a few pixels. In this case, the highest k components of the image
are strongly suppressed by the point-response function. In other words, the high-
k components correspond to eigenfunctions of the point-response function with
very small eigenvalues. Hence, there is no escape from a large condition number.
Equations 8 or 9 can therefore not be solved in their present, unrestricted forms.
Either the equations must be modified, or the solutions must be projected away
from the subspace spanned by the eigenfunctions with small eigenvalues. The bulk
of this review is devoted to methods of image restriction (Sections 3–8).
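The argument can be made concrete: for convolution with periodic boundaries, H is a circulant matrix whose eigenvalues are the discrete Fourier transform of its first row, so the suppression of high-k modes translates directly into the condition number. A minimal sketch under that periodic-boundary assumption:

```python
import cmath

def circulant_condition_number(first_row):
    """Condition number of a circulant matrix.  For convolution with
    periodic boundaries, H is circulant and its eigenvalues are the DFT
    of its first row, so a blurring kernel that suppresses high k gives
    small eigenvalues and a large condition number."""
    n = len(first_row)
    mags = []
    for k in range(n):
        lam = sum(first_row[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                  for j in range(n))
        mags.append(abs(lam))
    return max(mags) / min(mags)

# Circulant row of a centered 3-pixel boxcar blur on an 8-pixel grid:
row = [1/3, 1/3, 0, 0, 0, 0, 0, 1/3]
print(circulant_condition_number(row))  # ≈ 7.24
```

Widening the boxcar or enlarging the grid drives the smallest eigenvalue toward zero and the condition number toward infinity, which is the instability described above.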
2.9. Accuracy of the Point-Response Function
Image reconstruction is further compromised if the point-response function is not
determined accurately enough. The signal-to-noise ratio determines how accurately
the point-response function needs to be known. The goal is that the wings of bright
sources will not be confused with weak sources nearby. The higher the signal-to-noise ratio, the
greater the care needed in determining the point-response function. The residual
errors caused by the imprecision of the point-response function should be well
below the noise.
The first requirement is that the profile of the point-response function correspond
to the real physical blur. If the point-response function assumed in the reconstruc-
tion is narrower than the true point-response function, the reconstruction cannot
remove all the blur. The image model then has less than optimal resolution, but
artifacts should not be generated. On the other hand, if the assumed point-response
function is broader than the true one, then the image model looks sharper than it
really is. In fact, if the scene has narrow sources or sharp edges, it may not be pos-
sible to reconstruct the image correctly. Artifacts in the form of “ringing” around
sharp objects are then seen in the image model.
Second, the point-response function must be defined on the image grid, which
may need to be finer than the data grid to ensure a nonnegative image (Section 2.4).
A point-response function measured from a single exposure of one point source is
then inadequate because it is only appropriate for sources placed within their pixels
similarly to the source used to measure the point-response function.
Either multiple frames need to be taken displaced by noninteger pixel widths,
or the point-response function has to be determined from multiple point sources
spanning different intrapixel positions. In either case, the point-response function
is determined on an image grid that is finer than the data grid.
2.10. Numerical Considerations
We conclude the mathematical preliminaries by noting that the full matrix containing
the point-response function is usually prohibitively large. A modern 1024 ×
1024 detector array yields a data set of 10⁶ elements, and H contains 10¹² elements.
Clearly, one must avoid schemes that require the use of the entire point-response
function matrix. Fortunately, its elements do not all need to be stored in computer memory,
nor do all need to be used in the matrix multiplication of Equation 8. The number
of nonnegligible elements is often a small fraction of the total, and sparse matrix
storage can be used (e.g., Press et al. 2002). The point-response function may
also exhibit symmetries, such as in the case of convolution (Equation 9), which
enables more efficient storage and computation. Alternatively, because H always
appears as a matrix multiplication operator, one can write functions that compute
the multiplication on the fly without ever storing the matrix values in memory. Such
computations can take advantage of specialized techniques, such as fast Fourier
transforms (Section 3.1) or small-kernel deconvolutions (Section 3.2).
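An on-the-fly operator of this kind might look as follows; `make_psf_operator` is a hypothetical helper that applies a small, centered 2-D kernel (assumed symmetric here, so that convolution and correlation coincide) with zero padding, never forming the full matrix:

```python
def make_psf_operator(kernel):
    """Return a function that applies the point-response-function matrix H
    to a 2-D image on the fly, without forming the full n^2-element
    matrix.  `kernel` is a small, centered 2-D blur kernel, assumed
    symmetric; pixels outside the image are treated as zero."""
    kr, kc = len(kernel) // 2, len(kernel[0]) // 2

    def apply_H(image):
        nr, nc = len(image), len(image[0])
        out = [[0.0] * nc for _ in range(nr)]
        for i in range(nr):
            for j in range(nc):
                s = 0.0
                for a in range(len(kernel)):
                    for b in range(len(kernel[0])):
                        y, x = i + a - kr, j + b - kc
                        if 0 <= y < nr and 0 <= x < nc:
                            s += kernel[a][b] * image[y][x]
                out[i][j] = s
        return out

    return apply_H
```

Only the kernel is held in memory; the cost per output pixel is the kernel area, which is the regime in which small-kernel techniques shine.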
3. NONITERATIVE IMAGE RECONSTRUCTION
A noniterative method for solving the inverse problem is one that derives a solution
through an explicit numerical manipulation applied directly to the measured data
in one step. The advantages of the noniterative methods are primarily ease of
implementation and fast computation. Unfortunately, noise amplification is hard
to control.
3.1. Fourier Deconvolution
Fourier deconvolution is one of the oldest and numerically fastest methods of image
deconvolution. If the noise can be neglected, then the image can be determined
using a discrete variant of the Fourier deconvolution (Equation 4), which can be
computed efficiently using fast Fourier transforms (e.g., Press et al. 2002). The
technique is used in speckle image reconstruction (Jones & Wykes 1989; Ghez,
Neugebauer & Matthews 1993), Fourier-transform spectroscopy (Abrams et al.
1994, Prasad & Bernath 1994, Serabyn & Weisstein 1995), and the determination
of galaxy redshifts, velocity dispersions, and line profiles (Simkin 1974, Sargent
et al. 1977, Bender 1990).
Unfortunately, the Fourier deconvolution technique breaks down when the noise
cannot be neglected. Noise often has a significant contribution from high k; e.g.,
white noise has equal contributions from all k. But H̃(k), which appears in the
denominator of Equation 4, falls off rapidly with k. The result is that high-k noise in
the data is significantly amplified by the deconvolution and creates image artifacts.
The wider the point-response function, the faster H̃(k) falls off at high k and the
greater the noise amplification. Even for a point-response function extending over
only a few pixels, the artifacts can be so severe that the image is completely lost
in them.
3.2. Small-Kernel Deconvolution
Fast Fourier transforms perform convolutions very efficiently when used on stan-
dard desktop computers but they require the full data frame to be collected before
the computation can begin. This is a great disadvantage when processing raster
video in pipeline fashion as it comes in, because the time to collect an entire
data frame often exceeds the computation time. Pipeline convolution of raster data
streams is more efficiently performed by massively parallel summation techniques,
even when the kernel covers as much as a few percent of the area of the frame.
In hardware terms, a field-programmable gate array (FPGA) or an application-
specific integrated circuit (ASIC) can be much more efficient than a digital signal
processor (DSP) or a microprocessor unit (MPU). FPGAs or ASICs available commercially
can be built to perform small-kernel convolutions faster than the rate at
which raster video can straightforwardly feed them, which is currently up to ∼150
megapixels per second.
Pipeline techniques can be used in image reconstruction by writing deconvolutions
as convolutions by the inverse H⁻¹ of the point-response function,

I = H⁻¹ ∗ D,    (12)

which is equivalent to the Fourier deconvolution (Equation 4). But H⁻¹ extends
over the entire array, even if H is a small kernel (spans only a few pixels). Not to
be thwarted, one then seeks an approximate inverse kernel G ≈ H⁻¹, which can
be designed to span only ∼3 full widths at half maximum of H.
Moreover, G can also be designed to suppress the high-k components of H⁻¹
and to limit ringing caused by sharp discontinuities in the data, thereby reducing
the artifacts created by straight Fourier methods.
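One simple way to build such a G, though not necessarily the design used in practice, is a truncated Neumann series: writing H = δ − K, then H⁻¹ = δ + K + K∗K + ⋯, keeping a few terms. A sketch for 1-D kernels:

```python
def conv(a, b):
    """Full discrete convolution of two centered 1-D kernels."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def approx_inverse_kernel(H, terms=4):
    """Truncated Neumann-series approximation to the inverse kernel:
    with H = delta - K, H^{-1} ~= delta + K + K*K + ... (convolution
    powers of K).  This converges only for kernels whose Fourier
    transform stays close to 1; broader blurs need more terms and
    eventually fail to converge."""
    r = len(H) // 2
    K = [-h for h in H]
    K[r] += 1.0                      # K = delta - H
    G = [0.0] * len(H); G[r] = 1.0   # running sum, starting with delta
    term = list(G)                   # current convolution power of K
    for _ in range(terms):
        term = conv(term, K)
        pad = (len(term) - len(G)) // 2
        G = [0.0] * pad + G + [0.0] * pad   # grow G's support to match
        G = [g + t for g, t in zip(G, term)]
    return G
```

For a sharp kernel such as H = [0.1, 0.8, 0.1], convolving the resulting G with H reproduces a delta function to about one percent, while G itself spans only a few pixels.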
3.3. Wiener Filter
In Section 3.1, we saw that deconvolution of the data results in strong amplification
of high-k noise. The problem is that the signal decreases rapidly at high k, while
the noise is usually flat (white) and does not decay with k. In other words, the
high-k components of the data have poor signal-to-noise ratios.
The standard way to improve k-dependent signal-to-noise ratio is linear filter-
ing, which has a long history in the field of signal processing and has been applied
in many areas of science and engineering. The Fourier transform of the data, D̃(k),
is multiplied by a k-dependent filter Φ(k), and the product is transformed back to
provide filtered data. Linear filtering is a particularly useful tool in deconvolution,
because the filtering can be combined with the Fourier deconvolution (Equation
4) to yield the filtered deconvolution,
Ĩ(k) = Φ(k) D̃(k) / H̃(k).    (13)
It can be shown (e.g., Press et al. 2002) that the optimal filter, which minimizes
the difference (in the least squares sense) between the filtered noisy data and the
true signal, is the Wiener filter, expressed in Fourier space as
Φ(k) = ⟨|D̃₀(k)|²⟩ / (⟨|D̃₀(k)|²⟩ + ⟨|Ñ(k)|²⟩) = ⟨|D̃₀(k)|²⟩ / ⟨|D̃(k)|²⟩.    (14)

Here ⟨|Ñ(k)|²⟩ and ⟨|D̃₀(k)|²⟩ are the expected power spectra (also known as
spectral densities) of the noise and the true signal, respectively. Their sum, which
appears in the denominator of Equation 14, is the power spectrum of the noisy
data, ⟨|D̃(k)|²⟩, because the signal and the noise are, by definition, statistically
independent, so their power spectra add.
The greatest difficulty in determining Φ(k) (Equation 14) comes in estimating
⟨|D̃₀(k)|²⟩. In many cases, however, the noise is white and easily estimated at high
k, where the signal is negligible. In practice, it is necessary to average over many
k values, because the statistical fluctuation of any individual Fourier component
is large. The power spectrum of the signal is then determined from the difference
⟨|D̃₀(k)|²⟩ = ⟨|D̃(k)|²⟩ − ⟨|Ñ(k)|²⟩. Again, averaging is needed to reduce the
statistical fluctuations.
A disadvantage of the Wiener filter is that it is completely deterministic and
does not leave the user with a tuning parameter. It is therefore useful to introduce an
ad hoc parameter β into Equation 14 to allow the user to adjust the aggressiveness
of the filter.
Φ(k) = ⟨|D̃₀(k)|²⟩ / (⟨|D̃₀(k)|²⟩ + β⟨|Ñ(k)|²⟩).    (15)

Standard Wiener filtering is obtained with β = 1. Higher values result in more
aggressive filtering, whereas lower values yield a smaller degree of filtering.
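Given estimates of the two power spectra, the filter of Equation 15 is a one-liner per Fourier mode; a minimal sketch (the array inputs indexed by k are our assumption):

```python
def wiener_filter(signal_power, noise_power, beta=1.0):
    """Wiener-type filter of Equations 14-15, one value per Fourier mode:
    Phi(k) = S(k) / (S(k) + beta * N(k)), where S and N are the expected
    power spectra of the true signal and the noise.  beta = 1 is the
    standard Wiener filter; beta > 1 filters more aggressively."""
    return [s / (s + beta * n) for s, n in zip(signal_power, noise_power)]

# Signal power falling with k against white noise: the filter passes
# low k nearly untouched and strongly attenuates high k.
print(wiener_filter([100.0, 10.0, 1.0, 0.1], [1.0, 1.0, 1.0, 1.0]))
```

Multiplying the deconvolved spectrum D̃(k)/H̃(k) by these factors suppresses exactly the modes in which the deconvolution amplifies the noise the most.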
3.4. Wavelets
We saw in Sections 3.1 and 3.3 that the Fourier transform is a very convenient way to
perform deconvolution, because convolutions are simple products in Fourier space.
The disadvantage of the Fourier spectral functions is that they span the whole image
and cannot be localized. One might wish to suppress high k more in one part of
the image than in another, but that is not possible in the Fourier representation.
The alternative is to use other, more localized spectral functions. These functions
are no longer eigenfunctions of a convolving point-response function, so the image
reconstruction is not as simple as in the Fourier case, but they might still retain
the Fourier characteristics, at least approximately. How are we to choose those
functions? On the one hand, we would like more localization. On the other hand, we want
to characterize the spectral functions by the rate of spatial oscillation, because we
know that we need to suppress the high-k components. Of course, the nature of
the Fourier transform is such that there are no functions that are perfectly narrow
in both image space and Fourier space (the uncertainty principle). The goal is to
find a useful compromise.
The functions to emerge from the quest for oscillatory spectral functions with
local support have been wavelets. The most frequently used wavelets are those
belonging to a class discovered by Daubechies (1988). In addition to striking
a balance between x and k support, they satisfy the following three conditions:
(a) they form an orthonormal set, which allows easy transformation between the
spatial and spectral domains, (b) they are translation invariant, i.e., the same function
can be used in different parts of the image, and (c) they are scale invariant, i.e.,
they form a hierarchy in which functions with larger wavelengths are scaled-up
versions of functions with shorter wavelengths. These three requirements have the
important practical consequence that the wavelet transform of n data points and its
inverse can each be computed hierarchically in O(n log₂ n) operations, just like
the fast Fourier transform (e.g., Press et al. 2002).
A more recent development has been a shift to nonorthogonal wavelets known
as à trous (with holes) wavelets (Holschneider et al. 1989; Shensa 1992; Bijaoui,
Starck & Murtagh 1994; Starck, Murtagh & Bijaoui 1998). The wavelet basis
functions are redundant, with more wavelets than there are data points. Their ad-
vantage is that the wavelet transform consists of a series of convolutions, so each
wavelet coefficient is a Fourier filter. Of course, each à trous wavelet, being local-
ized in space, corresponds to a range of k, so the wavelets are not eigenfunctions
of the point-response function. Also, the à trous wavelet noise spectrum needs to
be computed carefully, because the à trous wavelets are redundant and nonorthog-
onal. Spatially uncorrelated noise, which is white in Fourier space, is not white in
à trous wavelet space. (It is white for orthonormal wavelets, such as Daubechies
wavelets.)
Wavelet filtering is similar to Fourier filtering and involves the following:
Wavelet-transform the data to the spectral domain, attenuate or truncate wavelet
coefficients, and transform back to data space. The wavelet filtering can be as
simple as truncating all coefficients smaller than mσ, where σ is the standard de-
viation of the noise. Alternatively, soft thresholding reduces the absolute values
of the wavelet coefficients (Donoho & Johnstone 1994, Donoho 1995a). A further
refinement is to threshold high k more strongly (Donoho 1995b) or to modify the
high-k wavelets to reduce their noise amplification (Kalifa, Mallat & Rouge 2003).
Yet another possibility is to apply a wavelet filter analogous to the Wiener filter
(Equation 14).
Once the data have been filtered, deconvolution can proceed by the Fourier
method or by small-kernel deconvolution. Of course, the deconvolution cannot be
performed in wavelet space, because the wavelets, including the à trous wavelets,
are not eigenfunctions of the point-response function. Wavelet filtering can also
be combined with iterative image reconstruction (Section 8.2).
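Hard thresholding at mσ can be illustrated with the simple orthonormal Haar wavelet (rather than the Daubechies or à trous wavelets discussed above); a single-level sketch:

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = [(x[2*i] + x[2*i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2*i] - x[2*i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return s, d

def inverse_haar_step(s, d):
    x = []
    for si, di in zip(s, d):
        x.append((si + di) / math.sqrt(2))
        x.append((si - di) / math.sqrt(2))
    return x

def haar_denoise(data, sigma, m=3.0):
    """Single-level hard thresholding: zero every detail coefficient with
    magnitude below m*sigma.  Because the basis is orthonormal, white
    noise stays white in wavelet space, so one threshold fits all
    coefficients (this is not true for the redundant a trous wavelets)."""
    s, d = haar_step(data)
    d = [v if abs(v) >= m * sigma else 0.0 for v in d]
    return inverse_haar_step(s, d)
```

Small pixel-to-pixel differences are treated as noise and averaged away, while a jump larger than the threshold survives untouched, which is the spatial adaptivity that Fourier filtering lacks.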
3.5. Quick Pixon
The Pixon method is another way to obtain spatially adaptive noise suppression.
We defer the comprehensive discussion of the Pixon method and its motivation
to Sections 8.5 and 8.6. Briefly, it is an iterative image restriction technique that
smoothes the image model in a spatially adaptive way. A faster variant is the
quick Pixon method, which applies the same adaptive Pixon smoothing to the data
instead of to image models. This smoothing can be performed once on the input
data, following which the data can be deconvolved using the Fourier method or
small-kernel deconvolution.
The quick Pixon method, though not quite as powerful as the full Pixon method,
nevertheless often results in reconstructed images that are nearly as good as those
of the full Pixon method. The advantage of the quick Pixon method is its speed.
Because the method is noniterative and consists primarily of convolutions and
deconvolutions, the computation can be performed in pipeline fashion using small-kernel
convolutions. This allows one to build special-purpose hardware to process
raster video in real time at the maximum available video rates.
3.6. Discussion
The performance of Wiener deconvolution can be assessed from the reconstructed
images shown in Figure 1. For this example a 128 × 128 synthetic truth image,
shown in panel (b) of the figure, is blurred by a Gaussian point-response function
with a full width at half maximum of 4 pixels. Constant Gaussian noise is added to
this blurred image, so that the brightest pixels of all of the synthetic sources yield
a peak signal-to-noise ratio per pixel of 50. The resulting input data are shown in
panel (a) of the figure.
Next, the central column of panels shows a Wiener reconstruction and associated
residuals when less aggressive filtering is chosen by setting β = 0.1 (Equation 15).
This yields greater recovered resolution and good, spectrally white residuals but at
the expense of large noise-related artifacts that appear in the reconstructed image.
In fact, noise amplification makes these artifacts so large as to risk confusion with
real sources in the image. This illustrates the major difficulty of image ambiguity,
which we emphasized from the start. The reconstructed image in panel (c) results
in reasonable residuals, similar to those that would be obtained from the truth
image in panel (b), because blurring by the point-response function suppresses
the differences between the two images, and differences in the data model fall
below the measurement noise. We reject panel (c) compared with panel (b) not
because it fits the data less well, but because we know on the basis of other
knowledge (experience) that it is a less plausible image. In an effort to improve
the reconstruction, we might choose more aggressive filtering with β = 10, as in
the Wiener reconstruction that appears in the right-hand column. Here the image
artifacts are less troublesome but the resolution is poorer and the residuals now
show significant correlation with the signal.
The improvement in resolution brought about by a selection of reconstructions
is shown in Table 1, which lists the full widths at half maximum of two sources from
Figure 1 with good signal-to-noise ratios. Shown are the widths of the sources in
the truth image, the data, and the reconstructions. The Wiener reconstructions
improve resolution by ∼1 pixel. This may be compared with other reconstructions
not shown in Figure 1. The quick Pixon method (Section 3.5) and the nonnegative
least-squares fit (Section 6.2) reduce the width by ∼1.5 pixels, whereas the full
Pixon method (Section 8.6) reduces the true widths by ∼2.5 pixels, restoring the
true widths of the sources. Bear in mind also that widths add in quadrature. A more
appropriate assessment of the resolution boost is therefore made by considering
the reduction in the squares of the widths in Table 1.
Figure 2 shows Wiener, wavelet, and quick Pixon reconstructions of simulated
data obtained from a real image of New York City by blurring it using a Gaussian
point-response function with full width at half maximum of four pixels and adding
Figure 1  Wiener reconstructions of a synthetic image: (a) data, (b) truth image, (c) weak filtering (β = 0.1), overfitting the data, with (d) residuals (χ²/n = 0.89), and (e) strong filtering (β = 10), underfitting the data, with (f) residuals (χ²/n = 1.15).
TABLE 1  Resolution improvement of image reconstruction techniques measured by the full widths at half maximum (in pixels) of two bright sources in Figure 1

Source  Truth  Data  Wiener β = 0.1  Wiener β = 10  Quick Pixon  Nonnegative least-squares  Full Pixon
1       1.83   4.40  2.91            3.50           2.67         2.70                       1.90
2       1.81   4.39  3.17            3.85           3.11         2.77                       1.78
Gaussian noise so the peak signal-to-noise ratio per pixel is 50. The top panels
show, from left to right, a standard Wiener deconvolution with β = 1, a wavelet
reconstruction with Wiener-like filtering with β = 2, and a quick Pixon recon-
struction. The wavelet and quick Pixon deconvolutions are performed by a small
kernel of 15 × 15 pixels (Section 3.2). The truth image and the data are not shown
here for lack of space, but are shown in Figure 3.
The Wiener reconstruction shows excellent residuals but the worst image ar-
tifacts. The wavelet reconstruction shows weaker artifacts, but the residuals are
poor, particularly at sharp edges. One can try to change β, but this only makes
matters worse. The choice of β = 2 is our best compromise between more arti-
facts at lower threshold and poorer residuals at higher threshold. The quick Pixon
reconstruction fares best. The residuals are tolerable, although somewhat worse
than those of the wavelet reconstruction. The main advantage of the quick Pixon
reconstruction is the low artifact level. For that reason, that image presents the best
overall visual acuity.
The need to find a good tradeoff between resolution and artifacts is univer-
sal for noniterative image reconstructions and invites the question whether better
techniques are available that simultaneously yield high resolution, minimal image
artifacts, and residuals consistent with random noise. The search for such tech-
niques has led to the development of iterative methods of solution as discussed in
the next several sections.
4. ITERATIVE IMAGE RECONSTRUCTION
4.1. Statistics in Image Reconstruction
We saw in Section 3 that even though the noniterative methods take into account the
statistical properties of the noise (with the exception of direct Fourier deconvolu-
tion), the requirement that image reconstruction be completed in one step prevents
full use of the statistical information. Iterative methods are more flexible and can
go a step further, allowing us to fit image models to the data. They thus infer an
explanation of the data based upon the relative merits of possible solutions. More
precisely, we consider a defined set of potential models of the image. Then, with
the help of statistical information, we choose amongst these models the one that
is the most statistically consistent with the data.
Figure 2  Variety of noniterative image reconstructions: (a) Wiener (β = 1) with (b) residuals, (c) wavelet (β = 2) with (d) residuals, and (e) quick Pixon with (f) residuals. The data and the truth image are shown in Figure 3.
Figure 3  Converged nonnegative least-squares fit compared with weak Wiener filtering: (a) data, (b) truth image, (c) Wiener filter (β = 0.1) with (d) residuals (χ²/n = 0.88), and (e) converged nonnegative least-squares fit with (f) residuals (χ²/n = 0.76).
Consistency is obtained by finding the image model for which the residuals
form a statistically acceptable random sample of the parent statistical distribution
of the noise. The data model is then our estimate of the reproducible signal in the
measurements, and the residuals are our estimate of the irreproducible statistical
noise. Note that the residuals need not all have an identical parent statistical dis-
tribution, e.g., the standard deviation of the residuals may vary from one pixel to
the next. But there must be a well-defined statistical law that governs all of them,
and we have to know it, at least approximately, in order to fit the data.
There are three components of data fitting (e.g., Press et al. 2002). First, there
must be a fitting procedure to find the image model. This is done by minimizing
a merit function, often subject to additional constraints. Second, there must be
tests of goodness of fit—preferably multiple tests—that determine whether the
residuals obtained are consistent with the parent statistical distribution. Third, one
would like to estimate the remaining errors in the image model.
To clarify what each of those components of data fitting is, consider the familiar
example of a linear regression. We might determine the regression coefficients by
finding the values that minimize a merit function consisting of the sum of the
squares of the residuals. Then we check for goodness of fit in a variety of ways.
One method is to consider the minimum value of the same sum of squares of
the residuals that we used for our merit function. But this time we ask a different
question, not what values of the coefficients minimize it, but whether the minimum
sum of squares found is consistent with our estimate of the noise level. We also
want to ensure that the residuals are randomly distributed. Nonrandom features
might indicate that the linear fit is insufficient and that we should add parabolic
or other higher-order terms to our fitting functions. In addition, we want to check
that the distribution (histogram) of residual values follows the parent statistical
distribution of the noise within expected statistical fluctuations. We suspect the fit
if the mean of the residuals is significantly nonzero, or if their distribution is skewed
or has unexpectedly strong or weak tails. Finally, once we find a satisfactory fit, we
wish to know the uncertainty in the derived parameters, i.e., the scatter of values
that we would find by performing linear regressions of multiple, independent data
sets.
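The three components can be made concrete for the straight-line case; a minimal weighted least-squares sketch (closed-form normal equations, independent Gaussian noise with known σᵢ assumed):

```python
def fit_line(x, y, sigma):
    """Weighted least-squares straight-line fit y = a + b*x, minimizing
    the chi^2 merit function (normal equations in closed form).  Returns
    the coefficients and the minimum chi^2, which then serves as the
    goodness-of-fit statistic (compare with n - 2 degrees of freedom)."""
    w = [1.0 / (s * s) for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx * Sx
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    chi2 = sum(wi * (yi - a - b * xi) ** 2
               for wi, xi, yi in zip(w, x, y))
    return a, b, chi2
```

The returned χ², compared with n − 2, is the goodness-of-fit check, and the curvature of χ² about the minimum (not computed here) yields the parameter errors.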
The same procedures are used in image reconstruction and are geared to the
parent statistical distribution of the noise, because the goal is to produce residuals
that are statistically consistent with that distribution. The merit function is usually
the log-likelihood function described in Section 4.2, to which are added a host of
image restrictions (Sections 6–8). Goodness of fit is diagnosed by the χ² statistic
and by considering the statistical distribution and spatial correlations of the
residuals. We check for spatially uncorrelated residuals with zero mean, standard
deviation corresponding to the noise level of the data, and no unexpected skewness
or tail distributions.
The precision of the image and its error estimates are much harder to obtain. Visual
inspection can be deceiving. In a good reconstruction, the image shows fewer
fluctuations than the data. Conversely, a poor reconstruction may create significant
artifacts, whose amplitudes may exceed the noise level of the data. In neither case
does the noise magically change. What is happening is that the image intensities
are strongly correlated. The differences between the image intensities in neighboring
pixels may be smaller or bigger than in the data, but that is because they have
correlated errors. To compute error propagation in a reconstructed image analytically
is next to impossible, so Monte Carlo simulations are the only realistic way to
assess the errors of measurements made on the reconstructed image. An alternative
is to fit the desired image features parametrically, because parametric fits have a
built-in mechanism for error estimates, even in the presence of a nonparametric
background (Section 5.2).
4.2. Maximum Likelihood
A given image model I results in a data model M (Equations 8 or 9). The parent
statistical distribution of the noise in turn determines the probability of the data
given the data model p(D|M). This is then the conditional probability of the data
given the image p(D|I). The most common parent statistical distributions are the
Gaussian (or normal) distribution and the Poisson distribution. The noise in differ-
ent pixels is statistically independent, and the joint probability of all the pixels is
the product of the probabilities of the individual pixels. The Gaussian probability
is

p(D|I) = ∏_i (2πσ_i²)^(−1/2) e^(−(D_i − M_i)²/2σ_i²),    (16)

and the discrete Poisson distribution is

p(D|I) = ∏_i e^(−M_i) M_i^(D_i) / D_i!.    (17)

If there are correlations between pixels, p(D|I) is a more complicated function.
In practice, it is more convenient to work with the log-likelihood function, a
logarithmic quantity derived from the likelihood function:

ℒ = −2 ln[p(D|I)] = −2 Σ_i ln[p(D_i|I)],    (18)
where the second equality in Equation 18 applies to statistically independent
data. The factor of two is added for convenience, to equate the log-likelihood
function with χ² for Gaussian noise and to facilitate parametric error estimation
(Section 5).
The goal of data fitting is to find the best estimate Î of I such that p(D|Î) is
consistent with the parent statistical distribution. The maximum-likelihood method
selects the image model by maximizing the likelihood function or, equivalently,
minimizing the log-likelihood function (Equation 18). This method is known in
statistics to provide the best estimates for a broad range of parametric fits in the
limit in which the number of estimated parameters is much smaller than the number
of data points (e.g., Stuart, Ord & Arnold 1998). We consider such parametric fits
first in Section 5. Most image reconstructions, however, are nonparametric, i.e.,
the “parameters” are image values on a grid, and their number is comparable
to the number of data points. For these methods, maximum likelihood is not a
good way to estimate the image and can lead to significant artifacts and biases.
Nevertheless, it continues to be used in image reconstruction, but with additional
image restrictions designed to prevent the artifacts. A major part of this review is
devoted to nonparametric methods (Sections 6–8).
5. PARAMETRIC IMAGE RECONSTRUCTION
5.1. Simple Parametric Modeling
Parametric fits are always superior to other methods, provided that the image can
be correctly modeled with known functions that depend upon a few adjustable
parameters. One of the simplest parametric methods is a least-squares fit minimizing
χ², the sum of the residuals weighted by their inverse variances:

\chi^2 = \sum_i \frac{R_i^2}{\sigma_i^2} = \sum_i \frac{(D_i - M_i)^2}{\sigma_i^2}.    (19)
For a Gaussian parent statistical distribution (Equation 16), the log-likelihood
function, after dropping constants, is actually χ², so the χ² fit is also a maximum-
likelihood solution.
For a Poisson distribution, the log-likelihood function, after dropping constants,
is

\mathcal{L} = 2\sum_i (M_i - D_i \ln M_i),    (20)

a logarithmic function, whose minimization is a nonlinear process. The log-
likelihood function also cannot be used for goodness-of-fit tests. One can write
χ²-like merit functions, but parameter estimation based on these statistics is usually
biased by about a count per pixel, which can be a significant fraction of the
flux at low counts. This bias is removed by Mighell (1999), who adds correction
terms to both the numerator and denominator,

\chi^2_\gamma = \sum_i \frac{[D_i + \min(D_i, 1) - M_i]^2}{D_i + 1},    (21)

and shows that parameter estimation using this statistic is indeed unbiased.
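Equation 21 is simple to evaluate numerically. The sketch below is our own minimal illustration, assuming NumPy arrays of observed counts D and model values M; it is not code from Mighell (1999).

```python
import numpy as np

def chi2_gamma(D, M):
    """Mighell's chi-squared-gamma statistic (Equation 21) for Poisson data.

    D : observed counts per pixel (nonnegative)
    M : model values per pixel (positive)
    """
    D = np.asarray(D, dtype=float)
    M = np.asarray(M, dtype=float)
    # Correction terms min(D, 1) and D + 1 remove the ~1 count/pixel bias
    # of naive chi-squared-like statistics at low counts.
    num = (D + np.minimum(D, 1.0) - M) ** 2
    return float(np.sum(num / (D + 1.0)))

print(chi2_gamma([0, 1, 2], [1.0, 1.0, 1.0]))  # → 2.8333... (= 1 + 1/2 + 4/3)
```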
5.2. Error Estimation
Fitting χ² has two additional advantages: The minimum χ² is a measure of goodness
of fit, and the variation of the χ² around its minimum value can be used to
estimate the errors of the parameters (e.g., Press et al. 2002). Here we wish to
emphasize the distinction between “interesting” and “uninteresting” parameters,
and the role they play in image error estimation.
A convenient way to estimate the errors of a fit with p parameters is to draw a
confidence limit in the p-dimensional parameter space, a hypersurface surrounding
the fitted values on which there is a constant value of χ². If \Delta\chi^2 = \chi^2 - \chi^2_{\min}
is the difference between the value of χ² on the hypersurface and the minimum
value found by fitting the data, then the tail probability α that the parameters
would be found outside this hypersurface by chance is approximately given by a
χ² distribution with p degrees of freedom (Press et al. 2002):

\alpha \approx P(\Delta\chi^2, p).    (22)
Equation 22 is approximate because, strictly speaking, it applies only to a linear fit
with Gaussian noise, for which χ² is a quadratic function of the parameters and the
hypersurface is an ellipsoid. It is common practice, however, to adopt Equation 22
as the confidence limit even when the errors are not Gaussian, or the fit is nonlinear,
and the hypersurface deviates from ellipsoidal shape.
Parametric fits often contain a combination of q “interesting” parameters and
r = p − q “uninteresting” (sometimes called “nuisance”) parameters. To ob-
tain a confidence limit for only the interesting parameters, without any limits on
the uninteresting parameters, one determines the q-dimensional hypersurface for
which
\alpha \approx P(\Delta\chi^2, q).    (23)
The only proviso is that in computing Δχ² for any set of interesting parameters
q, χ² is optimized with respect to all the uninteresting parameters (Avni 1976,
Press et al. 2002). A special case is that of a single interesting parameter (q = 1).
The points at which Δχ² = m² are then the mσ error limits of the parameter. In
particular, the 1σ limit is found where Δχ² = 1.
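For the special case q = 1, the tail probability of Equation 23 reduces to α = erfc(√(Δχ²/2)) and can be checked with a few lines of standard-library Python (general q requires the regularized incomplete gamma function, e.g., in SciPy). This is our own illustrative sketch, not code from the references.

```python
import math

def confidence_tail_q1(delta_chi2):
    """Tail probability P(delta_chi2, 1) for a single interesting parameter."""
    return math.erfc(math.sqrt(delta_chi2 / 2.0))

print(round(confidence_tail_q1(1.0), 4))  # 0.3173: Δχ² = 1 is the 1σ limit
print(round(confidence_tail_q1(4.0), 4))  # 0.0455: Δχ² = 4 is the 2σ limit
```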
Unfortunately, the errors of the nonparametric fits (Sections 6–8) cannot be
estimated in this way, because of the difficulties in assigning a meaning to χ² in
nonparametric fits (Section 6.1). This leaves Monte Carlo simulations as the only
way to assess errors in the general nonparametric case. There is, however, the
hybrid case of a combined parametric and nonparametric fit, in which the errors
of the nonparametric part of the fit are of no interest. For example, we might
wish to measure the positions and fluxes of stars in the presence of a background
nebula, which is not interesting in its own right but affects the accuracy of our
astrometry and photometry. In that case, we can perform a parametric fit of the
interesting astrometric and photometric parameters, optimizing the background
nonparametrically as “uninteresting” parameters.
5.3. Clean
Parameter errors are also important in models in which the total number of param-
eters is not fixed. The issue here is to determine when the fit is good enough and
additional parameters do not significantly improve it. The implicit assumption is
that a potentially large list of parameters is ordered by importance according to
some criterion, and the fit should not only determine the values of the important
parameters but also decide the cutoff point beyond which the remaining, less im-
portant parameters may be discarded. For example, we may wish to fit the data
to a series of point sources, starting with the brightest and continuing with pro-
gressively weaker sources, until the addition of yet another source is no longer
statistically significant.
Clean, an iterative method that was originally developed for radio-synthesis
imaging (Högbom 1974), is an example of parametric image reconstruction with
a built-in cutoff mechanism. Multiple point sources are fitted to the data one at
a time, starting with the brightest sources and progressing to weaker sources, a
process described as "cleaning the image." In its simplest form, the Clean algorithm
consists of four steps. First, start with a zero image. Second, add to your image a new
point source at the location of the largest residual. Third, fit the data for the positions
and fluxes of all the point sources introduced into your image so far. (The image
consists of a bunch of point sources. The data model is a sum of point-spread
functions, each scaled by the flux of the point source at its center.) Fourth, return
to the second step if the residuals are not statistically consistent with random noise.
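In its classic Högbom form, a slight simplification of the steps above in which a scaled point-spread function is subtracted at the peak residual (with a loop gain) instead of refitting all sources, the loop can be sketched as follows. This is our own schematic one-dimensional illustration; the function name, the loop gain, and the 3σ stopping threshold are our choices, not part of the original specification.

```python
import numpy as np

def hogbom_clean(data, psf, gain=0.5, noise_sigma=1.0, max_iter=100):
    """Schematic 1D Clean loop: peel off point sources at the largest residual.

    psf is assumed to be centered and the same length as data; np.roll wraps
    cyclically, which is adequate for this schematic example.
    """
    residual = data.copy()
    components = np.zeros_like(data)
    center = len(psf) // 2
    for _ in range(max_iter):
        peak = int(np.argmax(np.abs(residual)))
        # Stop when the residuals are consistent with random noise.
        if abs(residual[peak]) < 3.0 * noise_sigma:
            break
        flux = gain * residual[peak]
        components[peak] += flux
        # Subtract the scaled point-spread function shifted to the peak.
        residual -= flux * np.roll(psf, peak - center)
    return components, residual

# Usage: one bright point source at pixel 40, blurred by a Gaussian beam.
x = np.arange(101)
psf = np.exp(-0.5 * ((x - 50) / 3.0) ** 2)
data = 20.0 * np.roll(psf, -10) + np.random.default_rng(1).normal(0.0, 1.0, 101)
comps, res = hogbom_clean(data, psf, noise_sigma=1.0)
```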
Clean has enabled synthesis imaging of complicated fields of radio sources
even with limited coverage of the Fourier (u,v) plane (see Thompson, Moran &
Swenson 2001). There are many variants of the method. Clark (1980) selects
many components and subtracts them in a single operation, rather than separately,
thereby substantially reducing the computational effort by limiting the number of
convolutions between image space and the (u,v) plane. Cornwell (1983) introduces
a regularization term (Section 7.1) to reduce artifacts in reconstructions of extended
sources due to the incomplete coverage of the (u,v) plane. Steer, Dewdney & Ito
(1984) propose a number of additional changes to prevent the formation of stripes or
corrugations during the processing of images with extended features and arithmetic
rounding problems. Phase closure (Pearson & Readhead 1984) provides additional
constraints. But the problem remains that Clean fits extended objects a pixel at a
time. This has led to methods that use multiple scales (Wakker & Schwarz 1988,
Bhatnagar & Cornwell 2004).
6. NONPARAMETRIC IMAGE RECONSTRUCTION
Despite the great power of parametric methods in performing high-quality and
robust image reconstructions, their use is severely restricted by the requirement that
explicit functions be identified with which to model the image. In short, significant
prior knowledge of image features is required. In this section we relax this restriction
and introduce nonparametric methods for the estimation of image models. A
general feature of such models is that the number of model values to be determined
can be comparable to or exceed the number of data points. In the simplest case,
a nonparametric method accomplishes this by defining an image model on a grid
of pixels equal in size to that of the data. The method must then by some means
determine image values for all pixels in the image grid. In the worst case, each
image value may be individually and independently adjustable.
Clearly, the step from parametric to nonparametric modeling is a drastic one that
yields a combinatorial explosion of possible image models. In fact, nonparametric
methods draw from a pool of possible image models that is much too general.
Recalling from Section 2.8 our assertion that the inverse problem is ill conditioned,
such generality proves especially challenging when the signal-to-noise ratio is low.
One expects that because the space of potential solutions is so large, reconstruction
artifacts will abound.
The result is that iterative nonparametric methods that enforce no restrictions on
image models are often no better at controlling noise than the noniterative methods
presented in Section 3. Obtaining both image generality and good noise control
thus requires inclusion of additional constraints for limiting the permissible image
models. In this and subsequent sections, we present a series of nonparametric
methods that differ only in the means by which these constraints are designed and
enforced and how the solution is found.
The iterative methods usually use the log-likelihood function as their merit function,
despite its inadequacy for nonparametric fits, but they restrict its minimization
in different ways. Some stop the fitting procedure before the merit function is fully
minimized, some impose restrictions on permitted image values, some create a
new merit function by adding to the log-likelihood function an explicit penalty
function, which steers the solution away from unwanted image models, and some
do more than one of these things. In this section we first present two constraint
methods, early termination of the fit and enforcement of nonnegative image values,
and then discuss a few iterative schemes to fit the log-likelihood function, known in
the statistics literature as expectation-maximization methods (Dempster, Laird &
Rubin 1977). In Section 7 we discuss global image restriction by means of a global
penalty function, which serves to regularize the solution, allowing it to converge to
a reasonable solution. Finally, Section 8 is devoted to spatially adaptive methods
to restrict the image.
6.1. Early Termination of the Fit
Carried to completion, a nonparametric maximum-likelihood fit can result in zero
residuals. For example, if the image and the data are defined on the same grid,
then a nonnegative point-response function is a nonsingular, square matrix, which
has an inverse. The maximum-likelihood solution is therefore the one for which
the residuals are identically zero, as in Fourier deconvolution (Section 3.1). This
solution, however, is far from optimal if the noise is expected to have a finite
standard deviation. A set of zero residuals is hardly a statistically acceptable sample
of the parent statistical distribution. The problem is that the maximum-likelihood
is much smaller than the number of data points, and we use it to solve a problem
in which they are comparable or even equal.
One way to avoid letting the residuals become too small in an iterative fit is
to terminate the fit before this happens. A fit might be stopped when a goodness-
of-fit measure, such as the χ², falls below a designated value. But what is that
value for a χ² statistic? The expectation value of χ² is the number of degrees of
freedom, equal to the difference between the number of data points and the number
of parameters, but what is the number of parameters? In the above example, the
number of data points is equal to the number of image points, so the number of
degrees of freedom is technically zero, and we should let the fit run to completion.
But that is not what we would like to do.
We take the opposite point of view, placing a higher premium on avoiding noise
amplification and spurious artifacts than on seeking a "perfect" fit that interprets
statistical noise as real signal. If the image model were the true image, the correct
stopping point would be when χ² equals the number of data points n. We prefer to
go even further and conservatively stop the fit earlier, when χ² reaches n + √(2n),
a point higher by one standard deviation of the χ² for n degrees of freedom.
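The stopping test itself is a one-liner. The following sketch is our own, with hypothetical array names, and assumes Gaussian noise with known standard deviations:

```python
import numpy as np

def conservative_stop(D, M, sigma):
    """True when chi^2 has dropped below n + sqrt(2n), i.e., one standard
    deviation above the chi^2 expectation for n degrees of freedom."""
    n = D.size
    chi2 = np.sum(((D - M) / sigma) ** 2)
    return bool(chi2 <= n + np.sqrt(2.0 * n))
```

An iterative fitter of the kind described in Sections 6.3–6.6 would call such a test after each iteration and halt the first time it returns True.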
There might be some concern about an iterative method that is not carried
out to convergence. First, the result may depend on the initial image. In practice,
this is rarely a problem. We normally set the initial image to be identically zero
and find adequate fits. Second, the stopping criterion is a global condition. The
solution might, in fact, overfit the data in some areas and underfit in other areas.
This does happen and is one of the main reasons for adopting spatially adaptive
reconstruction methods that limit the image locally and not globally (Section 8).
Fortunately, it is easy to identify uneven fits by displaying the residuals.
6.2. Nonnegative Least-Squares
A simple constraint that greatly increases the performance of a maximum-likelihood
method is to disallow negative image values. When applied to a least-squares
fit, this procedure is known as a nonnegative least-squares fit. Nonnegativity is
certainly a necessary restriction for almost all images. (There are exceptions, e.g.,
image reconstruction in the complex Fourier space.) But forcing the image to be
nonnegative also strongly suppresses artifacts. A qualitative argument that supports
this idea is that if the image contains both large positive and large negative fluctu-
ations on length scales smaller than the width of the point-response function, then
these fluctuations mutually cancel upon convolution with the point-response func-
tion. Restricting the image to nonnegative values thus also reduces the magnitude
of the positive fluctuations. As a result, artifacts are significantly reduced.
The requirement that the image be nonnegative also increases resolution
(Section 2.4). The degree of possible subpixel resolution depends on the signal-
to-noise ratio and the width of the point-response function. Half-pixel resolution,
and even quarter-pixel resolution, can often be obtained. When the structure of the
source is known, e.g., a star is known to be a point source, it is possible to pinpoint
its position even better, often to a tenth of a pixel. Indeed, it may not be possible to
find a good image reconstruction with an image model defined on the same grid as
the data. It is not only feasible to extract subpixel information, it may be necessary
to do so.
Procedures that impose nonnegativity include changes of variable and simply
setting negative values to zero after each iteration. In our own work with iterative
schemes that minimize the log-likelihood function, we have found that (a) a change
of variable can cause very slow convergence of image values residing near zero and
(b) setting negative values to zero after each iteration does not hurt convergence
and may actually speed it up (Section 6.6).
6.3. van Cittert
Having considered a couple of ways to restrict images, we next turn to iterative
computational methods to find the image models. Instead of solving for all the
unknown variables at once, one uses the approximate solution found in a previous
iteration in order to compute the next iteration. In the statistics literature, such
an iterative method is called an expectation-maximization method, because it al-
ternates between substituting expectation values (the previous solution) for some
of the variables in the likelihood function and maximizing the likelihood func-
tion with respect to the remaining unknowns. Performed correctly, expectation-
maximization methods are guaranteed to converge (Dempster, Laird & Rubin
1977), but convergence can be slow.
The van Cittert (1931) method is one of the earliest and simplest iterative
methods for image reconstruction problems in which the data and image are defined
on the same grid. The iteration begins with the zeroth-order image I^(0) ≡ 0 at all
grid points and iterates from there according to

I^{(k+1)} = I^{(k)} + \alpha \left( D - H \otimes I^{(k)} \right) = \alpha D + Q \otimes I^{(k)},    (24)

where Q = 1 − αH, and 1 is the identity kernel. The iterations are designed
to converge to the deconvolved image. Successive substitutions into Equation 24
yield

I^{(k)} = \alpha \sum_{j=0}^{k-1} Q^j \otimes D = H^{-1} \otimes (1 - Q^k) \otimes D \longrightarrow H^{-1} \otimes D \quad (k \to \infty),    (25)

where Q^j denotes a j-fold convolution of the function Q with itself, the second
equality represents the sum of the geometric series, and the limit k → ∞ applies
as long as Q^k ⊗ D → 0 in that limit.
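A minimal numerical sketch of Equation 24 (ours, in one dimension, using cyclic convolution via the FFT for brevity; a real application would apply the stopping criterion of Section 6.1 rather than a fixed iteration count):

```python
import numpy as np

def van_cittert(D, H, alpha=1.0, n_iter=20):
    """van Cittert iteration (Equation 24) for 1D data with cyclic convolution.

    D : blurred, noisy data
    H : point-response function, same length as D, centered at index 0
    """
    Hf = np.fft.fft(H)
    I = np.zeros_like(D)                     # zeroth-order image I^(0) = 0
    for _ in range(n_iter):
        M = np.real(np.fft.ifft(Hf * np.fft.fft(I)))  # data model H ⊗ I
        I = I + alpha * (D - M)              # I^(k+1) = I^(k) + α (D − H ⊗ I^(k))
    return I
```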
The limiting solution has zero residuals, just as in the case of the Fourier
deconvolution discussed in Section 3.1. (But note that the van Cittert method is not
limited to deconvolutions.) If carried far enough, the van Cittert method therefore
exhibits noise amplification just as do the Fourier-based methods, and the iteration
must be terminated prior to convergence. The art of applying the van Cittert method
is in choosing a value of the parameter α and establishing a stopping criterion, so
that the computation time, noise amplification, and degree of recovered resolution
are acceptable. Although the convergence of the van Cittert iterations can be slow,
solutions can be obtained especially quickly when the point-spread function is
centrally peaked and relatively narrow (Lagendijk & Biemond 1991).
Numerous modifications to the technique include disallowing any negative image
values, setting upper bounds to the image values, and more sophisticated methods
that apply noise filters at select iterations (Agard 1984; Biemond, Lagendijk
& Mersereau 1990; Wallace, Schaefer & Swedlow 2001). Such a modified version
of the method has been commercially implemented for applications in 3D
deconvolution in light microscopy (Wallace, Schaefer & Swedlow 2001). Other
implementations use wavelet-based filtering at each iteration, removing statistically
insignificant features from the solution (Section 8.2).
6.4. Landweber
Another iterative scheme (Landweber 1951) is

I^{(k+1)} = I^{(k)} + \alpha H^T \otimes \frac{R}{\sigma^2},    (26)
where the superscript T denotes the transpose operation, and α is a small positive
parameter. This method is designed to minimize the sum of the squares of the
residuals by ensuring that the next change in the image, ΔI = I^(k+1) − I^(k), is in
the direction of the negative of the gradient (negradient) of χ² with respect to I.
The choice of α, however, is arbitrary and depends on the image. If it is too large,
the iteration can overshoot the minimum along the negradient direction and even
result in worse residuals. Indeed, workers using the method have found that it
often initially produces a good solution but thereafter begins to diverge (Bertero
& Boccacci 1998; Calvetti, Reichel & Zhang 1999).
In practice, users of the Landweber method often modify the procedure to
avoid negative image values, which yields the projective Landweber method (Eicke
1992). Other simple constraints can be imposed using projection operators in either
the spatial or spectral domains (Bertero & Boccacci 2000). Another variation is to
modify α during the iteration (Liang & Xu 2003).
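The basic and projective variants differ only in a final clipping step. The following sketch is our own one-dimensional illustration of Equation 26 with the nonnegativity projection (cyclic convolution; H^T is implemented as correlation, i.e., convolution with the conjugated Fourier transform of the kernel):

```python
import numpy as np

def projected_landweber(D, H, sigma=1.0, alpha=0.1, n_iter=50):
    """Projected Landweber iteration: Equation 26 plus clipping to I >= 0."""
    Hf = np.fft.fft(H)
    I = np.zeros_like(D)
    for _ in range(n_iter):
        M = np.real(np.fft.ifft(Hf * np.fft.fft(I)))   # data model H ⊗ I
        R = D - M                                      # residuals
        # H^T ⊗ (R / sigma^2): correlation = convolution with conj(Hf)
        step = np.real(np.fft.ifft(np.conj(Hf) * np.fft.fft(R / sigma**2)))
        I = np.maximum(I + alpha * step, 0.0)          # project onto I >= 0
    return I
```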
6.5. Richardson-Lucy
The Richardson-Lucy method (Richardson 1972, Lucy 1974, Shepp & Vardi 1982)
was developed specifically for data comprising discrete, countable events that
follow a Poisson distribution. The nonlinear log-likelihood function (Equation 20)
is minimized iteratively using multiplicative corrections:

I^{(k+1)} = \left[ H^T \otimes \left( \frac{D}{M^{(k)}} \right) \right] I^{(k)}.    (27)
The square brackets on the right-hand side of Equation 27 enclose the factor by
which the previous I^(k) is multiplied (not convolved) to give the new I^(k+1). It results
from a back projection operation, in which the ratio between the data, D, and the
data model of the previous iteration, M^(k) = H ⊗ I^(k), is operated upon by H^T, the
transpose (not the inverse) of the point-response function.
Lucy (1974) shows that the algorithm is flux conserving, maintains image non-
negativity, and decreases the log-likelihood function in each iteration, at least if one
takes only part of the step indicated by Equation 27. But the method yields noise-
related artifacts when the signal-to-noise ratio is low (van Kempen et al. 1997).
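As with the other schemes, Equation 27 is compact in code. The sketch below is our own one-dimensional illustration (cyclic convolution via the FFT; the small floor on M^(k) guarding against division by zero is our addition):

```python
import numpy as np

def richardson_lucy(D, H, n_iter=20):
    """Richardson-Lucy iteration (Equation 27) for 1D data, cyclic convolution.

    H should be nonnegative and normalized to unit sum.
    """
    Hf = np.fft.fft(H)
    I = np.full_like(D, D.mean())            # flat, positive starting image
    for _ in range(n_iter):
        M = np.real(np.fft.ifft(Hf * np.fft.fft(I)))        # M^(k) = H ⊗ I^(k)
        ratio = D / np.maximum(M, 1e-12)                    # guard against M = 0
        # Back projection: H^T ⊗ (D / M^(k)), then a multiplicative update.
        corr = np.real(np.fft.ifft(np.conj(Hf) * np.fft.fft(ratio)))
        I = corr * I
    return I
```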
Improvement can be achieved in a number of ways. Snyder & Miller (1991)
first exaggerate deblurring by obtaining the maximum-likelihood solution for a
point-response function that is deliberately broadened by convolving it with an
extra sieve function. The solution, which is too sharp and may contain ringing,
is then broadened by the same sieve function. Another approach is to modify the
log-likelihood function by adding a penalty function along the lines discussed in
Section 7 (Joshi & Miller 1993, Conchello & McNally 1996), modifying Equation
27 according to the general expectation-maximization procedure of Dempster,
Laird & Rubin (1977).
6.6. Conjugate-Gradient
The iterative schemes described in Sections 6.3–6.5 are all designed to converge
to the maximum-likelihood solution (and are stopped early to avoid overfitting the
data), but their convergence is slow. Modern minimization techniques converge
much faster by utilizing the Hessian matrix of second-order partial derivatives of
the merit function with respect to the variables. Unfortunately, the Hessian matrix is
too big to be computed for typical image reconstruction problems. One is therefore
left with minimization schemes that collect and use the information contained in
the Hessian matrix without ever computing the entire matrix.
An excellent example of such a technique is the conjugate-gradient method
(e.g., Press et al. 2002). The method starts from some initial image I^(0), where it
computes the negative gradient (negradient) g^(0) of the log-likelihood function with
respect to the image and sets the initial conjugate-gradient direction h^(0) = g^(0).
It then constructs a sequence of negradients g^(k) and conjugate-gradient directions
h^(k) as follows. First, it locates the minimum of the log-likelihood function along
the conjugate-gradient direction h^(k). Second, at the position of the minimum it
computes the next negradient g^(k+1). Third, it sets the new conjugate-gradient
direction to a linear combination of the old conjugate-gradient direction and the
new negradient:

h^{(k+1)} = g^{(k+1)} + \gamma_k h^{(k)}.    (28)
The coefficient γ_k is chosen to optimize convergence. We generally prefer the one
devised by Polak & Ribière (1969):
\gamma_k = \frac{\sum_j \left[ g_j^{(k+1)} - g_j^{(k)} \right] g_j^{(k+1)}}{\sum_j \left[ g_j^{(k)} \right]^2},    (29)

where the sums are over all the image points.
The stopping criterion for the conjugate-gradient minimization is similar to
that of the slower methods. There is some evidence that the first iterations of the
conjugate-gradient method introduce mainly low-k components into the solution,
and components with higher k are added mainly in later iterations (Hansen 1994).
Stopping the iterations in time therefore also provides a smoother solution and
helps to reduce high-k noise amplification.
We have found that the most effective way to impose nonnegative solutions
is to modify the conjugate-gradient method as follows. At each iteration of the
conjugate-gradient minimization, first compute the negradient. Second, check
the negradient components of all the pixels whose image values are zero and set the
negradient components to zero if they are negative, i.e., pointing toward negative
image values. Third, compute the conjugate-gradient direction in the usual way.
Fourth, find the minimum along the conjugate-gradient direction without regard
to the sign of the image. Fifth, truncate all negative image values to zero, thereby
jumping to a new solution. Sixth, go back to the first step and continue with the
next conjugate-gradient iteration as though no truncation took place.
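The six steps translate directly into code. Below is our own compact one-dimensional sketch for the quadratic merit function Σ(D − H ⊗ I)², for which the line minimum along h is analytic (step 4); the Polak-Ribière coefficient is that of Equation 29. It is an illustration under these simplifying assumptions, not a production minimizer.

```python
import numpy as np

def cg_nonneg(D, H, n_iter=10):
    """Conjugate-gradient least squares with the nonnegativity modifications
    described in the text; 1D cyclic convolution, merit = sum((D - H ⊗ I)^2)."""
    Hf = np.fft.fft(H)
    conv = lambda f, x: np.real(np.fft.ifft(f * np.fft.fft(x)))
    I = np.zeros_like(D)
    h = None
    g_old = None
    for _ in range(n_iter):
        # Step 1: negradient of the merit function, g = 2 H^T ⊗ (D - H ⊗ I).
        g = 2.0 * conv(np.conj(Hf), D - conv(Hf, I))
        # Step 2: at pixels pinned at zero, drop components pointing negative.
        g[(I == 0.0) & (g < 0.0)] = 0.0
        # Step 3: conjugate direction with the Polak-Ribiere coefficient.
        if h is None:
            h = g.copy()
        else:
            gamma = np.dot(g - g_old, g) / np.dot(g_old, g_old)
            h = g + gamma * h
        g_old = g
        # Step 4: analytic line minimum along h for a quadratic merit function.
        Hh = conv(Hf, h)
        denom = 2.0 * np.dot(Hh, Hh)
        if denom == 0.0:
            break                      # gradient vanished: converged
        t = np.dot(g, h) / denom
        # Step 5: take the step, then truncate negative values to zero.
        I = np.maximum(I + t * h, 0.0)
    return I
```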
This procedure may seem ad hoc and liable to disrupt convergence, but the
converse is true. The procedure, in fact, belongs to a class of iterative schemes
called projections onto convex sets, which are guaranteed to converge (Biemond,
Lagendijk & Mersereau 1990; Press et al. 2002). Occasionally, the truncation leads
to an increase instead of a decrease in the value of the merit function. We have
found that this is actually an advantage, because it enables the minimization to
escape from local minima. For many minimizations we reach the stopping point in
about 10 iterations. If the conjugate-gradient algorithm requires more iterations,
it is a good idea to stop the conjugate-gradient iteration every 5–10 iterations and
start it anew at that point, i.e., to set the conjugate-gradient direction in the direction
of the negradient, h^(j+1) = g^(j+1).
Finally, we comment that, for a quadratic log-likelihood function, it is possible
to solve for the position of the minimum along the conjugate-gradient direction
analytically and proceed there in one step (before truncation for negative image
values). For nonlinear log-likelihood functions it is necessary to search iteratively for
the minimum, which requires that the log-likelihood function (but not its gradient)
be computed several times along the conjugate-gradient direction. Convergence
may also be accelerated by using preconditioners, replacing the negradients with
vectors that point more closely in the direction of the function minimum. For a
thorough discussion of these issues see Press et al. (2002). (For the linear case they
actually present the biconjugate-gradient method; the conjugate-gradient method
is a special case, which can be programmed more efficiently.)
7. GLOBAL IMAGE RESTRICTION
In Section 6 we introduced two ways to control noise-related artifacts in image
reconstruction: early termination of iterative fits and enforcement of image non-
negativity. As we show in Section 8.7, even when both are employed, one is still
unable simultaneously to suppress the artifacts and fit the data in an adequate manner.
In short, the class of allowed solutions defined by nonparametric maximum-
likelihood methods is still too large despite the benefits of these methods of image
restriction. The remainder of this review considers additional constraints that can
and should be brought to bear on image reconstruction. This section considers
global image restrictions. Section 8 is devoted to the more powerful, spatially
adaptive image restrictions.
7.1. Duality Between Regularization and Bayesian Methods
Two main approaches have been developed to impose global constraints on the
solutions of ill-posed problems in general and image reconstruction in particular.
One approach is to steer the solution away from unwanted images by modifying
the merit function, adding a regularization term to the log-likelihood function to
give

\mathcal{L}' = \mathcal{L} + \lambda B(I).    (30)

Here B(I) is a penalty function that increases with the degree of undesirability of
the solution, and λ is a penalty normalization parameter that controls the relative
strength of the penalty function with respect to the log-likelihood function. (We
show in Section 7.2 that λ plays the role of a Lagrange multiplier.)
The other approach is to assign to each image model an a priori probability
p(I), also called a prior, and to maximize the product p(D|I)p(I) of the likelihood
function and the prior. This approach is motivated by the desire to maximize the
conditional probability of the image given the data, p(I|D), known as the image a
posteriori probability. Bayes' (1763) theorem is used to relate these quantities:

p(I|D) = \frac{p(D|I)\,p(I)}{p(D)} \propto p(D|I)\,p(I).    (31)
The data are fixed for any image reconstruction, so p(D) is a constant and max-
imizing p(I|D) amounts to maximizing the product p(D|I)p(I). The image recon-
struction is called a Bayesian method, and the solution is called the maximum a
posteriori image. Expressed logarithmically, we obtain an expression similar to
Equation 30:

\mathcal{L}' = \mathcal{L} - 2\ln[p(I)].    (32)
One might imagine that the Bayesian approach is more restrictive, because the
regularization term has an arbitrary penalty function with an adjustable normalization,
whereas the prior image probability could have theoretical underpinning and
be completely specified in advance, without adjustable parameters. In reality, the
choice of the prior is just as arbitrary, reflecting the preference of the practitioner
for particular types of images. Moreover, even when the probabilities are set in
some axiomatic way, as in the maximum-entropy method (Section 7.5), an ad-
justable parameter is again introduced, changing Equation 32 to a form equivalent
to Equation 30:

\mathcal{L}' = \mathcal{L} - \lambda S(I).    (33)
Operationally, therefore, there is no difference between regularization and Bayesian
methods. They both add a term to the log-likelihood function and minimize the
modified merit function. The extra term can be positive and viewed as a penalty
function or negative and viewed as a preference function. It amounts to the same
thing.
Finally, we note in passing that in some of the Bayesian literature (e.g., Hoeting
et al. 1999) the authors recommend using the average of the a posteriori image
instead of the maximum:

\langle I \rangle = \frac{\int p(I|D)\, I\, dI}{\int p(I|D)\, dI}.    (34)

In practice, however, the evaluation of the average image from Equation 34 is
computationally very costly. Furthermore, the effort may not be justified, because
the a posteriori probability is sharply peaked, so the difference between the average
and the mode is likely to be small.
7.2. Penalty Normalization as a Lagrange Multiplier
An additional benefit of regularization is that the fit of a properly regularized
problem can be carried out to convergence. This may suggest that there is no longer
any need for a stopping criterion. This is illusory. Although it is nice to have a
converging fit, it must also produce residuals that are statistically consistent with
the parent statistical distribution. That is, their χ2should be approximately equal
to the number of data points (Morozov 1966). This is achieved by adjusting the
penalty normalization parameter λ in Equation 30 or 33. In fact, one can think
of global image restriction as a formulation of image preference subject to data
constraint. We seek the best image, given our preference function, subject to one
or more constraints imposed by the data. Viewed in this way, λ is a Lagrange
multiplier adjusted to enforce the data constraint. (It makes little difference if
the Lagrange multiplier multiplies the constraint or the preference function.) A
subtle point is whether the data constraint should be a log-likelihood function or
a goodness-of-fit function. The two are identical for Gaussian noise, of course,
but they do differ for other types of noise. A Bayesian purist might opt for a
log-likelihood function, given that the aim is to maximize the a posteriori prob-
ability. But a goodness of fit works just as well, e.g., Equation 21 for Poisson
noise.
Annu. Rev. Astro. Astrophys. 2005.43:139-194. Downloaded from arjournals.annualreviews.org
by University of California - San Diego on 09/03/05. For personal use only.
The use of χ² as a goodness-of-fit stopping criterion assumes advance knowledge
of the standard deviations σ of the noise. When the noise level is not known
in advance, it is possible to estimate it directly from the data. If the data model is
sufficiently smooth, at least in parts of the image, it is possible to estimate σ from
the standard deviation of the data in neighboring pixels. Care must be exercised
to ensure that the regions are indeed smooth (or σ will be systematically overestimated)
and comprise enough pixels (so that the statistical error in the determination
of σ is manageable).
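A minimal sketch of such a noise estimate, under the assumption (hypothetical here) that most blocks of the image are smooth, so that the median block standard deviation reflects the noise rather than real structure:

```python
import numpy as np

def estimate_sigma(data, size=8):
    """Estimate the noise standard deviation directly from the data.

    The image is cut into non-overlapping size x size blocks; in a smooth
    block, the sample standard deviation measures only the noise. Taking
    the median over blocks guards against blocks containing real structure
    (which would overestimate sigma), assuming most blocks are smooth,
    the caveat discussed in the text.
    """
    d = np.asarray(data, dtype=float)
    ny, nx = d.shape
    stds = [d[i:i + size, j:j + size].std(ddof=1)
            for i in range(0, ny - size + 1, size)
            for j in range(0, nx - size + 1, size)]
    return float(np.median(stds))
```

The block size trades statistical error (more pixels per block) against the chance of a block straddling real structure.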
Two other methods have been proposed to set the Lagrange multiplier λ. Generalized
cross validation (Wahba 1977; Golub, Heath & Wahba 1979; Galatsanos &
Katsaggelos 1992; Golub & von Matt 1997) finds λ by bootstrapping, repeatedly
removing random data points from the fit and measuring the effect on the derived
image. The L-curve method (Miller 1970; Lawson & Hanson 1974; Hansen 1992,
1994; Engl & Grever 1994) evaluates the sum of the squares of the residuals as a
function of λ, which gives an L-shaped curve, hence the name of the method. The
preferred value of λ is at the knee of the L, where the curvature is highest.
7.3. Linear (Tikhonov) Regularization
The simplest penalty function is quadratic in the image. The advantage of the
quadratic penalty function is that its gradient with respect to the image is linear,
as is the gradient of the χ2. The optimization of a merit function consisting of
the sum of a χ2and a quadratic penalty function is then a linear problem. The
method is often called Tikhonov (1963) regularization, although it seems to have
been independently suggested by a number of authors (see Press et al. 2002, who
also present a succinct discussion of the method).
The penalty function for linear regularization is the sum of the squares of a
linear mapping of the image:
B(I) = Σ_i (Σ_j F_ij I_j)²,    (35)
which is designed to penalize highly variable images. The mapping F is often a
finite difference operator approximating first-order or second-order derivatives.
As with all regularization methods, the strength of the penalty function is
controlled by the Lagrange multiplier λ (Equation 30), which is adjusted so that
χ² is approximately equal to the number of data points.
The solution of the linear regularization problem simplifies significantly when
the blur is a convolution and F is also chosen to be a convolution. By analogy with
the Fourier method (Section 2.4), the Fourier transform of the gradient of the merit
function is a linear equation in Ĩ(k), whose solution is

Ĩ(k) = H̃(k)* D̃(k) / [|H̃(k)|² + λ |F̃(k)|²].    (36)
Note that the complex conjugate H̃(k)*, which appears in the numerator of
Equation 36, is the Fourier transform of the transpose of the point-response
function Hᵀ. In the absence of a regularization term, λ = 0, and Equation 36 reduces
to Equation 4, as derived with the Fourier method. The regularization term in the
denominator of Equation 36 serves to suppress the high-k components, as F̃(k) is
designed to peak at high k. One can think of Equation 36 as a generalization of
the Wiener filter (Equation 14) to allow more elaborate filtering.
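Equation 36 can be evaluated directly with fast Fourier transforms. The sketch below assumes a discrete Laplacian for the penalty operator F (one common choice that peaks at high k); the function name and default choices are illustrative, not from any particular implementation:

```python
import numpy as np

def tikhonov_deconvolve(data, psf, lam):
    """Linear (Tikhonov) regularization evaluated in Fourier space
    (Equation 36), with F chosen as a discrete Laplacian so that the
    penalty suppresses high-k image components.
    """
    shape = data.shape
    H = np.fft.fft2(psf, s=shape)   # transfer function H~(k)
    D = np.fft.fft2(data)           # data transform D~(k)
    lap = np.zeros(shape)           # discrete Laplacian kernel, wrap-around
    lap[0, 0] = -4.0
    lap[0, 1] = lap[1, 0] = lap[0, -1] = lap[-1, 0] = 1.0
    F = np.fft.fft2(lap)            # F~(k), peaks at high k
    # Equation 36: I~(k) = H~(k)* D~(k) / (|H~(k)|^2 + lam |F~(k)|^2)
    I = np.conj(H) * D / (np.abs(H) ** 2 + lam * np.abs(F) ** 2)
    return np.fft.ifft2(I).real
```

With λ = 0 the expression reduces to the plain Fourier inverse of Equation 4; increasing λ trades fidelity at high k for noise suppression.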
Image reconstruction using linear regularization has been applied often in the
field of microscopy (e.g., van Kempen et al. 1997). Recent studies have enforced
nonnegative images either by a change of variables (Carrington et al. 1995; Verveer,
Gemkow & Jovin 1999) or by clipping negative values at each step of a conjugate-
gradient iteration (Lagendijk & Biemond 1991, Vandervoort & Strasters 1995).
7.4. Total Variation
Regularization schemes whose penalty functions are smooth functions of the image
tend to perform poorly when the underlying truth image contains sharp edges
or steep gradients. A penalty function that overcomes this problem is the total
variation (Rudin, Osher & Fatemi 1992; Vogel & Oman 1998):
B(I) = Σ_i |∇I|_i.    (37)

Equation 37 can be generalized by considering other functions of ∇I. Charbonnier
et al. (1997) discuss the types of functions that would be useful and suggest a few
possibilities.

In the form of Equation 37, the total variation has the property that it applies
the same penalty to a step edge as it does to a smooth transition over the same
range of image amplitudes. The penalty increases only when the image model
develops oscillations. This is a serious limitation, which would cause us to discard
this penalty function unless it is known ahead of time that the image contains many
sharp edges. If this is not the case, especially if the signal-to-noise ratio is low, the
user risks introducing significant artifacts.
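The step-versus-ramp property can be made concrete with a small numerical check, using a forward-difference discretization of Equation 37 (our own simple choice of discretization):

```python
import numpy as np

def total_variation(image):
    """Total-variation penalty of Equation 37: the sum over pixels of the
    gradient magnitude |grad I|, using forward differences.
    """
    img = np.asarray(image, dtype=float)
    gx = np.diff(img, axis=1)[:-1, :]   # horizontal differences
    gy = np.diff(img, axis=0)[:, :-1]   # vertical differences
    return float(np.hypot(gx, gy).sum())

# A sharp step and a smooth ramp spanning the same amplitude range
# receive the same penalty; only oscillations would increase it.
step = np.tile([0., 0., 0., 1., 1., 1.], (6, 1))
ramp = np.tile(np.linspace(0., 1., 6), (6, 1))
```

Both test images rise from 0 to 1 along each row, so the total variation is identical even though one has a sharp edge and the other does not.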
7.5. Maximum Entropy
The maximum-entropy method is an attempt to provide an objective image prefer-
ence in analogy with the principles that underlie statistical physics (Jaynes 1957a,
1957b). It assumes that the image is made up of a very large number of quanta,
each with intensity q, and that there is an equal probability that any quantum lands
in any image pixel, as if tossed at random by monkeys (Gull & Daniell 1978). The
probability of obtaining a particular set (n_1, n_2, ..., n_L) of pixel occupation numbers,
with n_j = I_j/q, is then proportional to the degeneracy N!/(n_1! n_2! ··· n_L!). In the
asymptotic limit of large occupation numbers, the factorials can be approximated
by the Stirling formula (e.g., Press et al. 2002), and the logarithm of the prior
becomes:
S(I) = ln[p(I)] ≈ −Σ_i n_i ln(n_i) = −Σ_i (I_i/q) ln(I_i/q),    (38)

where we have dropped an overall normalization constant of p(I) (an additive constant
after taking the logarithm). The preferred image is then the one that maximizes the
entropy. Equation 38 is analogous to the spatial distribution probability of particles
of an ideal gas, whose logarithm is the Boltzmann entropy. An imaging entropy of
the form I ln(I) was originally proposed by Frieden (1972).
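A quick numerical check of Equation 38's behavior, with purely illustrative arrays: among images of fixed total flux, the flat image has the highest entropy, and the entropy is unchanged by scrambling the pixels:

```python
import numpy as np

def entropy_38(image, q=1.0):
    """Image entropy of Equation 38: S = -sum_i (I_i/q) ln(I_i/q)."""
    n = np.asarray(image, dtype=float) / q
    return float(-np.sum(n * np.log(n)))

flat = np.full(16, 4.0)      # total flux 64, spread evenly
peaked = np.full(16, 1.0)
peaked[0] = 49.0             # same total flux, concentrated in one pixel
scrambled = np.random.default_rng(0).permutation(peaked)
```

Both properties, maximization by the flat image and invariance under pixel scrambling, are among the difficulties with this prior discussed in the text.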
There are three fundamental problems with this approach: (a) there are com-
peting forms of the entropy, even for the same image, (b) the maximum-entropy
image, for any entropy scheme, is not the preferred image, and (c) the entropy de-
pends on the quantum q, which is set arbitrarily and is not related to any physical
quanta making up the image. Because of these problems, the maximum-entropy
method has steered away from its precepts of statistical physics. Let us deal with
these issues one by one.
First, the functional form of the image entropy is not unique. One might view
the electric field in Fourier space, with the spatial image intensity as its power
spectrum, to be the fundamental carrier of image information. In that case, the
entropy is (Ponsonby 1973, Ables 1974, Wernecke & D’Addario 1977):
S(I) = Σ_i ln(n_i) = Σ_i ln(I_i/q).    (39)

The same expression is obtained from photon Bose-Einstein statistics (Narayan
& Nityananda 1986). The use of entropy of the form ln(I) in imaging actually
predates the use of Equation 38 (Burg 1967).

Second, the image with the highest entropy is a flat image with constant intensity,
which is not the preferred image. The purpose of image reconstruction is to find the
true underlying image that has been degraded by blurring and noise, but the target
image is not flat. The flat image also has the unfortunate property of invariance
under random scrambling of the pixels. Surely, any real image would be terribly
degraded under such scrambling, and the scrambled image should not be considered
equally preferable to the real image. The spatial distribution of the residuals, once
normalized by the standard deviations of the pixels, should be invariant under
random scrambling of the pixels, but not the image. So, perhaps one should define
the entropy based on the residuals and not on the image. We return to this point in
Section 8.5.

Third, the quanta in the maximum-entropy method cannot be the photons, as
might be supposed based on the analogy with statistical physics. Maximization of
the a posteriori probability entails a balance between the log-likelihood function
and the entropy, so the deviations of the optimal solution are within expected
statistical fluctuations from both the maximum-likelihood solution and the maximum-entropy
solution. But this is not possible, because the maximum-entropy solution
is flat, whereas the true image is not, and the difference between the two is highly
significant statistically. This brings us back to the second problem: the underlying
image is not the flat maximum-entropy image.
In practice, users of the maximum-entropy method simply multiply the entropy
by an unknown factor λ, and adjust its value to obtain a reasonably good fit to the
data. The maximum-entropy minimization function thus takes the form of Equation
33. For entropy of the form of Equation 38, this corresponds, approximately, to
setting the quantization to a high level, far above that of individual photons. In the
case of entropy of the form of Equation 39, a change of quantization actually does
not help, because it only affects the entropy additively, so the multiplicative factor
is totally arbitrary. In either case, multiplying the entropy by a factor λ corresponds
to raising the prior probability to the power of λ, a procedure that is alien to the
Bayesian approach.
The maximum-entropy method has been applied extensively, particularly in
radio astronomy (see the review by Narayan & Nityananda 1986). But, as they
emphasize, the success of the maximum-entropy method is not due to the Bayesian
precepts that led to it, and from which the method has veered away. It really results
from the characteristics of the entropy function used, particularly infinite slope
at I = 0, which steers the solution away from zero and negative values, and a
negative second derivative, which makes the negative of the entropy (negentropy)
a suitable penalty function. The function √I, with no theoretical basis, would be
just as good and represents an intermediate case between Equations 38 and 39.
The real problem of the maximum entropy method is one that it shares with
all the methods of global image restriction, namely that the same restriction is
applied everywhere in the image. This one-criterion-fits-all approach often leads
to underfits in parts of the image and overfits in other parts. Instead, we should
allow variable image restriction that adapts itself to image conditions. This is the
latest development in image processing, to which we now turn.
8. SPATIALLY ADAPTIVE IMAGE RESTRICTION
Here we explore another class of image restrictions, in which the degree of
restriction varies across the image. These techniques are more flexible, as they can adapt
themselves to different image conditions, e.g., greater smoothness or variation
in signal-to-noise ratio across the image. But they can also more easily lead to
confusion between signal and noise, thereby resulting in stronger artifacts. The
proof of the pudding is in the images produced. We present a comparison of the
global and local methods in Section 8.7.
8.1. Spatially Variable Maximum Entropy
As we saw, the maximum-entropy functionals in Equations 38 and 39 are par-
ticularly ill-suited to adapt to image content because they are maximized by flat
images. Recognizing this limitation, Skilling (1989) proposes a modified entropy
functional, whose maximum occurs at a preassigned reference image J. The probabilities
of pixel occupancy are no longer equal, so Equation 38 is replaced by the
Poisson log-likelihood function (Equation 20) to give

S(I) = Σ_j [I_j − J_j − I_j ln(I_j/J_j)].    (40)

If the reference image is only determined to within an unknown normalization
constant, then the total fluxes of the image and the reference image are set equal
to each other, in which case the first two terms on the right-hand side of Equation
40 cancel, and the equation simplifies to

S(I) = −Σ_j I_j ln(I_j/J_j),    (41)

a form known as the Kullback relative information, Kullback-Leibler divergence,
or cross entropy (Kullback & Leibler 1951, Gray 1990).

A spatially variable entropy can increase the quality of a maximum-entropy
reconstruction. Equation 40 has been used to introduce reference images of various
types (Gull 1989; Charter 1990; Weir 1992; Bontekoe, Koper & Kester 1994;
Bridle et al. 1998; Jones et al. 1998, 1999; Marshall et al. 2002; Strong 2003). A
commercial version called MEMSYS is available from Maximum Entropy Data
Consultants Ltd. and is compared with other image reconstructions in Section
8.7. Variations of Equation 41 have been applied in gravitational lensing (Seitz,
Schneider & Bartelmann 1998) and medical imaging (Byrne 1993, 1998, and
references therein). Additional applications have been used in conjunction with
wavelets (Section 8.2).
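A small sketch of Equation 41, with our own helper function and made-up arrays: the entropy peaks at zero when the image equals the reference, and is negative for any other image of the same total flux:

```python
import numpy as np

def cross_entropy(image, reference):
    """Entropy of Equation 41: S(I) = -sum_j I_j ln(I_j / J_j), i.e. the
    negative Kullback-Leibler divergence of the image from the reference.
    """
    I = np.asarray(image, dtype=float)
    J = np.asarray(reference, dtype=float)
    return float(-np.sum(I * np.log(I / J)))
```

Maximizing this entropy therefore pulls the reconstruction toward the reference image J instead of toward a flat image.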
8.2. Wavelets
We saw in Section 3.4 that wavelets can provide spatially adaptive denoising of
data prior to noniterative deconvolution. The reader is referred to that section for
a discussion of the motivation for wavelet filtering and its characteristics. Most of
the astronomical applications, however, have actually used iterative methods, in
which wavelet filtering is applied repeatedly during the iterations (see the review
by Starck, Pantin & Murtagh 2002). The basic idea is to filter the residuals between
iterations, setting the insignificant ones to zero, and leaving only significant struc-
tures. The decision as to which wavelets are significant and which are not can be
made initially (Starck & Murtagh 1994; Starck, Murtagh & Gastaud 1998) or can
be updated in each iteration (Murtagh, Starck & Bijaoui 1995; Starck, Murtagh &
Bijaoui 1995). Wavelet-based denoising can be applied in combination with Clean
or the van Cittert or Richardson-Lucy methods (Wakker & Schwarz 1988; Starck,
Pantin & Murtagh 2002).
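The thresholding step can be sketched with a one-level Haar transform and soft thresholding, a deliberately minimal stand-in for the multiresolution wavelet transforms of the cited work:

```python
import numpy as np

def haar_threshold_1d(signal, thresh):
    """One-level Haar wavelet soft-thresholding.

    Transform to smooth and detail coefficients, shrink the detail
    coefficients toward zero (zeroing the insignificant ones), and
    transform back, the filtering idea described in the text.
    """
    x = np.asarray(signal, dtype=float)          # length must be even
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # smooth coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # detail coefficients
    detail = np.sign(detail) * np.maximum(np.abs(detail) - thresh, 0.0)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out
```

With a zero threshold the transform is exactly inverted; a large threshold replaces each pair of samples by its average, discarding the finest-scale detail.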
Wavelets can also be used in the maximum-entropy method, writing the entropy
in terms of wavelet coefficients instead of pixel values (Pantin & Starck 1996;
Starck, Murtagh & Gastaud 1998; Starck & Murtagh 1999; Starck et al. 2001;
Figueiredo & Nowak 2003; Maisinger, Hobson & Lasenby 2004). This has been
applied to the analysis of the cosmic microwave background radiation (Hobson,
Jones & Lasenby 1999; Sanz et al. 1999; Tenorio et al. 1999).
A disadvantage of images defined by wavelet basis functions is that the ba-
sis functions have both positive and negative image values. The negative values
therefore must be filled in by other basis functions, if the image is to be nonneg-
ative. This is less efficient, i.e., requires more basis functions, than representing a
nonnegative image by nonnegative basis functions. We argue in Section 8.5 that
minimum complexity, i.e., minimizing the number of basis functions used to char-
acterize the image, is the key to spatially adaptive image restriction. Wavelet basis
functions are at a disadvantage in this respect, because of the complicated way in
which they enforce nonnegativity.
8.3. Markov Random Fields and Gibbs Priors
Recall that the maximum-entropy prior (Section 7.5) assumes that the pixels are
statistically independent; the combined prior of all the pixels is the product of
the priors of the individual pixels. Equivalently, the total entropy of all the pixels
is the sum of the entropies of the individual pixels. This continues to hold even
when a reference image is introduced (Section 8.1). Pixels have varying priors
but continue to be statistically independent. An alternative is to introduce pixel
correlations into the prior, i.e., the probability of an image value at a pixel would
depend on the image values in neighboring pixels. These conditional probabilities
are called Markov random fields.
The starting point of a Markov random field is a neighborhood system that identifies
for each pixel j a neighborhood C_j, called its clique, such that the probability
of obtaining I_j is simply a conditional probability on the image values in C_j. The
prior can be written in the form of an exponential of a potential function V of the
clique members (Besag 1974, 1986; Geman & Geman 1984):
p(I) ∝ exp[−Σ_j Σ_{k∈C_j} V_{C_j}(I_k)].    (42)
This form is reminiscent of the Gibbs function of statistical physics (which de-
scribes the interactions between particles), so the prior in Equation 42 is called a
Gibbs prior.
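As an illustration of Equation 42, here is a log-prior with pairwise quadratic potentials over 4-neighbor cliques; the specific potential is our illustrative choice, not one from the cited studies:

```python
import numpy as np

def gibbs_log_prior(image, beta=1.0):
    """Unnormalized log of a Gibbs prior (Equation 42) with pairwise
    quadratic potentials V(I_j, I_k) = beta * (I_j - I_k)^2 over
    4-neighbor cliques. Smoother images receive higher prior probability
    because neighboring pixels are correlated.
    """
    img = np.asarray(image, dtype=float)
    v = np.sum((img[1:, :] - img[:-1, :]) ** 2)  # vertical neighbor pairs
    h = np.sum((img[:, 1:] - img[:, :-1]) ** 2)  # horizontal neighbor pairs
    return -beta * (v + h)
```

A flat image attains the maximum log-prior of zero, while any spatial variation is penalized through the clique potentials.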
The point for image reconstruction is that the potential terms V in Equation 42
depend on the cliques, which introduce spatial correlations. The main application
so far has been in medical imaging (Shepp & Vardi 1982, Hebert & Leahy 1989,
Green 1990), in which the goal is to delineate body organs more clearly by locating
different cliques inside and outside the organs. Normally the organs are delineated
in advance, perhaps with the aid of other images (Gindi et al. 1991). Adaptive
delineation is also possible (Figueiredo & Leitao 1994, Higdon et al. 1997). The
method holds promise for pattern recognition in general (Lu & Jiang 2001).
8.4. Atomic Priors and Massive Inference
The entropy prior (Section 7.5), even in its spatially variable form (Section 8.1),
suffers from two inherent theoretical difficulties. First, the prior of a sum of two
images, e.g., the sum of two polarization states, is not the convolution of the
priors of the two images, as probability theory requires. Second, the prior does not
converge to a continuum limit as the image pixels become infinitesimally small. In
order to overcome these difficulties, Sibisi & Skilling (1997) and Skilling (1998,
2003) propose to construct the image from “atoms” scattered randomly over the
field of view, each carrying a flux, which itself is a Poisson random variable. The
positions of the atoms are kept to machine precision, so pixelation is not an issue,
and the Poisson distribution of the individual atomic fluxes guarantees that the
prior of a sum of fluxes is correctly given by the convolution of the priors of the
individual fluxes.
The difficulty of the scheme is to find the positions and fluxes of the atoms.
This is done by a Markov chain Monte Carlo simulation. A few atoms are first
injected randomly. Additional atoms are then sampled from the Markov random
field, and the process is repeated until a smooth image is obtained. The method,
called massive inference, is therefore spatially adaptive by construction and can
be quite powerful. It is also very computationally intensive. The only published
applications that we have been able to find are to 1D time series and spectra
(Skilling 1998; Ochs et al. 1999; Ebbels, Lindon & Nicholson 2001).
8.5. Ockham’s Razor and Minimum Complexity
Is there a general principle that can guide us in designing image restriction? So
far we have considered several specific methods. Some are parametric fits that
specify explicit functional forms. Others are nonparametric methods that restrict
the image model in one way or another, either globally or with a degree of spatial
adaptation. A common characteristic of all these methods is that they establish
correlations between image values at different locations. The character of these
correlations depends on the reconstruction method, but a common thread is that
image restriction and image correlations go hand in hand. The stronger the image
restriction, the stronger are the correlations, i.e., they extend over larger separations.
A restatement of the image reconstruction problem might therefore be: “Find the
most strongly correlated image that fits the data.” The trick comes in designing
image restriction to express the correct kind of correlations while remaining general
enough to include all possible images that may be encountered.
Another way of considering the problem is in terms of the information content
of the data. As we know, the data consist of reproducible information in the form
of signal due to an underlying image and irreproducible information due to noise.
We are only interested in the reproducible information, which is what we really
mean by information content. The most conservative and reliable way to divide
the information into its reproducible and irreproducible parts is to maximize the
information associated with the noise and to minimize the signal information, i.e.,
to look for the least informative characterization of the image. It follows that,
if information content can be measured by an entropy, then we should strive to
maximize the entropy of the noise, i.e., the residuals, and not the entropy of the
image, as is done by the maximum-entropy method (Section 7.5).
The idea of seeking the minimalist explanation of observed data goes back
to the English theologian William of Ockham (ca. 1285–1349), who advocated
parsimony of postulates, stating, “Pluralitas non est ponenda sine necessitate,”
which translates as, “Plurality should not be posited without necessity.” This prin-
ciple, known today as Ockham’s razor, has become a cornerstone of the scientific
method.
It is straightforward to apply Ockham’s razor to parametric fits: One accepts a
parameter only if it is statistically significant, thereby restricting the number of
parameters to the minimum required by the data. That is common scientific practice,
and is what the Clean method does in the area of image reconstruction (Section 5.3).
It is more difficult to apply Ockham’s razor to nonparametric methods, because it
is not clear what “parsimony of postulates” means in that case. There have been
attempts to define complexity by equating it with the algorithmic information
content, the length of the program needed by a “universal computer” (Turing 1936,
1937) to print the reproducible information content and stop (Solomonoff 1964,
Kolmogorov 1965, Chaitin 1966). Unfortunately, defined in such a general and
abstract way, it is not possible to find the minimum complexity in any manageable
time, because the set of possible models (images in our case) is combinatorially
large. In computer science, the problem is said to be NP-hard (Cook 1971), i.e., it
is intrinsically harder than those that can be solved by a nondeterministic universal
computer in polynomial time in the size of the problem.
To avoid the abstract generality and the combinatorial explosion of possible
image models, we need to restrict the set of image models from which we seek
the minimally complex solution. Another way of saying this is that we need an
image language to describe image information content. A parametric model is an
extremely specific way to characterize image information content but it is also
very restrictive. At the other extreme, a nonparametric representation of an image
by means of independent values at each point of a large grid is too loose, and only
serves to introduce artifacts. We need to design an intermediate language that can
describe the image in terms of the shapes, sizes, and positions of image structures,
so we can describe complex images more compactly with a minimal number of
components. In analogy with our daily use of language, we need a rich vocabulary
that embraces all the possible information that we may wish to impart, and then
use the minimum number of words required in any instance.
8.6. Full Pixon
Smoothing reduces the number of image components. The easiest way to minimize
image complexity is therefore to require the maximum, spatially adaptive image
smoothness permitted by the data. This is the approach taken by the Pixon method
(Piña & Puetter 1993, Puetter & Yahil 1999). Given a trial nonnegative image
φ, called a pseudoimage, consider the image obtained by smoothing it with a
nonnegative, spatially variable kernel K:
I_j = Σ_k K_jk φ_k = (K ⊗ φ)_j.    (43)

The goal of Pixon image reconstruction is to find the smoothest possible image by
selecting the broadest possible nonnegative K for which a nonnegative φ can be
found, such that the image given by Equation 43 fits the data adequately. It does
so by optimizing K and φ in turn.

It is straightforward to find φ given K. Expressing the data model in terms of
the pseudoimage, we have:

M = H ⊗ I = H ⊗ (K ⊗ φ) = (H ⊗ K) ⊗ φ.    (44)

So, one simply replaces H by H ⊗ K and solves for φ using a nonnegative least-squares
fit with a stopping criterion (Section 6.2). Then I is given by Equation 43.
This is reminiscent of the method of sieves (Section 6.5). The difference is that
the Pixon method allows K_jk to vary from one grid position k to the next.
To determine K, the Pixon method uses a finite set of kernels, called Pixon
kernels, which are rich enough to allow all images of interest to be expressed
in the form of Equation 43, but not so extensive as to include unwanted images.
The design of the Pixon kernels depends on the type of images at hand. For most
applications, a set of circularly (spherically) symmetric kernels, whose widths
form a geometric series, work very well. The exact functional form of the kernels
is not too important. The important point is that the Pixon kernels should span the
sizes and general shapes of the expected image features, so that a pseudoimage
can be smoothed with the Pixon kernels and yield those features.
A set of trial images is constructed by convolving φ separately with each of
the kernels in turn. The final image is then obtained by selecting for each grid
point the trial image from which the image value is taken. The aim is to select
at each grid point the trial image with the broadest kernel function while still
fitting the data adequately. The image made of the indices of the trial images
selected at each grid point is called the Pixon map. Details of the determination of
the Pixon map may vary from application to application. In any event, the Pixon
map should be smooth on the same scales used to smooth the pseudoimage, and
this prevents discontinuities in the final image. A further refinement is to allow
“intermediate kernels” by interpolating between the trial images. This smoothes
the image further and/or allows the use of fewer kernel functions. The geometric
spacing of the widths is designed for optimal characterization of multiscale image
structures.
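The construction of trial images can be sketched as follows. Gaussian kernels with geometrically spaced widths are our illustrative choice, and the per-pixel selection of the broadest adequate kernel (the Pixon map) is omitted:

```python
import numpy as np
from numpy.fft import fft2, ifft2

def gaussian_kernel(shape, sigma):
    """Circularly symmetric smoothing kernel, centered at pixel (0, 0)
    with wrap-around indexing so it can be applied by FFT convolution."""
    ny, nx = shape
    y = np.minimum(np.arange(ny), ny - np.arange(ny))[:, None]
    x = np.minimum(np.arange(nx), nx - np.arange(nx))[None, :]
    k = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return k / k.sum()

def pixon_trial_images(pseudoimage, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Convolve the pseudoimage phi with each Pixon kernel in turn
    (Equation 43), producing one trial image per kernel width. The
    widths form a geometric series, as recommended in the text for
    multiscale image structures."""
    phi_hat = fft2(pseudoimage)
    return [ifft2(phi_hat * fft2(gaussian_kernel(pseudoimage.shape, s))).real
            for s in sigmas]
```

Each normalized kernel preserves the total flux of the pseudoimage, while broader kernels produce progressively smoother trial images from which the Pixon map would then select.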
A Pixon image reconstruction thus proceeds alternately between determining
φ and K. The starting point is a determination of φ with some initial K. For
example, the initial K might be a delta function, in which case the first image is
the nonnegative least-squares solution. Another possibility is to start the fit with
kernels that are deliberately too broad—resulting in a poor fit to the data—and
to reduce the kernel widths gradually during the iterations until the data are fit
satisfactorily, a process called annealing. For most images, Pixon reconstruction
can proceed in a total of only three steps: (a) find a nonnegative least-squares image
(delta-function K), (b) determine the Pixon map for the nonnegative least-squares
image, and (c) update the pseudoimage using the Pixon map just determined.
Annealing is used for images with a wide spectrum of features on all scales, so the
large scales are fit first followed by smaller scales.
It is important to emphasize that, because the Pixon method deliberately seeks
to find the smoothest image, it characterizes image features in the broadest pos-
sible way. This bias is deliberate and is intended to prevent narrow artifacts from
masquerading as real sources. Sometimes, however, external information tells us
that some sources are narrow. Then we must change the language used to describe
the image, i.e., we must select different kernels, eliminating broad ones. In the
limit of a field of point sources, the Pixon method becomes a Clean reconstruction
(Section 5.3), using the signal-to-noise ratio to eliminate weak sources. If there
is diffuse emission in addition to the point sources, we need to restore the broad
kernels, but it may be possible to eliminate intermediate-size kernels if it is known
that the diffuse emission is smooth enough.
Pixon image reconstruction has been applied in astronomy (e.g., Metcalf et al.
1996; Dixon et al. 1997; Figer et al. 1999; Gehrz et al. 2001; Young, Puetter &
Yahil 2004), microscopy (Kirkmeyer et al. 2003, Shibata et al. 2004), medical
imaging (Vija et al. 2005, Wesolowski et al. 2005), and defense and security.
8.7. Discussion
Before concluding, we undertake a more comprehensive discussion of the various
merits and shortcomings of the principal image reconstruction concepts presented
in this review. Until now, the bulk of our discussion has focused on theoretical
accounts for how a certain approach might improve over the results obtained with
another; here we present supporting examples to illustrate how the methods might
compare in practice.
It is, however, very difficult for a single practitioner to make absolutely fair
comparisons between the various methods. One simple reason is that different
techniques may be appropriate for different data sets. More problematic is the
fact that different researchers acquire different competencies with the various
methods—especially as regards the more technically complex ones—so that a
purported superiority may reflect only the skill level or biases of the user. Historically,
the fairest comparisons of different image reconstruction techniques have
issued from organized “shootouts,” in which experts in the different methodologies
process the same raw data and then mutually evaluate the results. Unfortunately,
producing such events requires considerable effort and is undertaken only rarely.
To our knowledge, Bontekoe et al. (1991) organized the only shootout among
some of the major methods discussed in this review. Like most authors, we have
not made such an effort and refrain from attempting definitive comparisons. We
do, however, take advantage of our expertise in the use of the Pixon method to
underscore what we believe to be some of the most important emerging issues in
high-performance image processing.
Our first comparison appears in Figure 3, in which we reconsider the image of
New York City shown in Figure 2. Reproduced are the Wiener reconstruction with
β = 0.1 and a nonnegative least-squares reconstruction carried to convergence,
together with the truth image and the blurred and noisy data. Both reconstructions
appear to have good residuals but are chock-full of artifacts. (The artifact level is
somewhat more severe for the nonnegative least-squares solution.) In fact, both reconstructions overfit the data, with χ²/n values of 0.88 and 0.76 for the Wiener and
nonnegative least-squares reconstructions, respectively. (The differences between
these values and the expected value of unity are significant at the 40σ and 80σ
levels, respectively.) Figure 3 thus reemphasizes the danger of overfitting the data.
The overfitting is hard to see in the residual plots in Figure 3, but spectral analysis of the residuals shows that the reconstructions primarily overfit the low-k data components,
whereas the high-k components are treated as noise and are not reconstructed.
Because overfitting amplifies artifacts more strongly with increasing k (Section 2.8),
the images are dominated by the artifacts with the highest overfitted k, making them particularly noticeable.
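The quoted significance levels follow from the Gaussian approximation to the χ² distribution: with n degrees of freedom, χ² has mean n and variance 2n, so χ²/n scatters about unity with standard deviation √(2/n). A minimal sketch of the arithmetic (the pixel count n below is our own illustrative assumption; the text does not state it):

```python
import math

def chi2_significance(chi2_per_n, n):
    """Significance, in sigma, of chi^2/n deviating from unity.

    For n degrees of freedom, chi^2 has mean n and variance 2n,
    so chi^2/n has standard deviation sqrt(2/n) about 1.
    """
    return abs(chi2_per_n - 1.0) / math.sqrt(2.0 / n)

# With an assumed n ~ 2.2e5 data points, chi^2/n = 0.88 and 0.76
# deviate from unity at roughly the 40- and 80-sigma levels.
n = 222_000
print(round(chi2_significance(0.88, n)))  # 40
print(round(chi2_significance(0.76, n)))  # 80
```

Note that the two significance levels scale together with √n, which is why a single assumed pixel count reproduces both quoted values.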
Figure 4 presents a nonnegative least-squares reconstruction of the same New
York City image shown in Figures 2 and 3, this time with early termination at
χ² = n + √(2n). Also shown are the quick Pixon reconstruction already displayed
in Figure 2 and a new full Pixon reconstruction. The nonnegative least-squares
solution shows better residuals than the quick Pixon reconstruction but significantly
stronger artifacts. The full Pixon reconstruction, by contrast, shows better
residuals and appears to be artifact-free.
All three reconstructions have reasonable χ², but that is only a single global
measure, which can hide variations across the image. The artifacts of the nonnegative least-squares fit show that it overfits the data in parts of the image and underfits
them in other parts. The quick Pixon reconstruction does a better job of avoiding artifacts, but it does not fit the data well enough, and signal structures can be seen in the
residuals. Only the full Pixon reconstruction manages to fit the data well enough to
leave reasonable residuals while avoiding artifacts. (The residuals are not perfect,
and the reconstruction might perhaps benefit from adding more kernel functions.)
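The early-termination criterion χ² = n + √(2n) is easy to graft onto any iterative fitter. A toy 1-D sketch with a simple projected-gradient nonnegative least-squares loop (the blur width, noise level, step size, and source positions are all illustrative choices of ours, not those used for Figure 4):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D problem: three point sources blurred by a Gaussian PSF plus noise.
n = 200
x_true = np.zeros(n)
x_true[[40, 90, 150]] = [5.0, 3.0, 4.0]
psf = np.exp(-0.5 * (np.arange(-8, 9) / 1.5) ** 2)
psf /= psf.sum()

def blur(v):
    return np.convolve(v, psf, mode="same")

sigma = 0.1
data = blur(x_true) + sigma * rng.normal(size=n)

# Projected-gradient nonnegative least squares, terminated early at
# chi^2 = n + sqrt(2n) instead of being carried to convergence
# (which would overfit the noise).
target = n + np.sqrt(2.0 * n)
x = np.zeros(n)
for iteration in range(50_000):
    resid = data - blur(x)
    chi2 = float(resid @ resid) / sigma**2
    if chi2 <= target:          # stop before the fit starts chasing noise
        break
    # Gradient step on ||data - blur(x)||^2; the PSF is symmetric, so the
    # adjoint of the blur is the blur itself. Then project onto x >= 0.
    x = np.maximum(0.0, x + 0.9 * blur(resid))

print(f"stopped at iteration {iteration}, chi2/n = {chi2 / n:.2f}")
```

The same stopping rule applies unchanged to Richardson-Lucy or conjugate-gradient fitters; only the update step differs.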
An illustration of image reconstruction in the astronomical arena is presented
in Figure 5. The raw data, in the form of 60 µm scans collected by the Infrared
Astronomical Satellite (IRAS), were corrected for nonuniformity and coadded be-
fore being presented to leading experts in a number of reconstruction techniques
(Bontekoe et al. 1991). The point-response function was also provided. The coadded data are shown in (b) with the point-response function on the same scale in
an insert. The competing reconstructions are shown as contour diagrams on the
right: (c) the high-resolution (HIRES) method of NASA’s Infrared Processing and
Analysis Center, based on the maximum-correlation method (Rice 1993), (d) the
Figure 4 Variety of iterative image reconstructions: (a) stopped nonnegative least-squares fit with (b) residuals, (c) quick Pixon with (d) residuals, and (e) full Pixon with (f) residuals. The data and the “truth” image are shown in Figure 3.
Richardson-Lucy method, (e) a commercial version of the maximum-entropy
method (MEMSYS 3), and (f) the Pixon method. That the fine features appearing
in the Pixon reconstruction actually exist is demonstrated in (a), where we plot
contours of radio continuum intensity on top of the Pixon image. Also shown are
features visible in optical images. Note that the scales of panels (a), (b) and (c)–(f)
are unequal and in the ratios 1.0 : 0.18 : 0.28.

Figure 5 Variety of image reconstructions of 60 µm scans of the galaxy pair M51/NGC 5195 taken by the Infrared Astronomical Satellite (Bontekoe et al. 1991): (a) false color image of the Pixon reconstruction in (f) overlaid with 5 GHz radio continuum contours (van der Hulst et al. 1988), (b) coadded input data with the point-response function on the same scale in an insert, (c) NASA high-resolution reconstruction, (d) Richardson-Lucy reconstruction, (e) maximum-entropy reconstruction, and (f) Pixon reconstruction. Objects identified in optical images are also marked in (a): (Opt) stars, (Hα) Hα emission knots. The black patches in (e) and (f) represent zero intensity. The scales of panels (a), (b) and (c)–(f) are unequal and in the ratios 1.0 : 0.18 : 0.28.
The Richardson-Lucy and HIRES reconstructions clearly fail to recover much
more than the gross shape of the galaxy pair. In particular, they fail to recover
the hole in the galactic emission that appears just north of the nucleus (solid black
portions seen in the Pixon and maximum-entropy images). The maximum-entropy
reconstruction begins to resolve structure in the galaxy's spiral arms, but the Pixon
result is clearly superior. It actually recovers sources 200 times fainter than those
visible in the maximum-entropy reconstruction. The linear spatial resolution is
also better by a factor of ∼3. The stark difference between the Pixon and other
reconstructions can be attributed to the maximal, spatially adaptive smoothing,
which protects the reconstruction from getting lost in artifacts.
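For reference, the Richardson-Lucy iteration compared here has a very compact form: each step multiplies a nonnegative image estimate by the back-projected ratio of data to model. A minimal 1-D sketch (function and variable names are ours):

```python
import numpy as np

def richardson_lucy(data, psf, n_iter=100):
    """Richardson-Lucy deconvolution (maximum likelihood for Poisson noise).

    Each iteration multiplies the current estimate by the back-projected
    ratio of data to model, which keeps the image nonnegative.
    """
    psf = psf / psf.sum()
    psf_mirror = psf[::-1]                  # adjoint of the blur
    x = np.full_like(data, data.mean())     # flat, strictly positive start
    for _ in range(n_iter):
        model = np.convolve(x, psf, mode="same")
        ratio = data / np.maximum(model, 1e-12)
        x = x * np.convolve(ratio, psf_mirror, mode="same")
    return x
```

The data must be nonnegative. On a noiseless blurred point source the iteration progressively resharpens the peak; on noisy data it must be stopped early, or it develops exactly the overfitting artifacts discussed above.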
Figure 6 shows direct external validation of Pixon processing of 12 µm data
taken by IRAS. Panel (a) shows a collage of scans kindly prepared by Dr. Romano
at the Aerospace Corporation. For each point in each scan the collage includes the
scan flux at the pixel corresponding to the center of the beam at the time the flux
was measured. (The average flux is used when several scans overlap on the same
pixel.) The main sources are all point-like stars (there is also some diffuse back-
ground emission), yet they are spread significantly by the point-response function,
particularlyinthecross-scandirection.Panel(b)showsaHIRESreconstructionof
the image from the original scans (not the collage) performed at NASA’s Infrared
Processing and Analysis Center. Many stars are visible, but the point-response
function has clearly not been optimally deconvolved, and blur persists in the cross-scan direction. Panel (c) shows the Pixon reconstruction performed on the collage
in (a), which reveals many more sources than HIRES and little residual cross-scan
blur. Finally, panel (d) shows an image of the same area of the sky, taken 12 years
later by the Midcourse Space Experiment (MSX) satellite of the U.S. Air Force
with a much improved imaging system, validating the Pixon reconstruction.
Finally, Figure 7 shows an example from nuclear medicine (Vija et al. 2005):
a phantom with numerous dowels of varying diameters and heights containing
radioactive material emitting gamma rays. The goal is to image the smallest pos-
sible dowels containing the least amount of radioactive material. The top panels
show the raw counts, whereas the bottom panels show the results of the Pixon re-
constructions. The acquisition times are varied to yield total counts ranging from
0.2 megacounts in the left panels to 0.8 megacounts in the middle panels to 6.4
megacounts in the right panels. This provides a range of signal-to-noise ratios,
whereby the tradeoff between image fidelity and acquisition time or dose can be
assessed. For these planar scintigraphic images, the blur is insignificant compared
with the Poisson counting noise, i.e., the Pixon kernels used to smooth the image
are everywhere wider than the point-response function, so only adaptive smoothing is performed and no deblurring is attempted. Parameters controlling Pixon
processing are kept fixed for all acquisitions, a stringent test of how adaptive
and data-driven the method is. A visual comparison of the Pixon reconstruction
in panel (b) with the raw counts in panel (c) shows that Pixon processing improves
image quality in a way that the raw images can match only with an order-of-magnitude
increase in counts.
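The spirit of this spatially adaptive smoothing, though emphatically not the actual Pixon algorithm, can be caricatured in a few lines: at each pixel, keep the widest smoothing kernel whose local mean remains statistically consistent with the raw count given its Poisson noise. A toy 1-D sketch (kernel widths, threshold, and test scene are our own illustrative choices):

```python
import numpy as np

def adaptive_smooth(counts, widths=(16, 8, 4, 2), nsigma=3.0):
    """Toy spatially adaptive smoother for 1-D Poisson count data.

    Each pixel keeps the widest boxcar average that stays within
    nsigma * sqrt(mean) of its raw count. This caricatures spatially
    adaptive smoothing; it is NOT the Pixon method itself.
    """
    counts = np.asarray(counts, dtype=float)
    out = counts.copy()                      # fall back to raw counts
    assigned = np.zeros(counts.size, dtype=bool)
    for w in widths:                         # try the widest kernel first
        kernel = np.ones(2 * w + 1) / (2 * w + 1)
        local_mean = np.convolve(counts, kernel, mode="same")
        noise = np.sqrt(np.maximum(local_mean, 1.0))  # Poisson sigma
        ok = np.abs(counts - local_mean) <= nsigma * noise
        take = ok & ~assigned
        out[take] = local_mean[take]
        assigned |= take
    return out

# Flat ~100-count background with one bright source: the background is
# heavily smoothed, while pixels near the source keep narrow kernels.
rng = np.random.default_rng(2)
truth = np.full(300, 100.0)
truth[140:160] += 2000.0
data = rng.poisson(truth).astype(float)
smoothed = adaptive_smooth(data)
```

The background noise is suppressed by roughly the square root of the widest kernel size, while the source retains its resolution because wide kernels fail the consistency test there.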
Figure 6 Externally validated comparison of Pixon and NASA reconstructions of 12 µm scans taken by the Infrared Astronomical Satellite: (a) collage of the scans, (b) high-resolution reconstruction by NASA's Infrared Processing and Analysis Center, (c) Pixon reconstruction, and (d) an image obtained 12 years later by the Midcourse Space Experiment satellite of the U.S. Air Force and processed by the Space Dynamics Laboratory (Logan, UT).
9. SUMMARY
The past few decades have seen the evolution of two unmistakable trends in high-
performance image processing: (a) techniques for image restriction are essential
for solving the inverse problem of image reconstruction, and (b) performance is
significantly improved when the image-restriction strategy is allowed to adapt to
varying conditions across the image.
Figure 7 Order-of-magnitude noise suppression of planar scintigraphic γ-ray images of medical phantoms by Pixon reconstructions: (a) 0.2 megacount data with (b) Pixon image, (c) 0.8 megacount data with (d) Pixon image, and (e) 6.4 megacount data with (f) Pixon image.