
36 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 18, NO. 1, JANUARY 2009

Generalizing the Nonlocal-Means to Super-Resolution Reconstruction

Matan Protter, Michael Elad, Senior Member, IEEE, Hiroyuki Takeda, Student Member, IEEE, and Peyman Milanfar, Senior Member, IEEE

Abstract—Super-resolution reconstruction proposes a fusion

of several low-quality images into one higher quality result with

better optical resolution. Classic super-resolution techniques

strongly rely on the availability of accurate motion estimation

for this fusion task. When the motion is estimated inaccurately,

as often happens for nonglobal motion fields, annoying artifacts

appear in the super-resolved outcome. Encouraged by recent de-

velopments on the video denoising problem, where state-of-the-art

algorithms are formed with no explicit motion estimation, we

seek a super-resolution algorithm of similar nature that will allow

processing sequences with general motion patterns. In this paper,

we base our solution on the Nonlocal-Means (NLM) algorithm.

We show how this denoising method is generalized to become

a relatively simple super-resolution algorithm with no explicit

motion estimation. Results on several test movies show that the

proposed method is very successful in providing super-resolution

on general sequences.

Index Terms—Nonlocal-means, probabilistic motion estimation,

super-resolution.

I. INTRODUCTION

SUPER-RESOLUTION reconstruction proposes a fusion of several low quality images into one higher quality result with better optical resolution. This is an Inverse Problem that combines denoising, deblurring, and scaling-up tasks, aiming to recover a high quality signal from degraded versions of it. Fig. 1 presents the process that explains how a low-resolution image sequence $Y$ is related to an original higher resolution movie $X$. During the imaging, the scene may become blurred due to atmospheric, lens, or sensor effects. The blur is denoted by $H$, assumed for simplicity to be linear space and time invariant. Similarly, the loss of spatial resolution due to the sensor array sampling is modeled by the fixed decimation operator $D$, representing the resolution factor $s$ between the original sequence and the measured one. White Gaussian iid noise is assumed to be added to the measurements, both to account for actual noise in imaging systems and to accommodate model mismatches.

Manuscript received December 23, 2007; revised April 23, 2008. First published December 2, 2008; current version published December 12, 2008. This work was supported in part by the United States-Israel Binational Science Foundation Grant 2004199 and in part by the U.S. Air Force Office of Scientific Research Grant FA9550-07-1-0365. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Pier Luigi Dragotti.

M. Protter and M. Elad are with the Department of Computer Science, The Technion–Israel Institute of Technology, Haifa 32000, Israel (e-mail: matanpr@cs.technion.ac.il; elad@cs.technion.ac.il).

H. Takeda and P. Milanfar are with the Department of Electrical Engineering, University of California Santa-Cruz, Santa-Cruz, CA 95064 USA (e-mail: htakeda@soe.ucsc.edu; milanfar@soe.ucsc.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2008.2008067

Fig. 1. Imaging process to be reversed by super resolution.
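As a rough illustration of the imaging process of Fig. 1, the following NumPy sketch degrades one high-resolution frame by a space-invariant blur, a decimation, and additive white Gaussian noise. The function name `degrade`, its parameter names, the explicit correlation loop, and the edge padding are assumptions made for this sketch only; the text does not prescribe an implementation.

```python
import numpy as np

def degrade(x, blur_kernel, s, noise_std, rng):
    """Simulate one measured frame: blur, then decimate by s, then add noise.

    Assumes a square blur kernel with odd side length (illustrative choice).
    """
    k = blur_kernel.shape[0] // 2
    rows, cols = x.shape
    padded = np.pad(x, k, mode="edge")
    blurred = np.zeros_like(x, dtype=float)
    # Blur: linear, space-invariant, written as an explicit correlation.
    for di in range(blur_kernel.shape[0]):
        for dj in range(blur_kernel.shape[1]):
            blurred += blur_kernel[di, dj] * padded[di:di + rows, dj:dj + cols]
    # Decimation: keep every s-th pixel in each direction.
    decimated = blurred[::s, ::s]
    # Noise: additive white Gaussian iid samples.
    return decimated + noise_std * rng.standard_normal(decimated.shape)
```

Calling `degrade` once per frame of a moving scene would produce a low-resolution sequence of the kind a super-resolution algorithm takes as input.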

The super resolution goal is the recovery of $X$ from the input set of images $Y$, reversing the above process. Such reconstruction relies on motion in the scene to recover details that are finer than the sampling grid. Fig. 2 demonstrates how small details can be recovered when the motion between the images in the sequence is known with a high degree of accuracy. The top row in the figure is the input sequence. The middle row is the up-scaled version of each image (unknown values are set to a background color), shifted by the known translation between the current image and the first (reference) image. The bottom row shows the construction of the super resolution image, from left to right. Initially, the first image is placed on the grid. Then, every new image in the sequence is placed on the same grid, with a displacement reflecting the motion it underwent. The merger of all images represents the outcome of the super resolution algorithm. We note that this description of the mechanics of super-resolution is somewhat simplistic; in most cases, one cannot assume the translations to be exact multiples of the high-resolution pixel sizes. This makes the estimation of accurate motion parameters and the merger of all images much more complex than described here.

While the above-described method is somewhat simplistic, it is a faithful description of the foundations for all classic super resolution algorithms. The first step of such algorithms is an estimation of the motion in the sequence, followed by a fusion of the inputs according to these motion vectors. A wide variety of super resolution algorithms have been developed in the past two decades; we refer to [1]–[26] as representatives of this vast literature.

In the currently available super-resolution algorithms, only global motion estimation (e.g., translation or affine global warp) is accurate enough to lead to a successful reconstruction of a super-resolved image. This is very limiting, as most actual

1057-7149/$25.00 © 2008 IEEE

Authorized licensed use limited to: IEEE Xplore. Downloaded on December 18, 2008 at 18:02 from IEEE Xplore. Restrictions apply.


PROTTER et al.: GENERALIZING THE NONLOCAL-MEANS TO SUPER-RESOLUTION RECONSTRUCTION37

scenes contain motion that is local in its nature (e.g., a person talking). Obtaining highly accurate local motion estimation, known as optical flow, is a very difficult task, and particularly so in the presence of aliasing and noise. When inaccurately estimated motion is used within one of the existing reconstruction algorithms, it often leads to disturbing artifacts that cause the output to be inferior, even when compared to the given measurements. This discussion leads to the commonly agreed, unavoidable conclusion that general content movies are not likely to be handled well by classical super-resolution techniques.

Fig. 2. Super-resolving an image using low resolution inputs with known translations. Reconstruction proceeds from left to right. Top: input images; middle: corresponding up-scaled images, shifted using known translations; bottom: accumulated reconstruction by adding the current low resolution image to the output canvas.
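The idealized fusion illustrated in Fig. 2 can be sketched in a few lines, under the strong assumption that every frame is shifted by a known integer number of high-resolution pixels. The function name `shift_and_add`, the NaN marker for uncovered grid points, and the restriction of each shift to the range 0 to s-1 are choices made for this sketch; it ignores blur and noise and is not the algorithm proposed in this paper.

```python
import numpy as np

def shift_and_add(frames, shifts, s):
    """Place low-resolution frames on a common high-resolution grid.

    frames: list of (h, w) arrays; shifts: list of (dy, dx) with 0 <= dy, dx < s,
    given in high-resolution pixels; s: integer resolution factor.
    """
    h, w = frames[0].shape
    acc = np.zeros((h * s, w * s))
    cnt = np.zeros((h * s, w * s))
    for frame, (dy, dx) in zip(frames, shifts):
        # Low-res pixel (i, j) of this frame lands at high-res (i*s + dy, j*s + dx).
        acc[dy::s, dx::s] += frame
        cnt[dy::s, dx::s] += 1
    out = np.full(acc.shape, np.nan)  # NaN marks grid points no frame covered
    filled = cnt > 0
    out[filled] = acc[filled] / cnt[filled]
    return out
```

With s = 2 and the four shifts (0, 0), (0, 1), (1, 0), (1, 1), every high-resolution grid point is covered exactly once. Local or sub-pixel motion breaks this simple placement, which is the difficulty the text describes.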

This severe restriction leads us to seek a different approach to super-resolution. Can such an algorithm be proposed with no explicit motion estimation? Our starting point for this quest (after a super-resolution algorithm that is able to process sequences with a general motion pattern) is the video denoising application, where several recent contributions demonstrate state-of-the-art results with algorithms that avoid motion estimation [27]–[30]. Among these, we choose to take a closer look at the Nonlocal Means (NLM) algorithm, with the aim to generalize it to perform super-resolution reconstruction.

The NLM is the weakest among the recent motion-estimation-free video denoising algorithms, and yet, it is also the simplest. As such, it stands as a good candidate for generalization. The NLM is posed originally in [31] as a single image denoising method, generalizing the well-known bilateral filter [32], [33]. Denoising is obtained by replacing every pixel with a weighted average of its neighborhood. The weights for this computation are evaluated by using block-matching fit between image patches centered around the center pixel to be filtered, and the neighbor pixels to be averaged. Recent work has shown how this method can be used for video denoising by extending the very same technique to 3-D neighborhoods [27]. An improvement of this technique, considering varying size neighborhoods, is suggested in [28], so as to trade bias versus variance in an attempt to get the best mean-squared-error (MSE).

The NLM was proposed intuitively in [27] and [31], and, thus, it is natural to try to extend it to perform super-resolution using a similar intuition. This intuition leads to independent up-scaling of each image in the sequence using a smart interpolation method, followed by NLM processing. However, extensive experiments indicate that this intuitive method does not provide super-resolution results. For this reason, a more profound understanding of the NLM filter is required for its successful generalization to super-resolution.

In order to gain a better understanding of the NLM, we propose redefining it as an energy minimization task. We show that the novel penalty term we propose indeed leads, when minimized, to the NLM. We then carefully extend the penalty function to the super-resolution problem. We show how a tractable algorithm emerges from the minimization of this penalty function, leading to a local, patch-based, super-resolution process with no explicit motion estimation. Empirical tests of the derived algorithm on actual sequences with general motion patterns are then presented, thus demonstrating the capabilities of the derived algorithm.

The structure of the paper is as follows. Section II describes the NLM denoising filter, as posed in [31]. This section can be skipped by readers who are familiar with the NLM. Section III introduces an energy function to be minimized for getting a denoising effect for a single image; we show that this minimization leads to a family of image denoising algorithms, NLM included as a special case. We also provide a simpler penalty function addressing the same goal, which will be effectively used in the later part of the paper. Section IV proposes a generalization of the introduced energy function to cope with resolution changes, thereby enabling super-resolution reconstruction. In this section we also derive the eventual super-resolution algorithm we propose, and discuss its numerical structure. Section V shows results on sequences with general motion, demonstrating the successful recovery of high frequencies. We conclude in Section VI, outlining the key contribution of this work, and describing several directions for further research.

II. BILATERAL AND THE NLM DENOISING FILTERS

We begin our journey with a description of the bilateral and the NLM filters, as the development that follows relies on their structure. The description given in this section is faithful to the one found in [31] and [32]. The bilateral and the NLM filters are two very successful image denoising filters. While not the very best in denoising performance, these methods are very simple to understand and implement, and this makes them a good starting point for our needs.

Both the bilateral and the NLM filters are based on the assumption that image content is likely to repeat itself within some neighborhood. Therefore, denoising each pixel is done by averaging all pixels in its neighborhood. This averaging is not done in a blind and uniform way, however. Instead, each of the pixels in the relevant neighborhood is assigned a weight that reflects the probability that this pixel and the pixel to be denoised had the same value prior to the additive noise degradation. A formula describing these filters looks like$^1$

$$\hat{x}[k,l] = \frac{\sum_{(i,j)\in N(k,l)} w[k,l,i,j]\, y[i,j]}{\sum_{(i,j)\in N(k,l)} w[k,l,i,j]} \tag{1}$$

where $N(k,l)$ stands for the neighborhood of the pixel $[k,l]$, and the term $w[k,l,i,j]$ is the weight for the $(i,j)$-th neighbor pixel. The input pixels are $y[k,l]$, and the output result in that location is $\hat{x}[k,l]$.

$^1$As we shall see next, in this framework the coefficients $w[k,l,i,j]$ are all restricted to be positive. This is a shortcoming, which can be overcome by extending the framework to higher order; see [34].

The two filters differ in the method by which the weights are computed. The weights for the bilateral filter are computed


based both on radiometric (gray-level) proximity and geometric

proximity between the pixels, namely

$$w_{BL}[k,l,i,j] = \exp\left\{-\frac{(y[k,l]-y[i,j])^2}{2\sigma_r^2}\right\} \cdot f\left(\sqrt{(k-i)^2+(l-j)^2}\right) \tag{2}$$

The function $f$ takes the geometric distance into account, and as such, it is monotonically nonincreasing. It may take many forms, such as a Gaussian, a box function, a constant, and more. The parameter $\sigma_r$ controls the effect of the grey-level difference between the two pixels. This way, when the two pixels are markedly different, the weight is very small, implying that this neighbor is not to be trusted in the averaging.

The radiometric part in the weights of the NLM is computed slightly differently, by computing the Euclidean distance between two image patches centered around these two involved pixels. Defining $\hat{R}_{k,l}$ as an operator that extracts a patch of a fixed and predetermined size (say $q \times q$ pixels) from an image, the expression $\hat{R}_{k,l}y$ ($y$ is represented as a vector by lexicographic ordering) results in a vector of length $q^2$ being the extracted patch. Thus, the NLM weights are given by

$$w_{NLM}[k,l,i,j] = \exp\left\{-\frac{\|\hat{R}_{k,l}y-\hat{R}_{i,j}y\|_2^2}{2\sigma_r^2}\right\} \cdot f\left(\sqrt{(k-i)^2+(l-j)^2}\right) \tag{3}$$

Obviously, setting $\hat{R}_{k,l}$ to extract only a single pixel, the bilateral filter emerges as a special case of the NLM algorithm.

We note that there are various other ways to choose the weights in (1), and the above separable choice of the weights (product of radiometric and Euclidean distance terms) is only one choice. For example, the steering kernel may provide an interesting alternative, taking into account the correlation between the pixel positions and their value [34]. Nevertheless, in this paper we shall restrict our choice of weights to those used by the NLM.
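The weighted average described in this section, with weights formed as a product of a radiometric (patch-distance) term and a geometric term, might be sketched as follows. This is a minimal illustration rather than a reference implementation: the name `nlm_filter`, the Gaussian form chosen for the geometric function, the reflective padding at the image border, and all default parameter values are assumptions of this sketch.

```python
import numpy as np

def nlm_filter(y, patch_radius=1, search_radius=3, sigma_r=0.1, sigma_s=2.0):
    """Weighted neighborhood average with NLM weights.

    patch_radius=0 compares single pixels, which gives bilateral-style weights.
    """
    p, r = patch_radius, search_radius
    pad = p + r
    h, w = y.shape
    yp = np.pad(y.astype(float), pad, mode="reflect")

    def window(di, dj):
        # View of the padded image shifted by (di, dj), same size as y.
        return yp[pad + di:pad + di + h, pad + dj:pad + dj + w]

    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            # Squared Euclidean distance between the patch around each pixel
            # and the patch around its (di, dj) neighbor.
            d2 = np.zeros((h, w))
            for a in range(-p, p + 1):
                for b in range(-p, p + 1):
                    d2 += (window(a, b) - window(a + di, b + dj)) ** 2
            geometric = np.exp(-(di**2 + dj**2) / (2 * sigma_s**2))
            weight = np.exp(-d2 / (2 * sigma_r**2)) * geometric
            num += weight * window(di, dj)
            den += weight
    return num / den
```

Because every weight is positive and the center pixel always receives weight one, the denominator never vanishes and each output pixel is a convex combination of input pixels.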

III. NLM VIA ENERGY MINIMIZATION

Both the bilateral and the NLM filters described above were

presented intuitively as algorithmic formulas, as in (1). We

claim that both these filters can be derived by minimizing a

properly defined penalty function. Following the rationale and

steps taken in [33] and [35], we present such a penalty function,

and show how these algorithms emerge from it. This will

prove valuable when taking the next step of generalizing these

methods to a super-resolution reconstruction algorithm, as will

be shown in Section IV. Sections III-A and III-C present two

possible and novel penalty functions for denoising, and derive

from both the NLM algorithm and some variations of it. The

readers interested in the super-resolution portion of this work

can start their reading in Section III-C.

A. Penalty Function

The penalty function we start with reflects two forces: i) We desire a proximity between the reconstructed and the input images (this is the classic likelihood term); and ii) We would like each patch in the resulting image to resemble other patches in its vicinity. However, we do not expect such a fit for every pair, and, thus, we introduce weights to designate which of these pairs are to behave alike. Putting these two forces together with proper weighting,$^2$ we propose a maximum a posteriori probability (MAP) penalty of the form

$$\eta_{NLM}(x) = \frac{\lambda}{2}\|x-y\|_2^2 + \frac{1}{4}\sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,\|R_{k,l}x-R_{i,j}x\|_2^2 \tag{4}$$

The first term is the log-likelihood function for a white and Gaussian noise. The second term stands for a prior, representing the (minus) log of the probability of an image $x$ to exist. The weights in the above expression, $w[k,l,i,j]$, assign a confidence that the patches around $[k,l]$ and $[i,j]$ are to be close to each other. Computing these weights can be done in a number of ways, one of which is using (3) and using $y$ instead of the unknown $x$. In order to keep the discussion simple, from this point on we shall assume that the weights $w[k,l,i,j]$ are the NLM ones. It is important to note that the patch extraction operator $\hat{R}_{k,l}$ used for computing the weights (as in (3)) and the operator $R_{k,l}$ used in the penalty term are generally of different sizes.

The notation $\Omega$ stands for the support of the entire image. Thus, the second term sweeps through each and every pixel $[k,l]$ in the image, and for each we require a proximity to surrounding patches in its neighborhood.
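To make the two forces in the penalty concrete, the sketch below evaluates a likelihood term plus a weighted patch-similarity prior for a candidate image. Several things here are assumptions of the sketch and not taken from the text: the name `nlm_penalty`, the dictionary layout used to pass the weights, and the periodic treatment of the image border (chosen so that every patch is well defined).

```python
import numpy as np

def nlm_penalty(x, y, weights, lam, patch_radius=1):
    """Evaluate (lam/2)*||x - y||^2 + (1/4)*sum of w * ||patch_kl(x) - patch_ij(x)||^2.

    weights maps each neighbor offset (di, dj) to an (H, W) array W with
    W[k, l] = w[k, l, k + di, l + dj]; the image is treated as periodic.
    """
    offs = range(-patch_radius, patch_radius + 1)
    total = 0.5 * lam * np.sum((x - y) ** 2)
    for (di, dj), W in weights.items():
        # diff2[m] = (x[m] - x[m + (di, dj)])^2 for every pixel m.
        diff2 = (x - np.roll(x, (-di, -dj), axis=(0, 1))) ** 2
        # Summing diff2 over the patch centered at (k, l) gives the squared
        # distance between the patches around (k, l) and (k + di, l + dj).
        patch_d2 = sum(np.roll(diff2, (-a, -b), axis=(0, 1)) for a in offs for b in offs)
        total += 0.25 * np.sum(W * patch_d2)
    return total
```

A constant image equal to the input gives a penalty of zero, while adding a constant offset raises only the likelihood term, since all patch differences stay zero.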

B. Derivation of the NLM Filter

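Assuming the weights are predetermined and considered as constants, we can minimize this penalty term with respect to $x$ by zeroing its derivative

$$0 = \lambda(x-y) + \frac{1}{2}\sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,(R_{k,l}-R_{i,j})^T(R_{k,l}-R_{i,j})\,x \tag{5}$$

In order to simplify this equation, we open the brackets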

$^2$The reason for the factor 1/4 in the second term will be made clear shortly.


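$$0 = \lambda(x-y) + \frac{1}{2}\sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\left(R_{k,l}^TR_{k,l}+R_{i,j}^TR_{i,j}\right)x - \frac{1}{2}\sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\left(R_{k,l}^TR_{i,j}+R_{i,j}^TR_{k,l}\right)x \tag{6}$$

We proceed by invoking two assumptions: i) The neighborhood is symmetric, i.e., if $(i,j)\in N(k,l)$, then $(k,l)\in N(i,j)$ is also true; and ii) The weights are symmetric, i.e., $w[k,l,i,j]=w[i,j,k,l]$. Both these assumptions are natural: typical neighborhood definitions satisfy the first condition, and the weights of the NLM satisfy the second one. Using these two assumptions, we get the following two equalities:

$$\sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,R_{k,l}^TR_{i,j} = \sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,R_{i,j}^TR_{k,l}$$

$$\sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,R_{k,l}^TR_{k,l} = \sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,R_{i,j}^TR_{i,j}$$

A formal proof (Theorem 1) for these equalities is given in Appendix A. Using these, (6) simplifies and becomes

$$0 = \lambda(x-y) + \sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,R_{k,l}^T\left(R_{k,l}-R_{i,j}\right)x \tag{7}$$

As can be seen, the choice 1/4 in the original definition led to a simpler final outcome.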

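While solving the above equation directly is possible in principle, it requires an inversion of a very large matrix. Instead, we adopt an iterative approach based on the fixed-point strategy [36], [37]. Denoting $\hat{x}^{n-1}$ the outcome of the previous iteration, and $\hat{x}^{n}$ the desired outcome of the current iteration, we rewrite (7) with assignments of iteration stage per each instance of the unknown $x$. The equation we propose is

$$0 = \lambda(\hat{x}^{n}-y) + \sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,R_{k,l}^T\left(R_{k,l}\hat{x}^{n}-R_{i,j}\hat{x}^{n-1}\right) \tag{8}$$

which leads to the relation

$$\left[\lambda I + \sum_{(k,l)\in\Omega}\Big(\sum_{(i,j)\in N(k,l)} w[k,l,i,j]\Big)R_{k,l}^TR_{k,l}\right]\hat{x}^{n} = \lambda y + \sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,R_{k,l}^TR_{i,j}\,\hat{x}^{n-1} \tag{9}$$

Notice that the term $\sum_{(i,j)\in N(k,l)} w[k,l,i,j]$ generates a scalar, being a function of $[k,l]$; we shall denote this as $\bar{w}[k,l]$.

In the obtained equation, the right-hand-side (RHS) creates an image by manipulating image patches: for each location $[k,l]$ in the image, we copy surrounding neighboring patches in locations $[i,j]\in N(k,l)$ to the center position $[k,l]$, multiplied by the weights $w[k,l,i,j]$. Once built, this image is added to $y$ with a proper weight $\lambda$.

The matrix multiplying $\hat{x}^{n}$ on the left-hand-side is a diagonal positive definite matrix (see Appendix A). This matrix's only task is normalization of the weighted average that took place on the RHS. As this matrix is invertible, the new solution is obtained by

$$\hat{x}^{n} = \left[\lambda I + \sum_{(k,l)\in\Omega}\bar{w}[k,l]\,R_{k,l}^TR_{k,l}\right]^{-1}\left[\lambda y + \sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,R_{k,l}^TR_{i,j}\,\hat{x}^{n-1}\right] \tag{10}$$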

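When using the fixed-point method, as we did above, every appearance of the unknown in the equation is assigned with an iteration number. Among the many possible assignments, one should seek one that satisfies two important conditions: i) The computation of $\hat{x}^{n}$ from $\hat{x}^{n-1}$ should be easy; and ii) The obtained iterative formula should lead to convergence. As for the first requirement, we indeed have an assignment that leads to a simple iterative step. Convergence of the above algorithm is guaranteed if the overall operator multiplying $\hat{x}^{n-1}$ is convergent, i.e.,

$$\rho\left(\left[\lambda I + \sum_{(k,l)\in\Omega}\bar{w}[k,l]\,R_{k,l}^TR_{k,l}\right]^{-1}\sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,R_{k,l}^TR_{i,j}\right) < 1 \tag{11}$$

where $\rho(A)$ is the spectral radius of $A$. It is easily seen that for sufficiently large $\lambda$, this condition is met. Nevertheless, we do not worry about convergence, as we will be using the above for one iteration only, with the initialization of $\hat{x}^{0}=y$. This means that the output for the denoising process is $\hat{x}^{1}$, obtained by

$$\hat{x}^{1} = \left[\lambda I + \sum_{(k,l)\in\Omega}\bar{w}[k,l]\,R_{k,l}^TR_{k,l}\right]^{-1}\left[\lambda y + \sum_{(k,l)\in\Omega}\ \sum_{(i,j)\in N(k,l)} w[k,l,i,j]\,R_{k,l}^TR_{i,j}\,y\right] \tag{12}$$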
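A single fixed-point step of the kind described here (build an image from weighted, relocated neighboring patches, add the input with weight lambda, then normalize pixel by pixel) might be sketched as follows. The helper names `shift`, `nlm_weights`, and `fixed_point_step`, the periodic image border, the constant geometric function, and the default parameters are all assumptions of this sketch, not details given in the text.

```python
import numpy as np

def shift(img, di, dj):
    """Return s with s[k, l] = img[k + di, l + dj] (periodic border)."""
    return np.roll(img, (-di, -dj), axis=(0, 1))

def nlm_weights(y, search_radius=2, patch_radius=1, sigma_r=0.1):
    """Map each neighbor offset (di, dj) to W with W[k, l] = w[k, l, k+di, l+dj]."""
    offs = range(-patch_radius, patch_radius + 1)
    weights = {}
    for di in range(-search_radius, search_radius + 1):
        for dj in range(-search_radius, search_radius + 1):
            diff2 = (y - shift(y, di, dj)) ** 2
            d2 = sum(shift(diff2, a, b) for a in offs for b in offs)
            # Geometric function taken as a constant within the search window.
            weights[(di, dj)] = np.exp(-d2 / (2 * sigma_r**2))
    return weights

def fixed_point_step(x_prev, y, weights, lam, patch_radius=1):
    """One iteration: diagonal normalization of lam*y plus weighted patches."""
    offs = range(-patch_radius, patch_radius + 1)
    num = lam * y.astype(float)
    den = np.full(y.shape, float(lam))
    for (di, dj), W in weights.items():
        neighbor = shift(x_prev, di, dj)   # neighbor[m] = x_prev[m + (di, dj)]
        for a in offs:
            for b in offs:
                W_ab = shift(W, -a, -b)    # W_ab[m] = W[m - (a, b)]
                num += W_ab * neighbor     # patch around (i, j) copied to (k, l)
                den += W_ab                # diagonal normalization entries
    return num / den
```

Under these assumptions, the one-iteration denoiser initialized with the input would be `fixed_point_step(y, y, nlm_weights(y), lam=1.0)`. Because the normalizing matrix is diagonal, its inversion reduces to the element-wise division in the last line.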

The above computation is done just as described above, with the obvious substitution of $\hat{x}^{n-1}=y$. This process is quite reminiscent of the way NLM and the bilateral filters operate, and yet it is different. The obtained algorithm is a more general and more powerful denoising algorithm than NLM.

In order to see how the NLM emerges from this formulation as a special case, we shall assume further that the patch extraction operation we use, $R_{k,l}$, extracts a single pixel in location $[k,l]$. This change means that $R_{i,j}y$ is in fact the single pixel $y[i,j]$. Thus, in this case we have
