Low-rank Bayesian Tensor Factorization for Hyperspectral Image Denoising

Kaixuan Wei, Ying Fu∗

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

Abstract

In this paper, we present a low-rank Bayesian tensor factorization approach for the hyperspectral image (HSI) denoising problem, where

zero-mean white and homogeneous Gaussian additive noise is removed from a given HSI. The approach is based on two intrinsic

properties underlying a HSI, i.e., the global correlation along spectrum (GCS) and nonlocal self-similarity across space (NSS).

We ﬁrst adaptively construct the patch-based tensor representation for the HSI to extract the NSS knowledge while preserving

the property of GCS. Then, we employ the low rank property in this representation to design a hierarchical probabilistic model

based on Bayesian tensor factorization to capture the inherent spatial-spectral correlation of HSI, which can be eﬀectively solved

under the variational Bayesian framework. Furthermore, through incorporating these two procedures in an iterative manner, we

build an eﬀective HSI denoising model to recover HSI from its corruption. This leads to a state-of-the-art denoising performance,

consistently surpassing recently published leading HSI denoising methods in terms of both comprehensive quantitative assessments

and subjective visual quality.

Keywords: Hyperspectral image denoising, full Bayesian CP factorization, nonlocal self-similarity, global correlation along

spectrum, variational Bayesian inference, tensor rank auto determination.

1. Introduction

A hyperspectral image (HSI) is made up of many contiguous wavebands for each spatial position of a real scene and provides much richer information about the scene than multiple/RGB

images. It has been widely used for remote sensing, including

mineral identiﬁcation [1, 2], land cover classiﬁcation [3], vege-

tation studies [4], and atmospheric studies [5]. Besides, in the

computer vision ﬁeld, the availability of detailed physical repre-

sentation of HSI has been substantiated to signiﬁcantly enhance

the performance of numerous computer vision tasks, such as

inpainting [6], tracking [7], unmixing [8], super-resolution [9],

and face recognition [10].

However, in real cases, a HSI is always corrupted by noise, which severely degrades the quality of the imagery and negatively impacts all of the aforementioned HSI processing tasks. Noise is inevitable during acquisition and arises at different stages in both the optics and the photodetector [11].

Therefore, HSI denoising plays a vital role in the typical work-

ﬂow of HSI analysis and processing.

From our observations of several state-of-the-art HSI denoising methods [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], we find that the essence of a successful HSI denoising algorithm is to reasonably extract the prior structural knowledge underlying a HSI. The most commonly employed prior structures for HSI recovery include its global correlation along spectrum (GCS) and nonlocal self-similarity across space (NSS). More specifically, the GCS prior denotes the large amount of redundancy across the spectral dimension: images located in adjacent bands of a HSI are generally highly correlated. The NSS prior indicates that sparsity can be enhanced by grouping similar image fragments (i.e., blocks), which can further improve the performance of various HSI recovery methods [6, 16, 20].

∗Corresponding author
Email address: kaixuan_wei@outlook.com, fuying@bit.edu.cn (Kaixuan Wei, Ying Fu)

Since traditional 2D image denoising is a well-studied yet still active topic, the simplest way of denoising a HSI is to apply these off-the-shelf techniques [23, 24, 25, 26, 27] band by band. However, such coarsely extended methods ignore the GCS prior completely, which leads to relatively low-quality results. To address this issue, carefully designed extensions of several high-performance 2D image denoising methods were proposed recently [12, 15, 14]. One notable example is a nonlocal transform-domain filter, generally referred to as BM4D [12], which is a non-trivial but straightforward extension of the well-known image denoising method BM3D [24]. Besides, a sparse-representation-based reconstruction method [15], which jointly utilizes the global and local redundancy and correlation in the spatial/spectral domain, is inspired by the earlier 3D cubic K-SVD [23]. Similarly, the spectral/spatial adaptive hyperspectral total variation (SSAHTV) denoising algorithm [14], in which both the spectral noise differences and the spatial information differences are considered in the process of noise reduction, is a HSI-oriented variation of the spatially adaptive TV (SATV) model [28].

Instead of constructing a method directly from an existing one, low-rank matrix recovery (LRMR) techniques are employed for HSI denoising, including convex relaxation based approaches [29, 30, 31, 32] and Bayesian inference approaches [33, 34, 35, 36, 37]. For example, Zhang et al. [19] directly adopted the Go Decomposition (GoDec) algorithm [32], an optimization algorithm for solving the LRMR model, to estimate the low-rank HSI patch. Chen et al. [38] employed a modified version of the robust principal component analysis (RPCA) algorithm originally proposed in [37], which models noise with a mixture of Gaussians in the context of the LRMR model, to deal with HSIs with non-i.i.d. noise structure. Recently, He et al. [21] proposed a local matrix recovery method that also considers the global spatial-spectral smoothness through TV regularization, which achieves great performance especially in complex noise removal of hyperspectral remote sensing images.

Preprint submitted to Journal of Neurocomputing March 9, 2020

Although these LRMR-based approaches are effective in certain HSI denoising cases, they only consider the GCS prior knowledge. Since a HSI can be naturally represented as a 3D tensor instead of a 2D matrix, one obvious extension of these LRMR-based approaches is to leverage the power of tensor decomposition [39, 40], which has attracted growing attention in recent years. The two popular tensor factorization frameworks, Tucker and CANDECOMP/PARAFAC (CP), have been used to denoise HSIs in [41] and [17], respectively. Nevertheless, the input tensor of these methods is just the original form of the HSI, with two spatial dimensions and one spectral dimension. This degrades their denoising performance, since the two separate spatial dimensions of the HSI are generally not low-rank representable. This intuition is illustrated by Figure 1, which shows typical output bands obtained by directly applying a low-rank tensor factorization (LRTF) method and by our integrated model. It can be seen in the LRTF recovery (top right corner) that some vertical or horizontal artifact textures are introduced due to the unsuitable assumption that the two spatial dimensions of a HSI are low-rank representable.

Figure 1: Typical output bands by the LRTF based methods, which show it is

unreasonable to directly apply LRTF based methods on HSI denoising.

This property of HSIs prevents a straightforward application of low-rank tensor factorization (LRTF) based methods. To alleviate this problem, one question arises naturally: can we transform the initial form of a HSI into a more low-rank representable one without destroying its spatial and spectral structure? Fortunately, this issue can be efficiently addressed by a remarkable technique called block matching, originally introduced by the benchmark BM3D [24] for image denoising. In general, there are two approaches to extending the block matching strategy to volumetric data. One approach is to impose the NSS prior on the spectral dimension as well, stacking cubes of similar voxels into 4D "groups" [12]. The other approach, which considers the GCS prior, is to simply extend a normal 2D patch to a full band patch (FBP) and then follow the same block matching procedure as BM3D [16, 13]. Given the spectral property of a HSI, the FBP-based approach is more efficient than the voxel-based approach while preserving effectiveness.

Once the FBP clusters, which can be viewed as a set of 3D tensors, have been constructed using the extended block matching strategy, the family of tensor decomposition methods can be employed to make full use of the underlying knowledge, including the aforementioned priors. Xie et al. built a new HSI denoising model, ITS-Reg [13], by applying an LRTF-based method to tensors formed by nonlocal similar patches within the HSI. Zhuang et al. [22] combined the power of LRMR and LRTF based techniques to construct a global local factorization model called GLF. However, when the noise level varies drastically, these convex relaxation based low-rank approximation approaches are prone to overfitting due to incorrectly specified regularization parameters, resulting in severe deterioration of recovery performance. It is also worth noting that rank minimization based on convex optimization of the nuclear norm is sensitive to its tuning parameters and may therefore over- or underestimate the HSI.

In this paper, we present a hierarchical probabilistic model

for HSI denoising based on full Bayesian CP tensor factoriza-

tion (LBTF)[42], which can not only ﬁt the underlying noise

adaptively without knowing the speciﬁc noise intensity, but also

determine the tensor rank automatically to address the over-

ﬁtting issues. We ﬁrst adaptively transform the original HSI

into patch-based tensor representations (clusters) to extract the

NSS knowledge while preserving the property of GCS in these

new representations. Then we regard each cluster as a low-rank

noisy observation in our hierarchical probability model in or-

der to obtain the inherent spatial-spectral correlation of HSI.

This model is eﬀectively solved by an elegant deterministic al-

gorithm based on variational inference. The empirical study

demonstrates the superiority of our method, which consistently

outperforms other state-of-the-art HSI denoising methods both

quantitatively and visually.

The rest of this paper is organized as follows. Section 2

presents preliminary multilinear operators and notations. Sec-

tion 3 introduces our Bayesian tensor factorization approach for

HSI denoising. The extensive experimental results on both syn-

thetic and real data are presented in Section 4, followed by con-

clusions in Section 5.

2. Notations and Preliminaries

The order of a tensor is the number of its dimensions (a.k.a. ways or modes). For clarity, scalars (zeroth-order tensors) are denoted by lowercase letters, e.g., $a$. Vectors (first-order tensors) are denoted by boldface lowercase letters, e.g., $\mathbf{a}$. Matrices (second-order tensors) are denoted by boldface capital letters, e.g., $\mathbf{A}$, and the $k$-th row and column vectors of $\mathbf{A}$ are denoted by $\mathbf{A}_{k\cdot}$ and $\mathbf{A}_{\cdot k}$, respectively. General tensors (without order constraint) are denoted by boldface calligraphic letters, e.g., $\mathcal{A}$. Given an $N$-th order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, its $(i_1, \cdots, i_N)$-th entry is denoted by $\mathcal{A}_{i_1 \cdots i_N}$ without boldface.

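The notation for tensor operations follows [40]. The Kronecker product of matrices $\mathbf{A} \in \mathbb{R}^{I \times J}$ and $\mathbf{B} \in \mathbb{R}^{M \times N}$ is a matrix of size $IM \times JN$, denoted by $\mathbf{A} \otimes \mathbf{B}$. The Khatri-Rao product of matrices $\mathbf{A} \in \mathbb{R}^{I \times K}$ and $\mathbf{B} \in \mathbb{R}^{J \times K}$ is a matrix of size $IJ \times K$ defined by a columnwise Kronecker product, and denoted by $\mathbf{A} \odot \mathbf{B}$. For convenience, the Khatri-Rao product of a set of matrices $\{\mathbf{A}^{(n)} \mid n = 1, \cdots, N\}$ in reverse order is defined by

$$\bigodot_{n} \mathbf{A}^{(n)} = \mathbf{A}^{(N)} \odot \mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(1)} \tag{1}$$

while the Khatri-Rao product of a set of matrices except the $k$-th matrix is defined by

$$\bigodot_{n \neq k} \mathbf{A}^{(n)} = \mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(k+1)} \odot \mathbf{A}^{(k-1)} \odot \cdots \odot \mathbf{A}^{(1)} \tag{2}$$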
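The reverse-order Khatri-Rao products of Equations (1) and (2) can be sketched in NumPy as follows. This is an illustrative sketch rather than code from the paper: the function names `khatri_rao` and `khatri_rao_reverse` and the zero-based `skip` argument are assumptions introduced here.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x K) and B (J x K); the result is (I*J) x K."""
    I, K = A.shape
    J, K2 = B.shape
    if K != K2:
        raise ValueError("A and B must have the same number of columns")
    # Row i*J + j of column k holds A[i, k] * B[j, k], as in np.kron applied per column.
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, K)

def khatri_rao_reverse(mats, skip=None):
    """Reverse-order product A(N) (.) ... (.) A(1) of the list [A(1), ..., A(N)].

    If skip is given (zero-based position in the list), that matrix is left out,
    which corresponds to Equation (2).
    """
    selected = [M for idx, M in enumerate(mats) if idx != skip]
    out = selected[-1]
    for M in reversed(selected[:-1]):
        out = khatri_rao(out, M)
    return out
```

Because the Khatri-Rao product is associative, accumulating the factors from the last matrix to the first gives the same result as any other grouping.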

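The Hadamard product (a.k.a. the Schur product or the entrywise product) of two tensors $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is denoted by $\mathcal{A} \circ \mathcal{B}$. It is a tensor with the same dimensions as $\mathcal{A}$ and $\mathcal{B}$, where each element indexed by $i_1 i_2 \cdots i_N$ is the product of the elements indexed by $i_1 i_2 \cdots i_N$ in the two original tensors.

The inner product of two tensors $\mathcal{A}$ and $\mathcal{B}$ is defined by $\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1, \cdots, i_N} \mathcal{A}_{i_1, \cdots, i_N} \mathcal{B}_{i_1, \cdots, i_N}$. For a more general case, we define

$$\left\langle \mathcal{A}^{(1)}, \cdots, \mathcal{A}^{(N)} \right\rangle = \sum_{i_1, \cdots, i_N} \prod_{n} \mathcal{A}^{(n)}_{i_1, \cdots, i_N} \tag{3}$$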

Our framework for LRTF is based on the CP decomposition, which can be viewed as a higher-order generalization of the widely used matrix singular value decomposition (SVD). A tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ can be exactly factorized by a CP model, given by

$$\mathcal{X} = \sum_{r=1}^{R} \mathbf{A}^{(1)}_{\cdot r} \circledcirc \cdots \circledcirc \mathbf{A}^{(N)}_{\cdot r} = [\![ \mathbf{A}^{(1)}, \cdots, \mathbf{A}^{(N)} ]\!] \tag{4}$$

where $\circledcirc$ denotes the outer product of vectors, $[\![ \cdots ]\!]$ denotes the Kruskal operator of a set of matrices having the same number of columns, $\mathbf{A}^{(n)}$ is the mode-$n$ factor matrix of size $I_n \times R$, and $R$ is assumed to be an upper bound on the rank of tensor $\mathcal{X}$. Each element of the tensor can then be written as

$$\mathcal{X}_{i_1, \cdots, i_N} = \sum_{r=1}^{R} A^{(1)}_{i_1 r} A^{(2)}_{i_2 r} \cdots A^{(N)}_{i_N r} \tag{5}$$
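As a small numerical illustration of Equations (4) and (5) for $N = 3$, the sketch below builds a tensor from three factor matrices in two equivalent ways. The helper names, the sizes and the random factors are hypothetical choices made for this example only.

```python
import numpy as np

def cp_to_tensor(factors):
    """Evaluate Equation (5) entrywise for three factor matrices sharing R columns."""
    A1, A2, A3 = factors
    return np.einsum('ir,jr,kr->ijk', A1, A2, A3)

def cp_by_outer_products(factors):
    """Same tensor, built as the explicit sum of R rank-one terms of Equation (4)."""
    A1, A2, A3 = factors
    X = np.zeros((A1.shape[0], A2.shape[0], A3.shape[0]))
    for r in range(A1.shape[1]):
        X += np.multiply.outer(np.multiply.outer(A1[:, r], A2[:, r]), A3[:, r])
    return X

# Hypothetical example: a 4 x 5 x 3 tensor with R = 2 components.
rng = np.random.default_rng(0)
factors = [rng.standard_normal((size, 2)) for size in (4, 5, 3)]
X = cp_to_tensor(factors)
```

Each of the $R$ columns contributes one rank-one term, so any matrix unfolding of `X` has rank at most $R$.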

3. The Proposed Method

3.1. Formulation

A HSI can be mathematically viewed as a 3D tensor. Under additive noise, the noisy observation $\mathcal{Y}$ can be described as

$$\mathcal{Y} = \mathcal{X} + \mathcal{E} \tag{6}$$

where $\mathcal{X}$ is the original, unknown clean HSI, and $\mathcal{E}$, whose entries follow $\mathcal{N}(0, \tau^{-1})$, is the additive noise. Here $\mathcal{Y} \in \mathbb{R}^{H \times W \times B}$, where $H$, $W$ and $B$ stand for the spatial height, the spatial width and the number of spectral bands of the HSI, respectively; $\mathcal{X}$ and $\mathcal{E}$ have the same dimensions as $\mathcal{Y}$. In this paper, we mainly consider independent and identically distributed (i.i.d.) Gaussian noise with unknown noise intensity.

For clarity and simplicity, we denote the proposed low-rank Bayesian tensor factorization technique as LBTF, and the whole HSI denoising algorithm as LBTF-HSI. The objective of the proposed LBTF-HSI is to provide an estimate $\hat{\mathcal{X}}$ of the original $\mathcal{X}$ from the noisy observation $\mathcal{Y}$, such that $\hat{\mathcal{X}}$ is as similar to $\mathcal{X}$ as possible under a commonly adopted error measure (e.g., the $\ell_2$ norm).

3.2. Iterative Denoising Framework

LBTF-HSI is implemented in an iterative denoising framework motivated by [24], generally consisting of one to five nearly identical stages. Each stage comprises three steps: grouping, low-rank tensor recovery, and aggregation. The grouping and aggregation steps are required by the block matching technique, and the low-rank tensor recovery step is where we apply the proposed LBTF algorithm. The flow diagram of the LBTF-HSI implementation is illustrated in Figure 2.

In the grouping step, we first split the noisy HSI $\mathcal{Y}$ into a set of overlapping FBPs. Then, for each local FBP, we construct a FBP cluster by performing block matching. Specifically, given a reference FBP, the $P_n$ most similar FBPs among all FBPs are matched and stacked together into a FBP cluster, where $P_n$ denotes the number of patches in a cluster and the similarity is measured by the $\ell_2$ norm for simplicity. Note that this procedure is closely related to the GCS and NSS priors: after this operation, both GCS and NSS knowledge are well preserved and reflected by the new representation, along its spectral and nonlocal-patch-number modes, respectively.
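A minimal sketch of the grouping step is given below, assuming square spatial patches of side `p` extracted with a fixed `stride` and an exhaustive search over all FBPs. These parameters and the function names are assumptions of this example; the text does not fix them, and a practical implementation may restrict the search to a local window.

```python
import numpy as np

def extract_fbps(Y, p, stride):
    """Split an H x W x B cube into overlapping full band patches.

    Returns an array of shape (num_patches, p*p, B) and the top-left
    coordinates of every patch.
    """
    H, W, B = Y.shape
    coords = [(r, c)
              for r in range(0, H - p + 1, stride)
              for c in range(0, W - p + 1, stride)]
    patches = np.stack([Y[r:r + p, c:c + p, :].reshape(p * p, B) for r, c in coords])
    return patches, coords

def match_cluster(patches, ref_idx, num_similar):
    """Indices of the num_similar patches closest to the reference in squared l2 distance."""
    diff = patches - patches[ref_idx]
    dist = np.sum(diff ** 2, axis=(1, 2))
    return np.argsort(dist, kind="stable")[:num_similar]

def build_cluster(patches, idx):
    """Stack matched patches into a (p*p) x B x Pn tensor.

    The three modes are the spatial, spectral and nonlocal-patch-number modes.
    """
    return np.transpose(patches[idx], (1, 2, 0))
```

Since the reference patch has zero distance to itself, it is normally the first member of its own cluster.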

In the recovery step, we first initialize two all-zero tensors, denoted by $\mathcal{C}$ and $\mathcal{W}$, with the same size as the noisy observation $\mathcal{Y}$, and then directly apply the LBTF technique to each FBP cluster to exploit the intrinsic low-rank property underlying this new representation. After all clean FBP clusters are reconstructed, we map every estimate back into $\mathcal{C}$, at its original location in $\mathcal{Y}$, using a cumulative scheme. Note that $\mathcal{W}$ can be interpreted as the weight associated with the cumulative $\mathcal{C}$, and is obtained while $\mathcal{C}$ is being calculated.

In the aggregation step, we simply divide the cumulative $\mathcal{C}$ elementwise by its corresponding weight $\mathcal{W}$, like an inverse Hadamard product, to generate an estimate $\hat{\mathcal{X}}$ of the original $\mathcal{X}$. Treating this estimate $\hat{\mathcal{X}}$ as a regularization, we can construct an updated noisy observation $\mathcal{Y}$ by iterative regularization and then repeat the steps described above. This produces an iterative denoising framework. The detailed procedure of this framework is summarized in Algorithm 1.

Figure 2: Flowchart of the proposed LBTF-HSI method. A noisy HSI is first split into a set of overlapping full band patches. After grouping by each reference FBP (step 1), each FBP cluster is fed into the LBTF model to acquire its clean estimate (step 2), and each clean estimate is then mapped onto its corresponding location in the original noisy HSI by an accumulative scheme (step 3). The process is then either repeated with iterative regularization, or the estimated clean HSI is output as the final result.

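Algorithm 1 Iterative Denoising Framework
Input: Noisy HSI $\mathcal{Y}$;
1: Initialize $\hat{\mathcal{X}}^{(0)} = \mathcal{Y}$;
2: for $k = 1 : K$ do
3:   Iterative regularization $\mathcal{Y}^{(k)} = \hat{\mathcal{X}}^{(k-1)} + \delta (\mathcal{Y} - \hat{\mathcal{X}}^{(k-1)})$;
4:   Construct the entire FBP set $\Omega_k$;
5:   Group matching FBP clusters $\{\mathcal{Y}_i\}_{i=1}^{L}$;
6:   for each FBP cluster $\mathcal{Y}_i$ do
7:     Recover $\mathcal{X}_i$ from $\mathcal{Y}_i$ by LBTF;
8:   end for
9:   Aggregate $\{\mathcal{X}_i\}_{i=1}^{L}$ to form the clean estimate $\hat{\mathcal{X}}^{(k)}$;
10: end for
11: Assign $\hat{\mathcal{X}} = \hat{\mathcal{X}}^{(K)}$;
Output: Estimate $\hat{\mathcal{X}}$ of $\mathcal{X}$;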
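The aggregation step and the iterative regularization in line 3 of Algorithm 1 can be sketched as follows. The sketch assumes that every patch contributes with unit weight, so that $\mathcal{W}$ simply counts how many patches cover each entry; the weighting actually used by the authors is not specified in the text, and the function names are hypothetical.

```python
import numpy as np

def aggregate(estimates, coords, shape, p):
    """Accumulate denoised patches into C, count coverage in W, then divide elementwise.

    estimates: iterable of (p*p) x B patch estimates
    coords:    top-left (row, col) of each patch in the full cube
    shape:     (H, W, B) of the full cube
    """
    C = np.zeros(shape)
    W = np.zeros(shape)
    for patch, (r, c) in zip(estimates, coords):
        C[r:r + p, c:c + p, :] += patch.reshape(p, p, shape[2])
        W[r:r + p, c:c + p, :] += 1.0  # assumed unit weight per patch
    # Entries not covered by any patch keep the value 0 instead of dividing by zero.
    return C / np.maximum(W, 1e-12)

def regularize(Y, X_prev, delta):
    """Line 3 of Algorithm 1: add back a fraction delta of the removed residual."""
    return X_prev + delta * (Y - X_prev)
```

With `delta = 0` the next stage would start from the previous estimate alone, while `delta = 1` would restart from the original noisy observation.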

3.3. Hierarchical Probabilistic Model

Now we present the LBTF algorithm used in the recovery

step, which is the key component of our method. In this step,

each observation is a noisy FBP cluster formed by block match-

ing. Given such noisy observation, we apply the LBTF to infer

the underlying clean cluster.

From a Bayesian perspective, the CP tensor factorization can be formulated as a hierarchical probabilistic model, which is an instance of a probabilistic graphical model (PGM) [43].

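The CP generative model in Equation (4), together with the observation model in Equation (6), directly gives rise to the following hierarchical probabilistic model. As discussed for the iterative denoising framework, after we acquire a set of FBP clusters $\{\mathcal{Y}_i\}_{i=1}^{L}$, the probability density of each cluster $\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ (for brevity, the subscript $i$ is omitted) factorizes over the tensor elements:

$$p\left(\mathcal{Y} \,\middle|\, \{\mathbf{A}^{(n)}\}_{n=1}^{3}, \tau\right) = \prod_{i_1=1}^{I_1} \prod_{i_2=1}^{I_2} \prod_{i_3=1}^{I_3} \mathcal{N}\left(\mathcal{Y}_{i_1 i_2 i_3} \,\middle|\, \left\langle \mathbf{A}^{(1)}_{i_1 \cdot}, \mathbf{A}^{(2)}_{i_2 \cdot}, \mathbf{A}^{(3)}_{i_3 \cdot} \right\rangle, \tau^{-1}\right) \tag{7}$$

where $\mathbf{A}^{(n)}$ is the latent mode-$n$ factor matrix of size $I_n \times R$, with $n = 1, 2, 3$ for the spatial, spectral, and nonlocal-patch-number modes, respectively, and $R_t \leq R$ denotes the ground-truth rank of tensor $\mathcal{X}$. $\mathcal{N}(y \mid \mu, \tau^{-1})$ denotes a Gaussian distribution of the form

$$\mathcal{N}(y \mid \mu, \tau^{-1}) = \left(\frac{\tau}{2\pi}\right)^{\frac{1}{2}} \exp\left\{-\frac{\tau}{2}(y - \mu)^2\right\} \tag{8}$$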

In order to further build our hierarchical probabilistic model, we need to enforce a suitable probabilistic structure on the underlying factor matrices $\{\mathbf{A}^{(n)}\}_{n=1}^{3}$. From the CP model in Equation (4), notice that each outer product contributes at most one to the rank of $\mathcal{X}$. Since a low-rank estimate of $\mathcal{X}$ is sought, our goal is to achieve column sparsity in $\mathbf{A}^{(n)}$, such that most columns of $\mathbf{A}^{(n)}$ are set equal to zero. To enforce this constraint, we associate the columns of $\mathbf{A}^{(n)}$ with Gaussian priors of precisions (inverse variances) $\lambda_r$, that is,

$$p\left(\mathbf{A}^{(n)} \,\middle|\, \boldsymbol{\lambda}\right) = \prod_{i_n} \mathcal{N}\left(\mathbf{A}^{(n)}_{i_n \cdot} \,\middle|\, \mathbf{0}, \boldsymbol{\Lambda}^{-1}\right), \quad \forall n \in [1, 3] \tag{9}$$

where $\boldsymbol{\Lambda} = \mathrm{diag}(\boldsymbol{\lambda})$ denotes an inverse variance matrix that is shared by the latent factor matrices in all modes. Thus, the $r$-th columns of $\{\mathbf{A}^{(n)}\}_{n=1}^{3}$ have the same sparsity profile, enforced by the common precision $\lambda_r$. As shown later, many of the precisions $\lambda_r$ will assume very large values during inference, which effectively removes the corresponding outer products from $\mathcal{X}$ and hence reduces the rank of the estimate. We can further define a hyperprior over the hyperparameter $\boldsymbol{\lambda}$, which is factorized over the latent dimensions owing to the independence assumption:

$$p(\boldsymbol{\lambda}) = \prod_{r=1}^{R} \mathrm{Gam}(\lambda_r \mid c_0^r, d_0^r) \tag{10}$$

where $\mathrm{Gam}(x \mid a, b)$ denotes a Gamma distribution of the form

$$\mathrm{Gam}(x \mid a, b) = \frac{b^a x^{a-1} e^{-bx}}{\Gamma(a)} \tag{11}$$

and $\Gamma(\cdot)$ is the Gamma function.

Similarly, we also place a hyperprior over the noise precision $\tau$, i.e.,

$$p(\tau) = \mathrm{Gam}(\tau \mid a_0, b_0) \tag{12}$$

Combining Equations (7) and (9) to (12), we complete our hierarchical probabilistic model as a PGM, whose graph representation is illustrated in Figure 3.

For brevity of notation, we denote all unknowns, including both latent variables and hyperparameters, by $\Theta = \{\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \mathbf{A}^{(3)}, \boldsymbol{\lambda}, \tau\}$. From Figure 3, we can write the joint distribution of the observed data and all model parameters as

$$p(\mathcal{Y}, \Theta) = p\left(\mathcal{Y} \,\middle|\, \{\mathbf{A}^{(n)}\}_{n=1}^{3}, \tau\right) \prod_{n=1}^{3} p\left(\mathbf{A}^{(n)} \,\middle|\, \boldsymbol{\lambda}\right) p(\boldsymbol{\lambda})\, p(\tau). \tag{13}$$

The goal now is to infer the posterior of all involved parameters. A point estimate could be obtained by maximizing Equation (13).

Figure 3: The probabilistic graphical model representation of Bayesian CP tensor factorization.

3.4. Variational Inference

However, in contrast to point estimation, we aim to compute the full posterior distribution of all parameters in $\Theta$. Therefore, a deterministic approximate inference method under the variational Bayesian (VB) framework [43] is developed to learn the aforementioned hierarchical probabilistic model. We seek a distribution $q(\Theta)$ that approximates the true posterior distribution $p(\Theta \mid \mathcal{Y})$ by solving the following optimization problem:

$$\min_{q} \mathrm{KL}\left(q(\Theta) \,\|\, p(\Theta \mid \mathcal{Y})\right) = -\int q(\Theta) \ln\left\{\frac{p(\Theta \mid \mathcal{Y})}{q(\Theta)}\right\} d\Theta \tag{14}$$

where $\mathrm{KL}(q \,\|\, p)$ represents the KL divergence between two distributions $q$ and $p$. Since the posterior distribution $p(\Theta \mid \mathcal{Y})$ is computationally intractable in our model, the problem cannot be reduced from the VB framework to the expectation maximization (EM) framework. Thus, some constraints need to be imposed on the variational distribution $q(\Theta)$ to make this optimization feasible. Specifically, we assume that the variational distribution factorizes with respect to each parameter $\Theta_j$, so that

$$q(\Theta) = q(\boldsymbol{\lambda})\, q(\tau) \prod_{n=1}^{3} q\left(\mathbf{A}^{(n)}\right). \tag{15}$$

This factorized form of variational inference corresponds to an approximation framework developed in physics called mean field theory [44]. The closed-form optimal solution $q_j^*(\Theta_j)$ can then be obtained by

$$\ln q_j^*(\Theta_j) = \left\langle \ln p(\mathcal{Y}, \Theta) \right\rangle_{\Theta \setminus \Theta_j} + \mathrm{const} \tag{16}$$

where $\langle \cdot \rangle$ is a unary operator denoting expectation and $\Theta \setminus \Theta_j$ denotes the set $\Theta$ with $\Theta_j$ removed. Since the distributions of all variables are drawn from the distributions over their parent variables, we can analytically infer the posterior distributions of the model parameters using Equations (13), (15) and (16).
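One standard way to see why Equation (16) involves only the joint distribution of Equation (13), although Equation (14) is written in terms of the intractable posterior, is the following decomposition of the log evidence. It is a textbook identity given here as a supplementary sketch rather than as part of the original derivation, and $\mathcal{L}(q)$ is a symbol introduced here for the variational lower bound.

```latex
\begin{align*}
\ln p(\mathcal{Y}) &= \mathcal{L}(q) + \mathrm{KL}\left(q(\Theta) \,\|\, p(\Theta \mid \mathcal{Y})\right), \\
\mathcal{L}(q) &= \int q(\Theta) \ln\left\{\frac{p(\mathcal{Y}, \Theta)}{q(\Theta)}\right\} d\Theta .
\end{align*}
```

Because $\ln p(\mathcal{Y})$ does not depend on $q$, minimizing the KL divergence is equivalent to maximizing $\mathcal{L}(q)$, which requires only the joint distribution $p(\mathcal{Y}, \Theta)$.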

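Estimation of mode-$n$ factors $\mathbf{A}^{(n)}$.

$$q^*\left(\mathbf{A}^{(n)}\right) = \prod_{i_n=1}^{I_n} \mathcal{N}\left(\mathbf{A}^{(n)}_{i_n \cdot} \,\middle|\, \left\langle \mathbf{A}^{(n)}_{i_n \cdot} \right\rangle, \boldsymbol{\Sigma}^{(n)}_{i_n}\right), \quad \forall n \in [1, 3] \tag{17}$$

where the posterior parameters can be updated by

$$\left\langle \mathbf{A}^{(n)}_{i_n \cdot} \right\rangle = \langle \tau \rangle\, \boldsymbol{\Sigma}^{(n)}_{i_n} \left\langle \mathbf{B}^{(\setminus n)T} \right\rangle \mathrm{vec}\left(\mathcal{Y}_{\cdot i_n \cdot}\right) \tag{18}$$

$$\boldsymbol{\Sigma}^{(n)}_{i_n} = \left(\langle \tau \rangle \left\langle \mathbf{B}^{(\setminus n)T} \mathbf{B}^{(\setminus n)} \right\rangle + \langle \boldsymbol{\Lambda} \rangle\right)^{-1} \tag{19}$$

$$\mathbf{B}^{(\setminus n)} = \bigodot_{k \neq n} \mathbf{A}^{(k)} \tag{20}$$

The most complex term is $\mathbf{B}^{(\setminus n)}$, which is of size $\prod_{k \neq n} I_k \times R$ and denotes the Khatri-Rao product of the latent factors in all modes except the $n$-th mode. $\mathrm{vec}(\mathcal{Y}_{\cdot i_n \cdot})$ denotes the vectorized slice of the FBP cluster, of size $\prod_{k \neq n} I_k$, whose mode-$n$ index is $i_n$.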
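A possible NumPy realization of Equations (18) to (20) for a fully observed third-order cluster is sketched below. It rests on two assumptions that are not spelled out in the text: (i) since the right-hand side of Equation (19) does not depend on $i_n$, a single $R \times R$ covariance is stored per mode; (ii) the expectation of $\mathbf{B}^{(\setminus n)T} \mathbf{B}^{(\setminus n)}$ is evaluated with the identity $(\mathbf{A} \odot \mathbf{B})^T (\mathbf{A} \odot \mathbf{B}) = (\mathbf{A}^T \mathbf{A}) \circ (\mathbf{B}^T \mathbf{B})$ together with the independence of the factors under Equation (15). The function names are hypothetical, and the product with $\mathbf{B}^{(\setminus n)}$ is computed by `einsum` rather than by forming the Khatri-Rao product explicitly.

```python
import numpy as np

def mttkrp(Y, means, n):
    """Mode-n unfolding of Y times the Khatri-Rao product of the other two factor means."""
    subs = ['ijk,jr,kr->ir', 'ijk,ir,kr->jr', 'ijk,ir,jr->kr'][n]
    others = [means[k] for k in range(3) if k != n]
    return np.einsum(subs, Y, *others)

def update_factor(Y, means, covs, n, tau, lam):
    """One variational update of q(A^(n)) for a fully observed 3-way tensor.

    means: list of three I_k x R posterior mean matrices
    covs:  list of three R x R row covariances (assumed shared by all rows of a mode)
    tau:   current posterior mean of the noise precision
    lam:   current posterior means of the R column precisions
    """
    R = means[n].shape[1]
    BtB = np.ones((R, R))
    for k in range(3):
        if k != n:
            I_k = means[k].shape[0]
            # <A^T A> = mean^T mean + I_k * (shared row covariance)
            BtB *= means[k].T @ means[k] + I_k * covs[k]
    cov_n = np.linalg.inv(tau * BtB + np.diag(lam))   # Equation (19)
    mean_n = tau * mttkrp(Y, means, n) @ cov_n        # Equation (18), all rows at once
    return mean_n, cov_n
```

When the noise precision is very large and the column precisions are very small, the update reduces to the ordinary alternating least squares solution for mode $n$.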

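Estimation of hyperparameters $\boldsymbol{\lambda}$.

$$q^*(\boldsymbol{\lambda}) = \prod_{r=1}^{R} \mathrm{Gam}(\lambda_r \mid c^r, d^r) \tag{21}$$

where

$$c^r = c_0^r + \frac{1}{2} \sum_{n=1}^{3} I_n \tag{22}$$

$$d^r = d_0^r + \frac{1}{2} \sum_{n=1}^{3} \left\langle \mathbf{A}^{(n)T}_{\cdot r} \mathbf{A}^{(n)}_{\cdot r} \right\rangle \tag{23}$$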
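Equations (22) and (23) can be sketched as below, assuming a single $R \times R$ row covariance per mode and scalar hyperpriors `c0` and `d0` shared by all components. Both are simplifying assumptions of this example, and the function name is hypothetical.

```python
import numpy as np

def update_lambda(means, covs, c0, d0):
    """Gamma posterior parameters of the column precisions, Equations (22) and (23).

    Returns (c, d, c / d), where c / d is the posterior mean of each lambda_r.
    """
    R = means[0].shape[1]
    c = c0 + 0.5 * sum(M.shape[0] for M in means)
    d = np.full(R, float(d0))
    for M, S in zip(means, covs):
        # <a_r^T a_r> = squared norm of the mean column + I_n * Sigma_rr
        d += 0.5 * (np.sum(M ** 2, axis=0) + M.shape[0] * np.diag(S))
    return c, d, c / d
```

A column whose posterior mean and variance are both close to zero keeps `d` near `d0`, so its precision `c / d` becomes very large, which is the pruning mechanism described in the text.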

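Estimation of hyperparameter $\tau$.

$$q^*(\tau) = \mathrm{Gam}(\tau \mid a, b) \tag{24}$$

where

$$a = a_0 + \frac{1}{2} \prod_{n=1}^{3} I_n \tag{25}$$

$$b = b_0 + \frac{1}{2} \left\langle \left\| \mathcal{Y} - [\![ \mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \mathbf{A}^{(3)} ]\!] \right\|_F^2 \right\rangle \tag{26}$$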
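The noise precision update of Equations (25) and (26) can be sketched as follows. The sketch assumes that the squared residual in Equation (26) is taken in expectation under $q$, with a single $R \times R$ row covariance per mode, and expands it as $\|\mathcal{Y}\|_F^2 - 2\langle \mathcal{Y}, \hat{\mathcal{X}} \rangle$ plus the sum of all entries of the Hadamard product of the three matrices $\langle \mathbf{A}^{(n)T} \mathbf{A}^{(n)} \rangle$. The function name is hypothetical.

```python
import numpy as np

def update_tau(Y, means, covs, a0, b0):
    """Gamma posterior parameters of the noise precision, Equations (25) and (26).

    Returns (a, b, a / b), where a / b is the posterior mean of tau.
    """
    R = means[0].shape[1]
    G = np.ones((R, R))
    for M, S in zip(means, covs):
        G *= M.T @ M + M.shape[0] * S          # Hadamard product of <A^T A> over modes
    X_hat = np.einsum('ir,jr,kr->ijk', *means)  # tensor built from the posterior means
    expected_sq_err = np.sum(Y ** 2) - 2.0 * np.sum(Y * X_hat) + np.sum(G)
    a = a0 + 0.5 * Y.size
    b = b0 + 0.5 * expected_sq_err
    return a, b, a / b
```

With zero covariances the expected residual reduces to the plain squared Frobenius distance between the cluster and its reconstruction, so `a / b` approaches the inverse of the empirical noise variance.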

Algorithm 2 Low-rank Bayesian Tensor Factorization
Input: A FBP cluster $\mathcal{Y}_i$;
1: Initialize the factor matrices and their covariances $\mathbf{A}^{(n)}_{i_n \cdot}$, $\boldsymbol{\Sigma}^{(n)}_{i_n}$, the hyperpriors $a_0$, $b_0$, $c_0$, $d_0$, and the hyperparameters $\tau = a_0 / b_0$, $\lambda_r = c_0^r / d_0^r$;
2: while not converged do
3:   for $n = 1$ to $3$ do
4:     Update the posterior $q(\mathbf{A}^{(n)})$ using Equations (18) to (20);
5:   end for
6:   Update the posterior $q(\boldsymbol{\lambda})$ using Equations (22) and (23);
7:   Update the posterior $q(\tau)$ using Equations (25) and (26);
8:   Update the estimated rank $R$ by $\max_n \mathrm{Rank}(\mathbf{A}^{(n)})$;
9: end while
Output: Estimated FBP cluster $\hat{\mathcal{X}}_i$ and rank $R$;

The whole procedure of model inference is summarized in Algorithm 2. It is worth noting that the tensor rank is determined automatically and implicitly. Specifically, during inference, most of the hyperparameters $\lambda_r$ are driven to very large values, which forces the posterior means of the corresponding columns to zero, effectively removing them from the model and reducing the rank. In our implementation of the algorithm, we keep the size of $\{\mathbf{A}^{(n)}\}$ unchanged during the iterations; an alternative is to remove the zero components of $\{\mathbf{A}^{(n)}\}$ after each iteration.
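The alternative mentioned above, removing the zero components after an iteration, could look like the sketch below. The relative energy threshold `tol` and the function name are assumptions of this example; Algorithm 2 itself reads the rank off the factor matrices via $\max_n \mathrm{Rank}(\mathbf{A}^{(n)})$.

```python
import numpy as np

def prune_components(means, tol=1e-5):
    """Drop components whose energy, summed over all modes, is negligible.

    A component is kept when its energy exceeds tol times the largest
    component energy. Returns the pruned factor means and the number of
    components kept, which serves as the estimated rank.
    """
    energy = sum(np.sum(M ** 2, axis=0) for M in means)
    keep = energy > tol * energy.max()
    return [M[:, keep] for M in means], int(keep.sum())
```

The same boolean mask is applied to every mode, because a shared precision $\lambda_r$ switches off the $r$-th column in all factor matrices at once.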

4. Experiment and Analysis

In this section, extensive simulated and real data experiments are conducted to validate the denoising capabilities of the proposed LBTF-HSI algorithm, and quantitative and visual results are reported. A detailed analysis of our method is presented at the end.

Figure 4: Simulated pseudo color images from Columbia Dataset

4.1. Simulated HSI Denoising

Columbia Dataset. The Columbia HSI Dataset [46]¹ is employed in our simulated experiment; it is commonly used for validating other algorithms [13, 16]. This dataset consists of 32 real-world scenes covering a wide variety of materials and objects, each with a spatial resolution of 512 × 512 and 31 spectral bands. Each HSI includes full spectral resolution reflectance data collected from 400 nm to 700 nm at 10 nm intervals. Simulated pseudo-color images from this dataset are shown in Figure 4. In our experiments, the intensity of these HSIs is scaled into [0, 1].

Implementation Details. Additive white Gaussian noise (AWGN), which arises from many natural sources, is added to these test HSIs to generate $\mathcal{Y}$ according to our observation model, with noise intensity ranging from 15 to 100. (Note that we express the noise intensity on a base of 255, i.e., 15 means that the standard deviation of the Gaussian noise is 15/255, and similarly hereinafter.) Unlike other methods, which require the specific noise intensity as an input parameter, our method is not given this information, since the noise intensity can be learned automatically during the denoising process. Consequently, unless otherwise mentioned, we provide the true noise intensity to the comparison methods, while our method learns the noise model automatically.
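The noise simulation described above can be sketched as follows, assuming a clean cube already scaled to [0, 1]. The function names and the fixed `seed` are choices made for this example; `noise_precision` simply converts a noise level on the base of 255 into the corresponding precision $\tau = 1/\sigma^2$ of the observation model.

```python
import numpy as np

def add_awgn(X, sigma_255, seed=0):
    """Add zero-mean i.i.d. Gaussian noise with standard deviation sigma_255 / 255."""
    rng = np.random.default_rng(seed)
    sigma = sigma_255 / 255.0
    return X + rng.normal(0.0, sigma, size=X.shape)

def noise_precision(sigma_255):
    """Precision tau = 1 / sigma^2 of the noise for a level given on the base of 255."""
    return (255.0 / sigma_255) ** 2
```

For example, a noise level of 25 corresponds to a standard deviation of about 0.098 on the [0, 1] scale and to a precision of about 104.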

For the parameter settings, we need to take care of the initialization strategy in LBTF (Algorithm 2). Two parameters are closely related to initialization. One is a binary parameter that selects the initialization scheme of the low-rank components, either SVD or random generation (following a standard normal distribution). Though the theory of the VB framework [43] guarantees that every initial point converges to a local minimum, we find that random generation achieves better performance than SVD in the context of HSI denoising. This phenomenon can be explained by the grouping and aggregation operations involved in our method, which benefit from diverse initial points rather than relatively stable ones. The other parameter, which dominantly affects the denoising capability of LBTF, is the upper bound $R$ on the rank of the low-rank components. It is worth noting that we only need to provide a rough estimate of the upper bound of the target rank, rather than the exact target rank required by other low-rank based methods [19, 45]. After one iteration of our algorithm, the true rank can be estimated automatically. In all of our experiments, we simply set $R$ to 15 in the first iteration, and use the mean of the estimated ranks of all clusters as $R$ for the next iteration.

¹ http://www1.cs.columbia.edu/CAVE/databases/multispectral

Figure 5: The images at band 590 nm of chart and stuffed toy under noise level σ=25 on CAVE dataset. Two demarcated areas in each image are amplified at a 3 times larger scale for easy observation of details. Panels with (PSNR, SSIM): (a) Clean image; (b) Noisy image (20.17, 0.19); (c) BM3D (34.91, 0.92); (d) BM4D (38.61, 0.95); (e) LRMR (33.27, 0.72); (f) LRTV (29.74, 0.89); (g) LRTA (34.53, 0.87); (h) LLRGTV (35.35, 0.90); (i) GLF (40.29, 0.96); (j) TDL (38.07, 0.96); (k) ITSReg (39.78, 0.95); (l) Ours (40.44, 0.97).

Figure 6: The images at band 490 nm of watercolors under noise level σ=50 on CAVE dataset. Two demarcated areas in each image are amplified at a 6 times larger scale for easy observation of details. Panels with (PSNR, SSIM): (a) Clean image; (b) Noisy image (14.15, 0.11); (c) BM3D (28.89, 0.82); (d) BM4D (32.16, 0.89); (e) LRMR (27.20, 0.56); (f) LRTV (26.13, 0.77); (g) LRTA (29.63, 0.78); (h) LLRGTV (30.72, 0.85); (i) GLF (33.80, 0.91); (j) TDL (31.79, 0.88); (k) ITSReg (33.67, 0.93); (l) Ours (34.05, 0.93).

Comparison Methods. The comparison methods include: band-wise BM3D [24]², which represents the state of the art for the band-wise extension of 2D methods; BM4D [12]², which represents the state of the art for the 3D-cube-based extension of 2D methods; LRMR [19], LRTV [45] and LLRGTV [21], which represent the state of the art for the low-rank matrix-based approach; and LRTA [41], GLF [22], TDL [16]³ and ITS-Reg [13]³, which represent the state of the art for the tensor-based approach. All parameters involved in the competing algorithms were either manually tuned to be optimal or chosen automatically as described in the reference papers.

² http://www.cs.tut.fi/foi/GCF-BM3D/
³ http://gr.xjtu.edu.cn/web/dymeng/2

Figure 7: The images at band 640 nm of flowers under noise level σ=75 on CAVE dataset. One demarcated area in each image is amplified at a 1.5 times larger scale for easy observation of details. Panels with (PSNR, SSIM): (a) Clean image; (b) Noisy image (10.63, 0.02); (c) BM3D (31.66, 0.79); (d) BM4D (33.59, 0.74); (e) LRMR (25.93, 0.32); (f) LRTV (28.29, 0.70); (g) LRTA (30.99, 0.69); (h) LLRGTV (32.59, 0.76); (i) GLF (35.39, 0.83); (j) TDL (34.16, 0.85); (k) ITSReg (34.26, 0.82); (l) Ours (36.35, 0.89).

Figure 8: The images at band 590 nm of cloth under noise level σ=100 on CAVE dataset. Two demarcated areas in each image are amplified at a 6 times larger scale for easy observation of details. Panels with (PSNR, SSIM): (a) Clean image; (b) Noisy image (8.13, 0.04); (c) BM3D (23.89, 0.50); (d) BM4D (26.23, 0.64); (e) LRMR (21.69, 0.40); (f) LRTV (22.79, 0.46); (g) LRTA (24.09, 0.46); (h) LLRGTV (25.31, 0.65); (i) GLF (27.59, 0.74); (j) TDL (26.08, 0.65); (k) ITSReg (26.69, 0.69); (l) Ours (27.85, 0.75).

Performance Metrics. To comprehensively assess the performance of all competing methods, we employ five quantitative picture quality indices (PQIs) for performance evaluation: peak signal-to-noise ratio (PSNR), structure similarity (SSIM [47]), feature similarity (FSIM [48]), erreur relative globale adimensionnelle de synthèse (ERGAS [49]) and spectral angle mapper (SAM [50]). PSNR and SSIM are two conventional PQIs in image processing and computer vision. They evaluate the similarity between the target image and the reference image based on MSE and structural consistency, respectively. FSIM emphasizes the perceptual consistency with the reference image. The larger these three measures are, the closer the target HSI is to the reference one. ERGAS measures the fidelity of the restored image based on the weighted sum of the MSE in each band. SAM measures the spectral fidelity between the restored image and the reference image across all spatial positions. In contrast to the former three measures, the smaller these two measures are, the better the target HSI estimates the reference one.
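The two simplest indices can be sketched as follows, assuming images scaled to [0, 1], PSNR averaged over bands, and SAM reported in radians and averaged over pixels. These conventions and the function names are assumptions of this example, since the text does not state the exact definitions used. As a sanity check, AWGN with standard deviation σ/255 gives an expected PSNR of $20 \log_{10}(255/\sigma)$, i.e. about 20.17 dB for σ=25, which agrees with the "Noisy" PSNR entry reported in Table 1 for that noise level.

```python
import numpy as np

def mean_psnr(ref, est, peak=1.0):
    """PSNR in dB, computed per band from the band MSE and then averaged over bands."""
    mse = np.mean((ref - est) ** 2, axis=(0, 1))  # one MSE value per band
    return float(np.mean(10.0 * np.log10(peak ** 2 / mse)))

def mean_sam(ref, est, eps=1e-12):
    """Mean spectral angle in radians between the spectra at every pixel."""
    dot = np.sum(ref * est, axis=2)
    norms = np.linalg.norm(ref, axis=2) * np.linalg.norm(est, axis=2)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)  # eps guards against zero spectra
    return float(np.mean(np.arccos(cos)))
```

Larger `mean_psnr` values and smaller `mean_sam` values indicate a better restoration, in line with the discussion above.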

Performance Evaluation. For each noise setting, all five PQI values of each competing HSI denoising method on all 32 scenes have been calculated and recorded. Table 1 lists the average performance of all methods over the different scenes under each noise setting. From this quantitative comparison, the advantage of the proposed method can be clearly observed. In particular, as the noise intensity increases, our method surpasses the second best, ITS-Reg, in terms of PSNR by a large margin (e.g. 0.96 dB under σ=75, 2.5 dB under σ=75). This is due to the overfitting issue that commonly exists in state-of-the-art methods. Our method successfully addresses this issue by automatically determining the tensor rank, consequently achieving great performance, especially in the case of severe corruption. Figures 5 to 8 illustrate the visual results of the different methods under different noise levels. It can be seen that our method consistently outperforms the other methods, in agreement with the measurements in Table 1. Specifically, in Figure 6, except for GLF and our method, none of the competing methods successfully recovers the exact edge shape of the cloud shown in the green box. In Figure 8, only GLF, ITS-Reg and our method produce smooth and noise-free results, while the fine-grained details of ours are much clearer and sharper than those of ITS-Reg. We also compute the PSNR value of each band in these four HSIs


Table 1: Average performance of 10 competing methods w.r.t. 5 PQIs. For each speciﬁc noise intensity setting, the results are obtained by averaging through the 32

scenes. The best results of each case among these methods are denoted by boldface.

| Sigma | Index | Noisy | BM3D [24] | BM4D [12] | LRMR [19] | LRTV [45] | LRTA [41] | LLRGTV [21] | GLF [22] | TDL [16] | ITSReg [13] | Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | PSNR | 24.61 | 39.81 | 42.38 | 37.21 | 33.54 | 39.21 | 38.46 | 43.41 | 42.30 | 43.43 | 43.46 |
| 15 | SSIM | 0.291 | 0.951 | 0.968 | 0.869 | 0.912 | 0.930 | 0.948 | 0.977 | 0.972 | 0.972 | 0.976 |
| 15 | FSIM | 0.794 | 0.973 | 0.981 | 0.974 | 0.938 | 0.971 | 0.978 | 0.989 | 0.987 | 0.989 | 0.988 |
| 15 | ERGAS | 325.24 | 56.41 | 41.35 | 76.49 | 124.88 | 60.89 | 71.05 | 38.49 | 41.98 | 37.26 | 36.72 |
| 15 | SAM | 0.785 | 0.157 | 0.151 | 0.391 | 0.204 | 0.183 | 0.175 | 0.128 | 0.101 | 0.138 | 0.103 |
| 25 | PSNR | 20.17 | 37.03 | 39.59 | 33.49 | 32.42 | 36.67 | 36.63 | 40.96 | 39.72 | 40.57 | 41.21 |
| 25 | SSIM | 0.148 | 0.919 | 0.943 | 0.736 | 0.895 | 0.893 | 0.913 | 0.957 | 0.957 | 0.945 | 0.964 |
| 25 | FSIM | 0.661 | 0.955 | 0.968 | 0.952 | 0.922 | 0.953 | 0.969 | 0.984 | 0.979 | 0.980 | 0.982 |
| 25 | ERGAS | 542.05 | 77.50 | 57.16 | 115.39 | 136.84 | 81.21 | 84.89 | 50.50 | 56.39 | 51.45 | 48.26 |
| 25 | SAM | 0.933 | 0.208 | 0.215 | 0.569 | 0.234 | 0.218 | 0.254 | 0.167 | 0.123 | 0.242 | 0.118 |
| 50 | PSNR | 14.15 | 33.49 | 35.65 | 28.35 | 29.82 | 33.16 | 33.45 | 37.15 | 36.16 | 37.55 | 37.83 |
| 50 | SSIM | 0.052 | 0.862 | 0.870 | 0.470 | 0.846 | 0.819 | 0.812 | 0.890 | 0.918 | 0.919 | 0.927 |
| 50 | FSIM | 0.465 | 0.922 | 0.938 | 0.890 | 0.891 | 0.919 | 0.944 | 0.970 | 0.956 | 0.963 | 0.966 |
| 50 | ERGAS | 1084.15 | 116.60 | 90.13 | 204.78 | 183.40 | 120.99 | 118.41 | 77.24 | 84.58 | 72.85 | 71.07 |
| 50 | SAM | 1.124 | 0.277 | 0.340 | 0.797 | 0.350 | 0.278 | 0.433 | 0.263 | 0.186 | 0.243 | 0.173 |
| 75 | PSNR | 10.63 | 31.36 | 33.28 | 25.27 | 27.98 | 31.17 | 31.28 | 34.75 | 34.08 | 34.78 | 35.74 |
| 75 | SSIM | 0.026 | 0.810 | 0.794 | 0.310 | 0.787 | 0.762 | 0.716 | 0.812 | 0.875 | 0.881 | 0.889 |
| 75 | FSIM | 0.362 | 0.894 | 0.908 | 0.826 | 0.870 | 0.892 | 0.921 | 0.957 | 0.934 | 0.945 | 0.951 |
| 75 | ERGAS | 1626.14 | 147.89 | 118.14 | 290.62 | 224.31 | 152.38 | 149.57 | 101.80 | 107.73 | 100.36 | 90.47 |
| 75 | SAM | 1.225 | 0.338 | 0.429 | 0.913 | 0.477 | 0.318 | 0.585 | 0.357 | 0.243 | 0.297 | 0.224 |
| 100 | PSNR | 8.13 | 29.83 | 31.56 | 23.03 | 26.50 | 29.69 | 29.64 | 33.03 | 32.56 | 31.77 | 34.26 |
| 100 | SSIM | 0.015 | 0.767 | 0.723 | 0.214 | 0.751 | 0.712 | 0.635 | 0.747 | 0.826 | 0.835 | 0.855 |
| 100 | FSIM | 0.299 | 0.871 | 0.879 | 0.766 | 0.853 | 0.869 | 0.899 | 0.944 | 0.911 | 0.914 | 0.938 |
| 100 | ERGAS | 2168.26 | 175.21 | 143.73 | 375.29 | 267.94 | 180.21 | 178.72 | 123.92 | 128.06 | 143.74 | 107.08 |
| 100 | SAM | 1.290 | 0.383 | 0.496 | 0.995 | 0.540 | 0.350 | 0.695 | 0.432 | 0.299 | 0.306 | 0.263 |
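As a quick sanity check on the Noisy column of Table 1, the expected PSNR of an image corrupted by zero-mean Gaussian noise with standard deviation σ can be worked out in closed form, since the MSE is then approximately σ². The sketch below assumes that σ is expressed on a 0 to 255 intensity scale and that clipping is ignored; both are assumptions of this illustration, and `noisy_psnr` is a hypothetical helper name.

```python
import math

def noisy_psnr(sigma, peak=255.0):
    """Expected PSNR (dB) when the MSE equals sigma**2: 20 * log10(peak / sigma)."""
    return 20 * math.log10(peak / sigma)

# Noise levels used in the simulated experiments.
for sigma in (15, 25, 50, 75, 100):
    print(sigma, round(noisy_psnr(sigma), 2))
```

Under these assumptions the values round to 24.61, 20.17, 14.15, 10.63 and 8.13 dB, which agrees with the Noisy PSNR entries listed in Table 1.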

Table 2: Average performance of 10 competing methods w.r.t. 5 PQIs under unknown Gaussian noise level. The results are obtained by averaging over the 32 scenes. The best results of each case among these methods are denoted by boldface.

| Method | PSNR | SSIM | FSIM | ERGAS | SAM |
|---|---|---|---|---|---|
| None | 14.03 ± 4.62 | 0.079 ± 0.108 | 0.462 ± 0.197 | 1235.75 ± 613.62 | 1.124 ± 0.276 |
| BM3D | 33.36 ± 3.31 | 0.857 ± 0.052 | 0.919 ± 0.030 | 119.26 ± 35.93 | 0.292 ± 0.112 |
| BM4D | 35.73 ± 3.02 | 0.877 ± 0.055 | 0.934 ± 0.033 | 92.11 ± 27.73 | 0.320 ± 0.145 |
| LRMR | 29.35 ± 3.77 | 0.603 ± 0.170 | 0.893 ± 0.061 | 194.22 ± 71.11 | 0.610 ± 0.228 |
| LRTV | 29.38 ± 3.03 | 0.841 ± 0.072 | 0.894 ± 0.043 | 194.92 ± 63.90 | 0.361 ± 0.157 |
| LRTA | 33.34 ± 3.21 | 0.844 ± 0.065 | 0.924 ± 0.029 | 119.97 ± 37.56 | 0.236 ± 0.080 |
| LLRGTV | 32.38 ± 3.14 | 0.783 ± 0.092 | 0.931 ± 0.031 | 135.66 ± 42.20 | 0.410 ± 0.200 |
| GLF | 36.88 ± 3.06 | 0.859 ± 0.086 | 0.967 ± 0.015 | 82.62 ± 27.99 | 0.287 ± 0.168 |
| TDL | 36.20 ± 3.09 | 0.915 ± 0.035 | 0.952 ± 0.022 | 85.57 ± 24.05 | 0.183 ± 0.085 |
| ITSReg | 37.17 ± 3.17 | 0.916 ± 0.042 | 0.959 ± 0.023 | 78.21 ± 25.31 | 0.218 ± 0.150 |
| Ours | 37.70 ± 2.98 | 0.924 ± 0.034 | 0.965 ± 0.017 | 73.20 ± 21.36 | 0.174 ± 0.084 |

(i.e., chart and stuffed toy, watercolors, flowers and cloth). As can be seen in Figure 9, the PSNR values of all bands obtained by LBTF-HSI are significantly higher than those of the other methods.

Denoising under Unknown Noise Level. Motivated by the aforementioned ability of our method to adapt to the noise intensity, we conduct experiments under an unknown Gaussian noise level to further demonstrate the advantages of the proposed method. Here, we still adopt the HSIs of the 32 real-world scenes from the Columbia Dataset described above. Unlike the former experiment, which adds Gaussian noise at each of the five intensities from 15 to 100 to the 32 clean HSIs to generate 160 corrupted HSIs, in this experiment we generate only 32 corrupted HSIs, with noise intensities randomly sampled from a uniform distribution over the range [15, 100]. Since the true noise intensities are not provided, we use an off-the-shelf noise estimation method [51] to estimate them, and the estimate is set as the input parameter for all compared methods except ours. Table 2 summarizes the quantitative


Figure 9: PSNR values across the spectrum corresponding to (a) chart and stuffed toy (Fig. 5), (b) watercolors (Fig. 6), (c) flowers (Fig. 7) and (d) cloth (Fig. 8), respectively. Each panel plots one curve per method (BM3D, BM4D, LRMR, LRTV, LRTA, LLRGTV, GLF, TDL, ITSReg, Ours), with horizontal-axis ticks from 400 to 700.

results of this experiment, which show that our method surpasses the previous best-performing method, ITS-Reg, by about 0.5 dB in PSNR, while also achieving the best stability (lowest variance) among all the competing methods.
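The corruption step of this experiment can be sketched as follows: each scene receives zero-mean Gaussian noise whose standard deviation is drawn uniformly from [15, 100], and the drawn value is kept only for bookkeeping. The function name, the toy data and the flat-list layout are assumptions for illustration; the noise estimator of [51] is not reproduced here.

```python
import random

def corrupt_with_unknown_sigma(clean_bands, rng, low=15.0, high=100.0):
    """Add zero-mean Gaussian noise with a std drawn uniformly from [low, high].

    clean_bands: list of bands, each a flat list of pixel values (0-255 scale assumed).
    Returns (noisy_bands, sigma); a denoiser in the blind setting would not see sigma.
    """
    sigma = rng.uniform(low, high)
    noisy = [[v + rng.gauss(0.0, sigma) for v in band] for band in clean_bands]
    return noisy, sigma

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# 32 toy "scenes", each with 3 bands of 64 pixels (stand-ins for real HSIs).
scenes = [[[128.0] * 64 for _ in range(3)] for _ in range(32)]
corrupted = [corrupt_with_unknown_sigma(scene, rng) for scene in scenes]
print(len(corrupted), min(s for _, s in corrupted), max(s for _, s in corrupted))
```

This yields one corrupted HSI per scene (32 in total) rather than one per scene and noise level as in the fixed-level experiment.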

Run Time. In addition to visual quality, another important aspect of an HSI denoising method is its run time. We therefore compare the speed of all competing methods. All experiments are run in the Matlab 2016a environment on a machine with an Intel(R) Core(TM) i7-7700K CPU at 4.2 GHz and 16 GB of RAM. Figure 10 shows time vs. PSNR of the different methods for denoising HSIs of size 512 × 512 × 31. The results are obtained by

Figure 10: Time (seconds, axis ticks from 10^0 to 10^4) vs. PSNR (dB, axis ticks from 29 to 38) of all competing methods for HSI denoising. Plotted methods: BM3D, LRMR, LRTV, TDL, BM4D, ITSReg, BCTF-HSI, LRTA, LLRGTV, GLF.

averaging over all 32 scenes with a variety of noise intensities. We can see that effectiveness often comes at the cost of efficiency. Our method is slower than TDL, BM4D and GLF. However, taking its considerable improvement in denoising effectiveness into account, our method is still highly competitive with these state-of-the-art methods. On the other hand, our method is typically about twice as fast as ITS-Reg while offering better denoising capability.

4.2. Real HSI Denoising

Here, the Hyperspectral Digital Imagery Collection Experiment (HYDICE) urban dataset (footnote 4) and the Harvard real-world hyperspectral datasets (HHD) [52] are utilized to evaluate our method in a real-world noise context. The original HSI in HYDICE is of size 304 × 304 × 210. As bands 139-155 and 201-210 are seriously polluted by the atmosphere and water absorption and provide little useful information, we manually remove them, leaving the remaining test data with a size of 304 × 304 × 183, as in [13]. The whole HHD dataset consists of 50 noisy hyperspectral images of size 1040 × 1392 × 31, captured at wavelengths in the range of 420-720 nm at an interval of 10 nm. We scale these HSIs into the interval [0, 1] and employ implementation strategies and parameter settings for all competing methods similar to those in the previous simulated experiments. The noise estimation method [51] used before is also applied in this setting. We illustrate the experimental results in Figure 11 and Figure 12, respectively.
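The band bookkeeping described above can be checked with a short sketch. It assumes 1-based, inclusive band indices for the removed HYDICE ranges and uses min-max normalization as one possible way to scale data into [0, 1]; the text does not state which scaling was actually used, and both helper names are hypothetical.

```python
def kept_bands(n_bands=210, removed=((139, 155), (201, 210))):
    """Return the 1-based indices of bands left after dropping the inclusive ranges."""
    drop = set()
    for first, last in removed:
        drop.update(range(first, last + 1))
    return [b for b in range(1, n_bands + 1) if b not in drop]

def scale_to_unit(values):
    """Min-max scale a flat list of values into [0, 1] (assumed scaling scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]

hydice = kept_bands()                        # bands remaining in the HYDICE urban HSI
hhd_wavelengths = list(range(420, 721, 10))  # 420-720 nm sampled every 10 nm
print(len(hydice), len(hhd_wavelengths))
```

Removing 17 + 10 bands from 210 leaves 183, and sampling 420-720 nm every 10 nm gives 31 bands, consistent with the sizes 304 × 304 × 183 and 1040 × 1392 × 31 quoted above.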

Figure 11 shows the restorations of bands 1 and 109 of the urban HSI. We carefully choose two demarcated areas with specific semantics to conveniently compare the denoising capability of the different methods. Specifically, the red box area of band 1 represents a housing estate in the urban area. It can be clearly observed that most of the competing methods (e.g., BM3D, BM4D, LRTA, TDL, ITS-Reg) cannot remove the stripes present in this area, while some methods (i.e., LRMR, LRTV) produce oversmoothed results, to some degree destroying

Footnote 4: http://www.tec.army.mil/hypercube


Figure 11: Real complex noise removal results at two bands (indexed by 1 and 109, respectively) of the HYDICE urban HSI. Panels: (a) Noisy image, (b) BM3D, (c) BM4D, (d) LRMR, (e) LRTV, (f) LRTA, (g) LLRGTV, (h) GLF, (i) TDL, (j) ITSReg, (k) Ours. Two demarcated areas in each image are amplified at a 6 times larger scale for easy observation of details.

Figure 12: Real random noise removal results on the HHD dataset. Panels: (a) Noisy image, (b) BM3D, (c) BM4D, (d) LRMR, (e) LRTV, (f) LLRGTV, (g) GLF, (h) TDL, (i) ITSReg, (j) Ours. One demarcated area in each image is amplified at a 2 times larger scale for easy observation of details.

the original structure of the objects in this housing estate. LLRGTV, GLF and our LBTF-HSI successfully get rid of the stripe noise while preserving the topological structure of this housing estate. At band 109, the image is highly corrupted by miscellaneous complex noise. Obvious artefacts still remain in the results of many competing methods (i.e., BM3D, BM4D, LRTA, TDL, ITS-Reg). While LLRGTV and GLF do produce appealing results with good perceptual quality, these results apparently deviate from the underlying ground truth (see the green box region at band 109). This phenomenon may be caused by an incorrectly specified subspace dimension (i.e., the target rank required by their low-rank approximation techniques). In comparison, our method does not suffer from the rank determination issue; thus it not only recovers the actual semantics of the demarcated area (i.e., the scene of the neighbourhood of a highway), but also produces results with high fidelity.

Figure 13: Effects of patch sizes on denoising performance (panels: PSNR values and SSIM values).

Figure 14: Effects of the number of nonlocal patches on denoising performance (panels: PSNR values and time in seconds versus the number of nonlocal patches).

Figure 15: Effects of the number of bands on denoising performance (panels: PSNR values and SSIM values versus the number of bands).

Figure 12 displays the real random noise removal results on the HHD dataset. From the demarcated window, we can observe that our LBTF-HSI method obtains an artifact-free image with clearer texture and line patterns. In summary, LBTF-HSI obtains better performance in terms of noise suppression, detail preservation, visual quality and PSNR value under different noise levels, even in the real-world unknown-noise context.

4.3. Discussion

Besides the aforementioned initialization strategy, there are other parameters introduced by different stages of our model, i.e., the patch size and the number of nonlocal patches (for grouping) and the number of iterations (for the iterative framework). Figure 13 shows the PSNR/SSIM values with respect to different patch sizes. Patch sizes 6 (6×6) and 7 achieve the best PSNR and SSIM values, respectively, among all candidates. Figure 14 illustrates how the PSNR and run time vary with the number of nonlocal patches. We can see that the denoising results gradually improve with a larger number of nonlocal patches, implying that the nonlocal self-similarity can be sufficiently utilized by our model, even under a relaxed condition. Nevertheless, given the computational cost and the marginal improvement gained by increasing the number of nonlocal patches, we set it to 50 in all of our experiments.

Figure 16: Effects of the number of iterations on denoising performance with respect to different noise levels (PSNR values over 1 to 5 iterations, with one curve each for σ=15, 25, 50, 75 and 100).

We also show how the number of bands of the HSI influences the denoising capacity of our model. From Figure 15, we can observe that the denoising results gradually improve with a larger number of bands. This suggests that the information contained in one band can be utilized to recover other bands, so that the global correlation along the spectrum is effectively exploited by our model.

Figure 16 displays the effects of the number of iterations on denoising performance with respect to different noise levels. Generally, stronger noise requires more iterations to achieve better performance, at the expense of computing efficiency. We can see that when the noise intensity is relatively small (e.g., σ=15), running the algorithm for more than 2 iterations successively degrades the performance. Although this degradation is not observed within 5 iterations in strong corruption cases (e.g., σ=50, 75, 100), the performance gain from further iterations becomes limited while the computational cost increases significantly. Therefore, we suggest using {1, 2, 3, 4, 5} iterations for σ={15, 25, 50, 75, 100}, respectively, in the simulated data experiments.
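The suggested settings can be collected in a small configuration sketch. The mapping from noise level to iteration count and the value of 50 nonlocal patches come from the discussion above; the `iterations_for` helper, which falls back to the nearest tabulated noise level for other values of σ, is a hypothetical convenience and not part of the described method.

```python
# Iteration counts suggested for the simulated experiments, keyed by noise level sigma.
SUGGESTED_ITERATIONS = {15: 1, 25: 2, 50: 3, 75: 4, 100: 5}

# Number of nonlocal patches used for grouping in all experiments.
NUM_NONLOCAL_PATCHES = 50

def iterations_for(sigma):
    """Hypothetical helper: use the iteration count of the nearest tabulated sigma."""
    nearest = min(SUGGESTED_ITERATIONS, key=lambda level: abs(level - sigma))
    return SUGGESTED_ITERATIONS[nearest]

print(iterations_for(15), iterations_for(60), iterations_for(100))
```

For noise levels that are not in the table, a more careful choice would require rerunning the kind of analysis reported in Figure 16.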

5. Conclusion

In this paper, we presented an effective Low-rank Bayesian Tensor Factorization based HSI denoising method, which considers two intrinsic characteristics of HSIs: the nonlocal self-similarity across space and the global correlation across spectrum. To sufficiently embed these useful priors into our model, the LBTF was utilized to describe the spatial-spectral correlation of each FBP formed by block matching. This model was effectively solved by our deterministic algorithm derived under the variational Bayesian framework. In addition, an iterative denoising framework was introduced to further enhance the denoising capability of our method. The experimental results on simulated and real HSI denoising showed that the proposed method outperformed many state-of-the-art methods, demonstrating its effectiveness.

We encode the noise structure as a Gaussian distribution in our hierarchical probabilistic model. Since the statistical distribution of real-world noise may be hard to determine, it is worth investigating more effective noise models for real-world noise in future work.

6. Acknowledgements

We thank the anonymous reviewers for their helpful com-

ments and suggestions to improve this paper. This work was

supported by the National Science Foundation of China under

Grant No. 61672096.

References

[1] J. F. Mustard, C. M. Pieters, Photometric phase functions of common ge-

ologic minerals and applications to quantitative analysis of mineral mix-

ture reﬂectance spectra, Journal of Geophysical Research: Solid Earth

94 (B10) (1989) 13619–13634.

[2] R. Neville, Automatic endmember extraction from hyperspectral data for

mineral exploration, in: International Airborne Remote Sensing Conference and Exhibition, 4th/21st Canadian Symposium on Remote Sensing, Ottawa, Canada, 1999.

[3] M. Gianinetto, G. Lechi, The development of superspectral approaches

for the improvement of land cover classiﬁcation, IEEE Transactions on

Geoscience and Remote Sensing 42 (11) (2004) 2670–2679.

[4] M. Lewis, V. Jooste, A. A. de Gasparis, Discrimination of arid vegetation

with airborne multispectral scanner hyperspectral imagery, IEEE Trans-

actions on Geoscience and Remote Sensing 39 (7) (2001) 1471–1479.

[5] R. Marion, R. Michel, C. Faye, Measuring trace gases in plumes from

hyperspectral remotely sensed data, IEEE Transactions on Geoscience

and Remote Sensing 42 (4) (2004) 854–864.

[6] A. Chen, The inpainting of hyperspectral images: A survey and adapta-

tion to hyperspectral data, SPIE Remote Sensing. International Society

for Optics and Photonics (2012) 85371–85371.

[7] H. Van Nguyen, A. Banerjee, R. Chellappa, Tracking via object re-

ﬂectance using a hyperspectral video camera, in: The IEEE Confer-

ence on Computer Vision and Pattern Recognition Workshops (CVPRW),

2010, pp. 44–51.

[8] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader,

J. Chanussot, Hyperspectral unmixing overview: Geometrical, statistical,

and sparse regression-based approaches, IEEE Journal of Selected Topics

in Applied Earth Observations and Remote Sensing 5 (2) (2012) 354–379.

[9] R. Kawakami, Y. Matsushita, J. Wright, M. Ben-Ezra, Y.-W. Tai,

K. Ikeuchi, High-resolution hyperspectral imaging via matrix factoriza-

tion, in: The IEEE Conference on Computer Vision and Pattern Recogni-

tion (CVPR), IEEE, 2011, pp. 2329–2336.

[10] M. Uzair, A. Mahmood, A. Mian, Hyperspectral face recognition with

spatiospectral information fusion and pls regression, IEEE Transactions

on Image Processing 24 (3) (2015) 1127–1137.

[11] F. Deger, A. Mansouri, M. Pedersen, J. Y. Hardeberg, Y. Voisin, A sensor-

data-based denoising framework for hyperspectral images, Optics express

23 (3) (2015) 1938–1950.

[12] M. Maggioni, V. Katkovnik, K. Egiazarian, A. Foi, Nonlocal transform-

domain ﬁlter for volumetric data denoising and reconstruction, IEEE

Transactions on Image Processing 22 (1) (2013) 119–133.

[13] Q. Xie, Q. Zhao, D. Meng, Z. Xu, S. Gu, W. Zuo, L. Zhang, Multispectral

images denoising by intrinsic tensor sparsity regularization, in: The IEEE

Conference on Computer Vision and Pattern Recognition (CVPR), 2016,

pp. 1692–1700.

[14] Q. Yuan, L. Zhang, H. Shen, Hyperspectral image denoising employing

a spectral–spatial adaptive total variation model, IEEE Transactions on

Geoscience and Remote Sensing 50 (10) (2012) 3660–3677.

[15] Y.-Q. Zhao, J. Yang, Hyperspectral image denoising via sparse represen-

tation and low-rank constraint, IEEE Transactions on Geoscience and Re-

mote Sensing 53 (1) (2015) 296–308.

[16] Y. Peng, D. Meng, Z. Xu, C. Gao, Y. Yang, B. Zhang, Decomposable

nonlocal tensor dictionary learning for multispectral image denoising, in:

Proceedings of the IEEE Conference on Computer Vision and Pattern

Recognition, 2014, pp. 2949–2956.

[17] X. Liu, S. Bourennane, C. Fossati, Denoising of hyperspectral images us-

ing the parafac model and statistical performance analysis, IEEE Trans-

actions on Geoscience and Remote Sensing 50 (10) (2012) 3717–3724.

[18] G. Chen, S.-E. Qian, Denoising of hyperspectral imagery using principal

component analysis and wavelet shrinkage, IEEE Transactions on Geo-

science and Remote Sensing 49 (3) (2011) 973–980.

[19] H. Zhang, W. He, L. Zhang, H. Shen, Q. Yuan, Hyperspectral image

restoration using low-rank matrix recovery, IEEE Transactions on Geo-

science and Remote Sensing 52 (8) (2014) 4729–4743.

[20] Y. Fu, A. Lam, I. Sato, Y. Sato, Adaptive spatial-spectral dictionary learn-

ing for hyperspectral image restoration, International Journal of Com-

puter Vision 122 (2) (2017) 228–245.

[21] W. He, H. Zhang, H. Shen, L. Zhang, Hyperspectral image denoising us-

ing local low-rank matrix recovery and global spatial–spectral total varia-

tion, IEEE Journal of Selected Topics in Applied Earth Observations and

Remote Sensing 11 (3) (2018) 713–729.

[22] L. Zhuang, J. M. Bioucas-Dias, Hyperspectral image denoising based

on global and non-local low-rank factorizations, in: Image Processing

(ICIP), 2017 IEEE International Conference on, IEEE, 2017, pp. 1900–

1904.

[23] M. Elad, M. Aharon, Image denoising via sparse and redundant represen-

tations over learned dictionaries, IEEE Transactions on Image Processing

15 (12) (2006) 3736–3745.

[24] K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, Image denoising by sparse

3-d transform-domain collaborative ﬁltering, IEEE Transactions on Im-

age Processing 16 (8) (2007) 2080–2095.

[25] K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, Beyond a gaussian de-

noiser: Residual learning of deep cnn for image denoising, IEEE Trans-

actions on Image Processing.

[26] S. Gu, L. Zhang, W. Zuo, X. Feng, Weighted nuclear norm minimization

with application to image denoising, in: The IEEE Conference on Com-

puter Vision and Pattern Recognition (CVPR), 2014, pp. 2862–2869.

[27] J. Xu, L. Zhang, W. Zuo, D. Zhang, X. Feng, Patch group based nonlocal

self-similarity prior learning for image denoising, in: The IEEE Interna-

tional Conference on Computer Vision (ICCV), 2015, pp. 244–252.

[28] A. Chopra, H. Lian, Total variation, adaptive total variation and noncon-

vex smoothly clipped absolute deviation penalty for denoising blocky im-

ages, Pattern Recognition 43 (8) (2010) 2609–2619.

[29] E. J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis?, Journal of the ACM 58 (3) (2011) 11.

[30] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, Y. Ma, Fast convex op-

timization algorithms for exact recovery of a corrupted low-rank matrix,

Computational Advances in Multi-Sensor Adaptive Processing 61 (6).

[31] Z. Lin, M. Chen, Y. Ma, The augmented lagrange multiplier method

for exact recovery of corrupted low-rank matrices, arXiv preprint

arXiv:1009.5055.

[32] T. Zhou, D. Tao, Godec: Randomized low-rank & sparse matrix decom-

position in noisy case, in: International Conference on Machine Learning

(ICML), Omnipress, 2011.

[33] X. Ding, L. He, L. Carin, Bayesian robust principal component analysis,

IEEE Transactions on Image Processing 20 (12) (2011) 3419–3430.

[34] Y. J. Lim, Y. W. Teh, Variational bayesian approach to movie rating pre-

diction, in: Proceedings of KDD cup and workshop, Vol. 7, 2007, pp.

15–21.

[35] V. Y. Tan, C. Févotte, Automatic relevance determination in nonnegative matrix factorization, in: SPARS'09-Signal Processing with Adaptive Sparse Structured Representations, 2009.

[36] S. D. Babacan, M. Luessi, R. Molina, A. K. Katsaggelos, Sparse bayesian

methods for low-rank matrix estimation, IEEE Transactions on Signal

Processing 60 (8) (2012) 3964–3977.

[37] Q. Zhao, D. Meng, Z. Xu, W. Zuo, L. Zhang, Robust principal component

analysis with complex noise, in: International Conference on Machine

Learning (ICML), 2014, pp. 55–63.

[38] Y. Chen, X. Cao, Q. Zhao, D. Meng, Z. Xu, Denoising hyperspectral


image with non-iid noise structure, arXiv preprint arXiv:1702.00098.

[39] T. G. Kolda, B. W. Bader, Tensor decompositions and applications, SIAM

review 51 (3) (2009) 455–500.

[40] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis,

C. Faloutsos, Tensor decomposition for signal processing and machine

learning, IEEE Transactions on Signal Processing 65 (13) (2017) 3551–

3582.

[41] N. Renard, S. Bourennane, J. Blanc-Talon, Denoising and dimensionality

reduction using multilinear tools for hyperspectral images, IEEE Geo-

science and Remote Sensing Letters 5 (2) (2008) 138–142.

[42] Q. Zhao, L. Zhang, A. Cichocki, Bayesian cp factorization of incomplete

tensors with automatic rank determination, IEEE Transactions on Pattern

Analysis and Machine Intelligence 37 (9) (2015) 1751–1763.

[43] C. M. Bishop, Pattern recognition and machine learning, Springer, 2006.

[44] A. Georges, G. Kotliar, W. Krauth, M. J. Rozenberg, Dynamical mean-

ﬁeld theory of strongly correlated fermion systems and the limit of inﬁnite

dimensions, Reviews of Modern Physics 68 (1) (1996) 13.

[45] W. He, H. Zhang, L. Zhang, H. Shen, Total-variation-regularized low-

rank matrix factorization for hyperspectral image restoration, IEEE

Transactions on Geoscience and Remote Sensing 54 (1) (2016) 178–188.

[46] F. Yasuma, T. Mitsunaga, D. Iso, S. K. Nayar, Generalized assorted pixel

camera: postcapture control of resolution, dynamic range, and spectrum,

IEEE Transactions on Image Processing 19 (9) (2010) 2241–2253.

[47] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality as-

sessment: from error visibility to structural similarity, IEEE Transactions

on Image Processing 13 (4) (2004) 600–612.

[48] L. Zhang, L. Zhang, X. Mou, D. Zhang, Fsim: A feature similarity index

for image quality assessment, IEEE Transactions on Image Processing

20 (8) (2011) 2378–2386.

[49] L. Wald, Data fusion: deﬁnitions and architectures: fusion of images of

diﬀerent spatial resolutions, Presses des MINES, 2002.

[50] R. H. Yuhas, J. W. Boardman, A. F. Goetz, Determination of semi-arid

landscape endmembers and seasonal trends using convex geometry spec-

tral unmixing techniques, in: Summaries of the 4th Annual JPL Airborne

Geoscience Workshop, 1993.

[51] X. Liu, M. Tanaka, M. Okutomi, Single-image noise level estimation for

blind denoising, IEEE Transactions on Image Processing 22 (12) (2013)

5226–5237.

[52] A. Chakrabarti, T. Zickler, Statistics of real-world hyperspectral im-

ages, in: IEEE Conference on Computer Vision and Pattern Recognition

(CVPR), IEEE, 2011, pp. 193–200.
