3D Quasi-Recurrent Neural Network for
Hyperspectral Image Denoising
Kaixuan Wei, Ying Fu, Member, IEEE, and Hua Huang, Senior Member, IEEE
Abstract—In this paper, we propose an alternating directional
3D quasi-recurrent neural network for hyperspectral image (HSI)
denoising, which can effectively embed the domain knowledge
— structural spatio-spectral correlation and global correlation
along spectrum. Specifically, 3D convolution is utilized to extract
structural spatio-spectral correlation in an HSI, while a quasi-
recurrent pooling function is employed to capture the global
correlation along spectrum. Moreover, alternating directional
structure is introduced to eliminate the causal dependency
with no additional computation cost. The proposed model is
capable of modeling spatio-spectral dependency while preserving
the flexibility towards HSIs with an arbitrary number of bands.
Extensive experiments on HSI denoising demonstrate significant
improvement over state-of-the-art methods under various noise settings,
in terms of both restoration accuracy and computation time. Our
code is available at https://github.com/Vandermode/QRNN3D.
Index Terms—Hyperspectral image denoising, structural
spatio-spectral correlation, global correlation along spectrum,
quasi-recurrent neural networks, alternating directional struc-
ture
I. INTRODUCTION
HYPERSPECTRAL image (HSI) is made up of massive
discrete wavebands for each spatial position of real
scenes and provides much richer information about scenes
than RGB images, which has led to numerous applications
in remote sensing [27], [34], classification [2], [6], [31], [38],
[45], tracking [37], face recognition [36], and more. However,
due to the limited light for each band, traditional HSIs
are often degraded by various noises (e.g., Gaussian, stripe,
deadline, and impulse noise) during the acquisition process.
These degradations negatively influence the performance of all
of the aforementioned subsequent HSI processing tasks. Therefore,
HSI denoising is an essential pre-processing step in the typical
workflow of HSI analysis and processing.
Recently, more HSI denoising works have paid attention to the
domain knowledge of the HSI — structural spatio-spectral
correlation and global correlation along spectrum (GCS) [42].
Top-performing classical methods [8], [9], [39], [41], [42]
typically utilize non-local low-rank tensors to model them.
Although these methods achieve higher accuracy by effectively
considering these underlying characteristics, the performance
of such methods is inherently determined by how well the
human handcrafted prior (e.g., low-rank tensors) matches the
intrinsic characteristics of an HSI. Besides, such approaches
generally formulate HSI denoising as a complex
optimization problem to be solved iteratively, making the
denoising process time-consuming.
Alternative learning-based approaches rely on convolutional
neural networks in lieu of the costly optimization and hand-
[Figure 1 plots: PSNR (dB) versus running time (sec). Left: Gaussian noise case (blind), comparing BM4D [28], TDL [30], ITSReg [42], LLRT [9], HSID-CNN [46], MemNet [33], and QRNN3D. Right: complex noise case (mixture), comparing LRMR [48], LRTV [20], NMoG [11], LRTDTV [39], HSID-CNN [46], MemNet [33], and QRNN3D.]
Fig. 1: Our QRNN3D outperforms all leading-edge methods
on ICVL dataset in both Gaussian and complex noise cases.
crafted priors [7], [46]. Promising results notwithstanding,
these approaches model the HSI with learned multichannel or
band-wise 2D convolutions, which sacrifice either the flexibility
with respect to the spectral dimension [7] (hence requiring
retraining the network to adapt to HSIs with a mismatched spectral
dimension), or the model capability to extract GCS knowledge
[46] (thus leading to relatively low performance, as shown in
Figure 1).
In principle, the trade-off between model capability
and flexibility imposes a fundamental limit for real-world
applications. In this paper, we find that combining domain
knowledge with 3D deep learning (DL) can achieve both
goals simultaneously. Unlike prior DL approaches [7], [46]
that always utilize the 2D convolution as the basic building
block of the network, we introduce a novel building block, namely the
3D quasi-recurrent unit (QRU3D), to model HSI from a 3D
perspective. This unit contains a 3D convolutional subcom-
ponent and a quasi-recurrent pooling function [5], enabling
structural spatio-spectral correlation and GCS modeling re-
spectively. The 3D convolutional subcomponent can extract
spatio-spectral features from multiple adjacent bands, while
the quasi-recurrent pooling recurrently merges these features
over the whole spectrum, controlled by a dynamic gating
mechanism. This mechanism lets the pooling weights
be dynamically calculated from the input features, thereby
allowing adaptive modeling of the GCS knowledge. To
eliminate the unidirectional causal dependency (Figure 4),
introduced by the vanilla recurrent structure, we furthermore
propose an alternating directional structure with no additional
computation cost.
Our network, called 3D quasi-recurrent neural network
(QRNN3D), has been designed to make full use of the
domain knowledge especially the GCS. It makes significant
improvements in model capability/accuracy while being agnostic
to the spectral dimension of input HSIs, and thus can be applied
to HSIs captured by unknown sensors (with different
spectral resolutions). Over extensive experiments, QRNN3D
outperforms all leading-edge methods on several benchmark
datasets under various noise settings as shown in Figure 1.
Our main contributions are summarized as follows. We
1) present a novel building block, namely QRU3D, that can
effectively exploit the domain knowledge – structural
spatio-spectral correlation and global correlation along
spectrum (GCS) – simultaneously.
2) introduce an alternating directional structure to eliminate
the unreasonable causal dependency towards HSI model-
ing, with no additional computation cost.
3) demonstrate that our model pretrained on the ICVL dataset can be
directly utilized to tackle remotely sensed imagery, which
is infeasible for conventional 2D DL approaches to
HSI modeling.
The remainder of this paper is organized as follows. In
Section II, we review related HSI denoising methods and DL
approaches that inspire our work. Section III introduces the
QRNN3D approach for HSI denoising. Extensive experimental
results on a natural-scene HSI database and remotely sensed
images are presented in Section IV, followed by further
discussions that facilitate the understanding of QRNN3D in Section
V. Conclusions are drawn in Section VI.
II. RELATED WORK
A. HSI Denoising
Existing methods towards HSI denoising can be roughly
classified into two categories depending on the noise model.
The most frequently used noise model is zero-mean white
and homogeneous Gaussian additive noise. Under this as-
sumption, BM4D [28], an extension of the BM3D filter
[13] to volumetric data, could be directly applied for HSI
denoising. By regarding the GCS and non-local self-similarity
in HSI simultaneously, Peng et al. proposed a tensor dictionary
learning (TDL) model [30] which achieved very promising
performance. Following this line, more sophisticated methods
have been successively proposed [8], [9], [14], [16], [19],
[41], [42], [50]. Among these methods, the low-rank tensor
based models, e.g., ITSReg [42] and LLRT [9], and a new iterative
projection and denoising algorithm, NG-meet [19], achieve
state-of-the-art performance, owing to their elaborate efforts
on modeling intrinsic property of the HSI.
Besides, several works [11], [20], [39], [43], [48] aim to
resolve the realistic complex noise by modeling the noise with
complicated non-i.i.d. statistical structures. They all frame the
denoising problem as a low-rank based optimization scheme,
and then utilize constraints (e.g., total variation, $\ell_1$ and
nuclear norms) to remove the complex noise (e.g., non-i.i.d.
Gaussian, stripe, deadline, impulse).
Recently, leveraging the power of the DL, Chang et al. [7]
extended the 2D image denoising architecture – DnCNN [49]
to remove various noises in HSIs. They argued that the learned
filters can extract the structural spatial information well.
Yuan et al. [46] utilized a deep residual network to recover
the remotely sensed images under Gaussian noise, which
processed the HSI with a sliding window strategy. Concurrently
with our work, Dong et al. [15] proposed a 3D factorizable
U-net architecture to exploit spatial-spectral correlations in
HSIs from the 3D perspective. All these DL-based methods
insufficiently exploit the GCS knowledge, and they cannot
adjust their learned parameters to adaptively fit the input data,
consequently lacking the freedom to discriminate input-dependent
spatio-spectral correlations.
In this paper, we leverage the power of DL to automatically
learn the mapping purely from data instead of handcrafted
priors and complex optimization, reaching orders-of-magnitude
speedups in both Gaussian and complex noise contexts. Besides,
our DL-based method can effectively exploit the underlying
characteristics, i.e., structural spatio-spectral correlation and GCS,
without sacrificing the flexibility towards HSIs with an arbitrary
number of bands.
B. Deep Learning for Image Denoising
Research on gray/RGB image denoising has been dominated
by discriminative learning based approaches, especially
deep convolutional neural networks (CNNs), in recent years
[10], [29], [33], [49], [51], [52]. Zhang et al. [49] proposed a
modern deep architecture namely DnCNN by embedding the
batch normalization [23] and residual learning [18]. Mean-
while, Mao et al. [29] presented a very deep fully convo-
lutional encoding-decoding framework for image restoration
such as denoising and super-resolution. Both of them yielded
better Gaussian denoising results and less computation time
than the highly-engineered benchmark BM3D [13]. Along
this line, more works have been proposed to explore the
deep architecture design for image denoising. For example,
MemNet [33] introduces a memory block to exploit
long-term information. The residual dense network [52] goes
beyond that to build dense connections within blocks. The residual
non-local attention network [51] utilizes local and non-local
attention blocks to extract features that capture the long-range
dependencies between pixels and pay more attention to the
challenging parts.
Although all these networks can be directly extended into
the HSI case, none of them specifically consider the domain
knowledge of the HSI.
C. Deep Image Sequence Modeling
Modeling image sequences of various lengths is a fundamental
problem in a variety of research fields, such as
precipitation nowcasting and video processing.
Bidirectional recurrent convolutional networks (BRCN) [22]
and convolutional LSTM (ConvLSTM) [44] were proposed for
resolving the multi-frame super-resolution and precipitation
nowcasting problem respectively. The key insight of these
models is to replace the common-used recurrent full connec-
tions by weight-sharing convolutional connections such that
they can greatly reduce the large number of network parame-
ters and well model the temporal dependency in a finer level
(i.e. patch-based rather than frame-based). However, these
patch-based operations cannot efficiently capture the spectral
correlation, meanwhile recurrently applying convolution along
TABLE I: Network configuration of our residual encoder-
decoder style QRNN3D for HSI restoration.
Layer         | Cout | Stride    | Output size
Extractor     | 16   | 1,1,1     | H × W × B
Encoder       | 16   | 1,1,1     | H × W × B
              | 32   | 2,2,1     | H/2 × W/2 × B
              | 32   | 1,1,1     | H/2 × W/2 × B
              | 64   | 2,2,1     | H/4 × W/4 × B
              | 64   | 1,1,1     | H/4 × W/4 × B
Decoder       | 64   | 1,1,1     | H/4 × W/4 × B
              | 32   | 1/2,1/2,1 | H/2 × W/2 × B
              | 32   | 1,1,1     | H/2 × W/2 × B
              | 16   | 1/2,1/2,1 | H × W × B
              | 16   | 1,1,1     | H × W × B
Reconstructor | 1    | 1,1,1     | H × W × B
spectrum would drastically increase the computational com-
plexity. In contrast, our QRNN3D employs an elementwise
recurrent mechanism, enabling good scaling to HSI with a
large number of bands. Besides, this mechanism naturally
imposes a prior constraint over the spectrum, making it well-
suited for extracting GCS knowledge.
Fig. 2: The overall architecture of our residual encoder-decoder
QRNN3D. The network contains layers of symmetric QRU3D
with convolution and deconvolution for encoder (blue) and
decoder (orange) respectively. Symmetric skip connections are
added in each layer. Besides, the alternating directional structure
is equipped in all layers except the top and bottom ones, which
use the bidirectional structure to avoid bias.
III. THE PROPOSED METHOD
An HSI degraded by additive noise can be linearly modeled
as
$\mathbf{Y} = \mathbf{X} + \boldsymbol{\epsilon}$, (1)
where $\mathbf{Y}, \mathbf{X}, \boldsymbol{\epsilon} \in \mathbb{R}^{H \times W \times B}$; $\mathbf{Y}$ is the observed noisy image,
$\mathbf{X}$ is the original clean image, and $\boldsymbol{\epsilon}$ denotes the additive random
noise. $H$, $W$, $B$ indicate the spatial height, spatial width, and
number of spectral bands respectively.
Here, we consider miscellaneous noise removal in the denoising
context, where $\boldsymbol{\epsilon}$ can represent different types of random noise,
including Gaussian noise, sparse noise (stripe, deadline and
impulse) or a mixture of them. Given a noisy HSI, our goal is
to obtain its noise-free counterpart.
In this section, we introduce the residual encoder-decoder
QRNN3D for HSI denoising. As shown in Figure 2, our
network consists of six pairs of symmetric QRU3D with
convolution and deconvolution for encoder and decoder re-
spectively, leading to twelve layers in total. We use two layers
with stride-2 convolution to downsample the input in the encoder
part, and then two layers with stride 1/2 to upsample in the
decoder part. The benefit of the downsampling and upsampling
operations is that we can use a larger network under the same
computational cost and increase the receptive field size to make
use of the context information in a larger image region. Table
I illustrates our network configuration. Each layer contains a
QRU3D with kernel size 3×3×3, which is set empirically to maximize
performance [35]. The stride and output channels
(Cout) of each layer are listed, and other configurations (e.g.,
padding) can be inferred implicitly.
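The spatial-only downsampling in Table I can be illustrated with a small, self-contained check. The snippet below is only a sketch (not the authors' released code): with a (batch, channels, bands, height, width) layout, the spatial stride 2,2,1 from the table becomes (1, 2, 2), halving H and W while leaving the number of bands B untouched, which is what keeps the network agnostic to the spectral dimension.

```python
import torch
import torch.nn as nn

# Illustrative check of the Table I strides; layout is (batch, channels, B, H, W),
# so the table's spatial stride 2,2,1 is written (1, 2, 2) with the band axis first.
x = torch.randn(1, 16, 31, 64, 64)
down = nn.Conv3d(16, 32, kernel_size=3, stride=(1, 2, 2), padding=1)
up = nn.ConvTranspose3d(32, 16, kernel_size=3, stride=(1, 2, 2),
                        padding=1, output_padding=(0, 1, 1))
y = down(x)
print(y.shape)      # torch.Size([1, 32, 31, 32, 32]) -- B = 31 is preserved
print(up(y).shape)  # torch.Size([1, 16, 31, 64, 64]) -- spatial size restored
```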
In the following, we first present the QRU3D, which is
the core building block in our method. Then, alternating
directional structure used to eliminate the unreasonable causal
dependency is introduced, and learning details are provided.
A. 3D Quasi-Recurrent Unit
QRU3D is the basic building block of QRNN3D. It consists
of two subcomponents, i.e. 3D convolutional subcomponent
and quasi-recurrent pooling, as shown in Figure 3. Unlike the
2D convolution, neither subcomponent constrains the number of
spectral bands, making the QRNN3D free to process HSIs with
an arbitrary number of bands.
3D Convolutional Subcomponent. The 3D convolutional
subcomponent of QRU3D performs two sets of 3D convolutions
[24], [35] with separate filter banks, producing sequences of
tensors passed through different activation functions,
$\mathbf{Z} = \tanh(\mathbf{W}_z * \mathbf{I}), \quad \mathbf{F} = \sigma(\mathbf{W}_f * \mathbf{I})$, (2)
where $\mathbf{I} \in \mathbb{R}^{C_{in} \times H \times W \times B}$ is the input feature map coming
from the last layer (in the first layer, the input $\mathbf{I} = \mathbf{Y}$ with $C_{in} = 1$);
$\mathbf{Z} \in \mathbb{R}^{C_{out} \times H \times W \times B}$ is a high-dimensional candidate tensor.
$\mathbf{F}$ has the same dimension as $\mathbf{Z}$, representing the neural forget
gate that controls the behavior of dynamic memorization. Both
$\mathbf{W}_z$ and $\mathbf{W}_f \in \mathbb{R}^{C_{out} \times C_{in} \times 3 \times 3 \times 3}$ are the 3D convolutional
filter banks, $*$ denotes a 3D convolution, and $\sigma$ indicates a
sigmoid non-linearity.
The 3D convolution is achieved by convolving a 3D kernel
with a whole HSI in both the spatial and spectral dimensions. The
3D convolution in the spatial domain can mimic numerous
operations widely used in low-level vision (like image patch
extraction and 2D patch transform in BM3D [13], [26]) and
the 3D convolution in the spectral domain can model the
local spectrum continuity to alleviate the spectral distortion.
Consequently, the embedded C3D can effectively exploit the
structural spatio-spectral correlation in HSIs.
Quasi-Recurrent Pooling. Although the 3D convolutional
subcomponent has already exploited the inter-band relation-
ship, it is computed in a local way and cannot explicitly
exploit GCS. To effectively utilize the GCS, we present quasi-
recurrent pooling, in which a pooling operation and a dynamic
gating mechanism are introduced.
Fig. 3: The overall structure of QRU3D. It can be described in four steps. First, the input $\mathbf{I}$ is transformed by two sets of 3D
convolutions, generating a candidate tensor $\mathbf{Z}$ and a neural forget gate $\mathbf{F}$. Second, $\mathbf{Z}$ and $\mathbf{F}$ are split along the spectrum to
produce sequences of $\mathbf{z}_b$ and $\mathbf{f}_b$. Third, the quasi-recurrent pooling function is applied recurrently to merge the previous hidden
state $\mathbf{h}_{b-1}$ and the current candidate $\mathbf{z}_b$, controlled by the current neural gate $\mathbf{f}_b$, resulting in a new hidden state $\mathbf{h}_b$. Finally, the
hidden states $\mathbf{h}_b$ are concatenated together to form the whole output $\mathbf{H}$ passed to the next layer.
In our QRU3D, the quasi-recurrent pooling is applied after
the candidate tensor $\mathbf{Z}$ and neural forget gate $\mathbf{F}$ are obtained
by the 3D convolutional subcomponent. We first split $\mathbf{Z}$ and
$\mathbf{F}$ along the spectrum, generating sequences of $\mathbf{z}_b$ and $\mathbf{f}_b$
respectively, and then feed these states into a quasi-recurrent
pooling function [5],
$\mathbf{h}_b = \mathbf{f}_b \odot \mathbf{h}_{b-1} + (1 - \mathbf{f}_b) \odot \mathbf{z}_b, \quad \forall b \in [1, B]$, (3)
where $\odot$ denotes an element-wise multiplication, $\mathbf{h}_{b-1}$ is
the hidden state merged through all previous states and also
represents the $(b-1)$-th band in the output of this layer, and $\mathbf{h}_0 = \mathbf{0}$
with all entries equal to zero. The forget gate $\mathbf{f}_b$ balances
the weight of the current candidate $\mathbf{z}_b$ and the previous memory, i.e.,
the hidden state $\mathbf{h}_{b-1}$. Its value depends on the current input
$\mathbf{I}$ instead of being fixed like a convolutional filter, so it
can effectively adapt to the input image itself rather than relying solely
on the parameters learned in the training stage. By this
construction, the inter-band information is accurately
merged. Meanwhile, since this dynamic pooling recurrently
operates across the whole spectrum, the GCS can be effectively
exploited. The output feature maps $\mathbf{H}$ are produced by
concatenating all hidden states along the spectrum.
In addition, due to the independent neural gate and element-wise
recurrent operations (multiplication), the QRU3D is
highly parallel, enabling good scaling to HSIs with a large
number of bands. More specifically, the calculation of the neural
forget gate $\mathbf{f}_b$ depends only on multiple contiguous bands
of the input, instead of involving the previous hidden state as in
typical RNNs (e.g., LSTM [21] and GRU [12]). Meanwhile,
the elementwise multiplication is far more computationally
economical than the convolution used by ConvLSTM [44],
and thus can easily be applied recurrently hundreds of times.
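To make the two subcomponents concrete, the following is a minimal PyTorch sketch of a forward QRU3D layer, assuming a (batch, channels, bands, height, width) layout; the class name and details here are illustrative and may differ from the released implementation.

```python
import torch
import torch.nn as nn

class QRU3D(nn.Module):
    """Minimal forward 3D quasi-recurrent unit: 3D convolution + quasi-recurrent pooling.

    Input/output layout is assumed to be (batch, channels, bands, height, width)."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        # A single 3D convolution produces both the candidate Z and the gate F
        # (2 * out_channels), which is equivalent to two separate filter banks.
        self.conv = nn.Conv3d(in_channels, 2 * out_channels, kernel_size,
                              padding=kernel_size // 2)
        self.out_channels = out_channels

    def forward(self, x):
        zf = self.conv(x)
        z, f = zf.split(self.out_channels, dim=1)
        z, f = torch.tanh(z), torch.sigmoid(f)          # Eq. (2)
        hidden, states = None, []
        for b in range(z.shape[2]):                     # iterate over spectral bands
            zb, fb = z[:, :, b], f[:, :, b]
            if hidden is None:
                hidden = (1 - fb) * zb                  # h_0 = 0
            else:
                hidden = fb * hidden + (1 - fb) * zb    # Eq. (3)
            states.append(hidden)
        return torch.stack(states, dim=2)               # concatenate along the spectrum
```

A backward unit is obtained by reading the bands in reverse order, which is how the alternating directional structure in the next subsection is built.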
B. Alternating Directional Structure
A forward 3D quasi-recurrent unit, as in Equation (3), reads
the candidate tensors $\mathbf{z}_b$ in order, starting from the first $\mathbf{z}_1$ to
the last $\mathbf{z}_B$, so that a hidden state $\mathbf{h}_b$ only depends on the
previous $\mathbf{z}_b$ (and their corresponding bands). This introduces
a causal dependency, since the computing stream of hidden
states propagates unidirectionally as shown in Figure 4(a),
which is not reasonable for an HSI.
Fig. 4: Directional structure overview. (a) Unidirectional structure: hidden states propagate unidirectionally. (b) Bidirectional
structure: one layer contains two sublayers that propagate states in opposite directions, generating results by adding the
sublayers' outputs. (c) Our proposed alternating directional structure: the direction of the network changes in each layer.
Fig. 5: Synthesized RGB image samples from the ICVL dataset.
A typical solution is to use a bidirectional structure [4], [22],
[32], in which a layer of network contains two sublayers, i.e.
a forward QRU3D and a backward QRU3D in our case, as
shown in Figure 4(b). The forward QRU3D reads the candidate
tensor sequence in order and calculates a sequence of forward
hidden states. The backward QRU3D reads the sequence in
reverse order, leading to a sequence of backward hidden states.
The output of this layer is calculated by adding the forward and
backward hidden states elementwisely. However, this structure
makes the computational burden unacceptable because it nearly
doubles the memory consumption.
To ease this issue, we present an alternating directional
structure for HSIs. Specifically, a QRNN3D with the alternating
directional structure changes the direction of the computing stream
of hidden states in each layer, as shown in Figure 4(c). This
structure is built by alternately stacking forward and backward
QRU3D, in which a forward (or backward) state is merged
with a backward (or forward) state in the next layer, such that the
global context information can be propagated through the
whole spectrum.
Compared with the typical solution by bidirectional struc-
ture, our proposed alternating directional structure adds
almost no additional computation cost, while keeping the ability
to model the dependency on the whole spectrum of an HSI
regardless of the position of the output.
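As a rough sketch of how the alternating scheme can be assembled, the snippet below stacks the hypothetical QRU3D layers from the previous sketch and simply flips the band axis before and after every other layer; it is an illustrative reading of Figure 4(c), not the exact released architecture (which also includes strided encoder/decoder layers and bidirectional head and tail layers).

```python
import torch
import torch.nn as nn

class AlternatingQRNN3D(nn.Module):
    """Stack of QRU3D layers whose recurrence direction alternates per layer."""
    def __init__(self, channels, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            [QRU3D(channels, channels) for _ in range(num_layers)]
        )

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if i % 2 == 1:
                # Odd layers run backward: reverse the band axis, apply a forward
                # QRU3D, then restore the original band order afterwards.
                x = torch.flip(x, dims=[2])
                x = torch.flip(layer(x), dims=[2])
            else:
                x = layer(x)
        return x
```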
IV. EXPERIMENTAL RESULTS
A. Experimental settings
Benchmark Datasets. We conduct several experiments
using data from the ICVL hyperspectral dataset [3], where 201
images were collected at 1392 × 1300 spatial resolution over
31 spectral bands. The simulated pseudo-color image samples
from this dataset are illustrated in Figure 5. We use 100
images for training, 5 images for validation, and the others
for testing. To enlarge the training set, we crop multiple
overlapped volumes from training HSIs and then regard each
volume as a training sample. During cropping, each volume
has a spatial size of 64 ×64 and a spectral size of 31 for the
purpose of preserving the complete spectrum of an HSI. Data
augmentation schemes such as rotation and scaling are also
employed, resulting in roughly 50k training samples in total.
As for the testing set, we crop the main region of each image to a
size of 512 × 512 × 31, given the computation cost.1
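A cropping routine consistent with this description might look like the sketch below, which extracts overlapped 64 × 64 volumes covering all 31 bands and applies rotation augmentation; the crop stride and the exact augmentation set (the paper also mentions scaling, omitted here) are assumptions.

```python
import numpy as np

def extract_training_volumes(hsi, size=64, stride=32):
    """Crop overlapped size x size x B volumes from an HSI of shape (H, W, B),
    keeping the complete spectrum, and augment each crop with 90-degree rotations."""
    H, W, _ = hsi.shape
    volumes = []
    for top in range(0, H - size + 1, stride):
        for left in range(0, W - size + 1, stride):
            vol = hsi[top:top + size, left:left + size, :]
            for k in range(4):  # rotation augmentation
                volumes.append(np.rot90(vol, k, axes=(0, 1)).copy())
    return volumes
```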
Besides, we evaluate the robustness and flexibility of our
model on remotely sensed hyperspectral datasets including
Pavia Centre, Pavia University, Indian Pines and Urban.
Pavia Centre and Pavia University were acquired by the
ROSIS sensor; the number of spectral bands is 102 for Pavia
Centre and 103 for Pavia University. Indian Pines and Urban
were gathered by the 224-band AVIRIS sensor and the 210-band
HYDICE hyperspectral system respectively. Both of them have
been used for real HSI denoising experiments [9], [20], [39].
Noise settings. Real-world HSIs are usually contaminated
by several different types of noise, including the most common
Gaussian noise, impulse noise, dead pixels or lines, and stripes
[11], [17], [48]. We define five types of complex noise as
follows, referred to as Cases 1–5 respectively; a simulation sketch
for the first two cases is given after the list.
1 It is unwieldy to evaluate an image of this size with some competing
methods, though not with ours; see Figure 1 for more detail.
Case 1: Non-i.i.d. Gaussian noise. Entries in all bands are
corrupted by zero-mean Gaussian noise with different
intensities, randomly selected from 10 to 70.
Case 2: Gaussian + Stripe noise. All bands are corrupted
by non-i.i.d. Gaussian noise as Case 1. One third
of bands (10 bands for ICVL dataset) are randomly
chosen to add stripe noise (affecting 5% to 15%
of the columns).
Case 3: Gaussian + Deadline noise. The noise generation
process is nearly the same as Case 2 except the stripe
noise is replaced by deadline.
Case 4: Gaussian + Impulse noise. Each band is contaminated
by Gaussian noise as Case 1. One third of bands are
randomly selected to add impulse noise with intensity
ranging from 10% to 70%.
Case 5: Mixture noise. Each band is randomly corrupted by
at least one kind of noise mentioned in Case 1-4.
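As referenced above, the following sketch illustrates how the first two cases might be synthesized for a clean HSI of shape (H, W, B); the assumed 0–255 value scale and the stripe magnitude are illustrative choices, not the authors' exact noise generator.

```python
import numpy as np

def add_noniid_gaussian(x, sigma_range=(10, 70)):
    """Case 1: zero-mean Gaussian noise with a per-band sigma drawn from [10, 70],
    assuming pixel values on a 0-255 scale."""
    sigmas = np.random.uniform(*sigma_range, size=x.shape[2])
    return x + np.random.randn(*x.shape) * sigmas[None, None, :]

def add_stripes(x, band_fraction=1 / 3, col_fraction=(0.05, 0.15), magnitude=50):
    """Case 2: non-i.i.d. Gaussian noise on all bands plus stripe noise on a random
    third of the bands, affecting 5%-15% of the columns in each chosen band
    (the stripe magnitude is an assumed value)."""
    y = add_noniid_gaussian(x)
    H, W, B = y.shape
    bands = np.random.choice(B, size=int(B * band_fraction), replace=False)
    for b in bands:
        num_cols = int(W * np.random.uniform(*col_fraction))
        cols = np.random.choice(W, size=num_cols, replace=False)
        y[:, cols, b] += np.random.uniform(-magnitude, magnitude, size=num_cols)
    return y
```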
Competing Methods. We compare our method against
both traditional and DL methods in both Gaussian and com-
plex noise cases. In general, the traditional methods are best
suited to a specific noise setting, relying on their noise assumptions,
while DL methods can be applied in various noise settings
by training multiple models to tackle
miscellaneous noises. For the sake of fairness, we adopt
different traditional baselines in these two noise contexts,
given their noise assumptions.
In Gaussian noise case, we compare with several represen-
tative traditional methods including filtering-based approaches
(BM4D [28]), dictionary learning approach (TDL [30]), and
tensor-based approaches (ITSReg [42], LLRT [9]). In complex
noise case, the competing traditional baselines include low-
rank matrix recovery approaches (LRMR [48], LRTV [20],
NMoG [11]), and low-rank tensor approach (TDTV [39]).
For DL approaches, we compare our model with HSID-
CNN [46]. Besides, any DL method for single image denoising
can be extended to HSI denoising case (by modifying the first
layer to adapt the HSI, i.e. changing Cin from 3 to 31). For
completeness, we also compare with such a state-of-the-art 2D DL
approach, i.e., MemNet [33] with Cin = 31 in the first layer, which
entails a fixed number of spectral bands. Since the training
setting is different between ours and other DL approaches, we
finetune/retrain their pretrained models with our well-designed
training strategy to achieve better performance on our dataset.
Network learning. We develop an incremental training
policy to stabilize and accelerate the training, which also
avoids the network converging to a poor local minimum. The
philosophy of our training policy is simple: learning to solve
tasks in an easy-to-difficult way [1]. Networks are learned
by minimizing the mean square error (MSE) between the
predicted high-quality HSI and the ground truth. The network
parameters are initialized as in [17], and optimized using the
ADAM optimizer [25] with the deep learning framework
PyTorch2 on a machine with an NVIDIA GTX 1080Ti GPU, an Intel(R)
Core(TM) i7-7700K CPU at 4.2 GHz, and 16 GB RAM. Unlike
2https://pytorch.org/
TABLE II: Overview of our incremental training policy. Our network learning goes through three stages, from the easy task of
Gaussian denoising with a fixed noise level to the difficult one of complex noise removal. In our implementation, the fixed noise
level σ in stage 1 is set to 50. The unknown σ in stage 2 is uniformly sampled from 30 to 70. The unknown complex noise in stage 3
denotes complex noise randomly chosen from Cases 1 to 4 (without Case 5: mixture noise). The models trained at the end
of stages 2 (epoch 50) and 3 (epoch 100) are used in the Gaussian denoising and complex noise removal tasks respectively.
Stage | Noise model                   | Epochs (learning rate)                        | Batch size
1     | Gaussian noise with known σ   | 0–20 (10^-3), 20–30 (10^-4)                   | 16
2     | Gaussian noise with unknown σ | 30–35 (10^-3), 35–45 (10^-4), 45–50 (10^-5)   | 64
3     | Unknown complex noise         | 50–85 (10^-3), 85–95 (10^-4), 95–100 (10^-5)  | 64
[Figure 6 panels with PSNR (dB): (a) Noisy (14.17), (b) BM4D (33.00), (c) TDL (35.11), (d) ITSReg (36.09), (e) LLRT (36.08), (f) HSID-CNN (35.22), (g) MemNet (36.29), (h) Ours (36.73).]
Fig. 6: Simulated Gaussian noise removal results in PSNR (dB) at the 20th band of an image under noise level σ = 50 on the ICVL
dataset. (Best viewed on screen with zoom)
[Figure 7 panels, columns: Noisy, LRMR [48], LRTV [20], NMoG [11], TDTV [39], HSID-CNN [46], MemNet [33], Ours; rows: Cases 1–5.]
Fig. 7: Simulated complex noise removal results on the ICVL dataset. Examples of non-i.i.d. Gaussian noise, Gaussian + stripes,
Gaussian + deadline, Gaussian + impulse, and mixture noise removal (Cases 1–5) are presented respectively. (Best viewed on
screen with zoom)
training networks independently to tackle several different types of noise separately, we simply train two models in both
[Figure 8 panels: (a) i.i.d. Gaussian (σ = 50), comparing BM4D, TDL, ITSReg, LLRT, HSID-CNN, MemNet and QRNN3D; (b) Non-i.i.d. Gaussian (Case 1); (c) Gaussian + Stripe (Case 2); (d) Gaussian + Deadline (Case 3); (e) Gaussian + Impulse (Case 4); (f) Mixture (Case 5), each comparing LRMR, LRTV, NMoG, LRTDTV, HSID-CNN, MemNet and QRNN3D. Each panel plots PSNR (dB) over the 400–700 nm spectral range.]
Fig. 8: PSNR values across the spectrum corresponding to Gaussian and complex noise removal results in Figure 6 and 7
respectively.
Gaussian and complex noise cases respectively. Our network
learning goes through three stages, from the easy task of
Gaussian denoising with fixed noise level, to the difficult
one of complex noise removal. The models are incrementally
trained that reuse the prior state (pretrained parameters) to
maximize the training efficiency (See discussions in Section
V-A). We follow the previous image restoration work [29] to
choose hyper-parameters of learning algorithm. These values
were empirically set to make network learning fast yet stable.
Specifically, the learning rate is initialized at 103and decayed
at epochs, where the validation performance not increases any
more. Small batch size (i.e. 16) is used to accelerate training at
first stage, while large batch size (i.e. 64) is adopted to stabilize
training when tackling harder cases (e.g. complex noise case).
The overview of our training procedures is shown in Table II,
with detailed hyper-parameter setting.
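The schedule in Table II can be written as a simple list of stages driving a standard training loop, as in the sketch below; `make_loader` and `add_noise` are hypothetical helpers, and the epoch boundaries and batch sizes follow the reconstruction of Table II above.

```python
import torch

# (first_epoch, last_epoch, learning_rate, batch_size, noise_setting), per Table II.
SCHEDULE = [
    (0,  20, 1e-3, 16, "gaussian_sigma_50"),
    (20, 30, 1e-4, 16, "gaussian_sigma_50"),
    (30, 35, 1e-3, 64, "gaussian_blind_30_70"),
    (35, 45, 1e-4, 64, "gaussian_blind_30_70"),
    (45, 50, 1e-5, 64, "gaussian_blind_30_70"),
    (50, 85, 1e-3, 64, "complex_cases_1_to_4"),
    (85, 95, 1e-4, 64, "complex_cases_1_to_4"),
    (95, 100, 1e-5, 64, "complex_cases_1_to_4"),
]

def train(model, clean_dataset):
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=SCHEDULE[0][2])
    for start, end, lr, batch_size, noise in SCHEDULE:
        for group in optimizer.param_groups:
            group["lr"] = lr                                 # step-wise learning-rate decay
        loader = make_loader(clean_dataset, batch_size)      # hypothetical helper
        for epoch in range(start, end):
            for clean in loader:
                noisy = add_noise(clean, noise)              # hypothetical helper
                loss = criterion(model(noisy), clean)        # MSE between output and ground truth
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```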
Quantitative Metrics. To give an overall evaluation, three
quantitative quality indices are employed, i.e. PSNR, SSIM
[40], and SAM [47]. PSNR and SSIM are two conventional
spatial-based indices, while SAM is spectral-based. Larger
values of PSNR and SSIM imply better performance, while
a smaller value of SAM suggests better performance.
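For reference, the two non-SSIM indices can be computed band-wise (PSNR) and pixel-wise (SAM), as in the sketch below; the averaging convention is an assumption and may differ from the evaluation code actually used, and SSIM is omitted since it is usually taken from an existing implementation.

```python
import numpy as np

def psnr(x, y, data_range=1.0):
    """Mean PSNR (dB) over bands for HSIs x, y of shape (H, W, B)."""
    mse = np.mean((x - y) ** 2, axis=(0, 1))
    return float(np.mean(10 * np.log10(data_range ** 2 / mse)))

def sam(x, y, eps=1e-8):
    """Mean spectral angle (radians) over pixels for HSIs of shape (H, W, B)."""
    dot = np.sum(x * y, axis=2)
    norms = np.linalg.norm(x, axis=2) * np.linalg.norm(y, axis=2) + eps
    angles = np.arccos(np.clip(dot / norms, -1.0, 1.0))
    return float(np.mean(angles))
```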
B. Experiments on ICVL Dataset
Denoising in Gaussian Noise Case. Zero-mean additive
white Gaussian noise with different variances is added to
generate the noisy observations. The model trained at the end
of stage 2 (epoch 50) is used to tackle all different levels
of corruption.3 Figure 6 shows the denoising results under
noise level σ= 50. It can be easily observed that the image
restored by our method is capable of properly removing the
Gaussian noise while finely preserving the structure underlying
the HSI. Traditional methods like BM4D and TDL introduce
evident artifacts to some areas. Other methods suppress the
noise better, but still lose some fine-grained details and pro-
duce relatively low-quality results compared with ours. The
quantitative assessment results are listed in Table III. Compared
with all competing methods, the QRNN3D achieves better performance
in most qualitative/quantitative assessments, further
confirming the high fidelity of our method.
3 We do not train multiple networks to tackle different noise intensities
separately. Instead, only a single network is trained using training samples
with various noise intensities.
Denoising in Complex Noise Case. Five types of the com-
plex noise are added to generate noisy samples. In brief, cases
1-5 represent non-i.i.d Gaussian noise, Gaussian + stripes,
Gaussian + deadline, Gaussian + impulse, and mixture of
them respectively (see Section IV-A for more details). As in the
Gaussian noise case, a single model trained at the end of
stage 3 (epoch 100) is utilized to deal with Cases 1–5 simultaneously.
It is worth noting that each sample in our training
set is corrupted by one of the noise types in Cases 1–4, while
in Case 5 each testing sample suffers from multiple types of
noise, not contained in the training set. We show the qualitative
and quantitative results in Figure 7 and Table IV respectively,
which show our QRNN3D significantly outperforms the other
methods. Furthermore, the results in mixture noise case exhibit
the strong generalization of our model since the mixture noise
is not seen by our model in the training stage.
In Figure 7, the observed images are corrupted by miscellaneous
complex noises. The low-rank matrix recovery methods,
i.e., LRMR and LRTV, holding the assumption that the clean
HSI lies in a low-rank subspace from the spectral perspective,
successfully remove a great deal of the noise, but at the cost of
losing fine details. Our QRNN3D eliminates miscellaneous
noises to a great extent, while more faithfully preserving the
fine-grained structure of the original image (e.g., the texture of
the road in the second photo of Figure 7) than the top-performing
traditional low-rank tensor approach TDTV and the other DL
methods. Figure 8 shows the PSNR value of each band in
these HSIs. It can be seen that the PSNR values of all bands
obtained by our QRNN3D are obviously higher than those of the
compared methods.
C. Experiments on Remotely Sensed Images
Synthetic Data. Here, we conduct experiments on Pavia
University in mixture noise case. Given the similarity between
Pavia Centre and Pavia University, the model is first trained
from scratch only on Pavia Centre. It can be seen that our
train-from-scratch model (Ours-S in Table V) performs undesirably,
[Figure 9 panels with PSNR (dB): (a) Noisy (13.54), (b) LRMR (26.35), (c) LRTV (25.93), (d) NMoG (28.90), (e) LRTDTV (30.06), (f) HSID-CNN (30.14), (g) Ours-S (29.64), (h) Ours-P (31.50), (i) Ours-F (34.32), (j) Clean (+∞).]
Fig. 9: Simulated complex noise removal results in PSNR (dB) at the 10th band of an image in Case 5 (mixture noise) on the Pavia
University dataset. (Best viewed on screen with zoom)
TABLE III: Quantitative results of different methods under several noise levels on the ICVL dataset. "Blind" indicates that each sample
is corrupted by Gaussian noise with unknown σ (ranging from 30 to 70).
Sigma | Index | Noisy | BM4D [28] | TDL [30] | ITSReg [42] | LLRT [9] | HSID-CNN [46] | MemNet [33] | Ours
30    | PSNR  | 18.59 | 38.45     | 40.58    | 41.48       | 41.99    | 38.70         | 41.45       | 42.28
      | SSIM  | 0.110 | 0.934     | 0.957    | 0.961       | 0.967    | 0.949         | 0.972       | 0.973
      | SAM   | 0.807 | 0.126     | 0.062    | 0.088       | 0.056    | 0.103         | 0.065       | 0.061
50    | PSNR  | 14.15 | 35.60     | 38.01    | 38.88       | 38.99    | 36.17         | 39.76       | 40.23
      | SSIM  | 0.046 | 0.889     | 0.932    | 0.941       | 0.945    | 0.919         | 0.960       | 0.961
      | SAM   | 0.991 | 0.169     | 0.085    | 0.098       | 0.075    | 0.134         | 0.076       | 0.072
70    | PSNR  | 11.23 | 33.70     | 36.36    | 36.71       | 37.36    | 34.31         | 38.37       | 38.57
      | SSIM  | 0.025 | 0.845     | 0.909    | 0.923       | 0.930    | 0.886         | 0.946       | 0.945
      | SAM   | 1.105 | 0.207     | 0.105    | 0.112       | 0.087    | 0.161         | 0.088       | 0.087
Blind | PSNR  | 17.34 | 37.66     | 39.91    | 40.62       | 40.97    | 37.80         | 40.70       | 41.50
      | SSIM  | 0.114 | 0.914     | 0.946    | 0.953       | 0.956    | 0.935         | 0.966       | 0.967
      | SAM   | 0.859 | 0.143     | 0.072    | 0.087       | 0.064    | 0.116         | 0.070       | 0.066
TABLE IV: Quantitative results of different methods in five complex noise cases on ICVL dataset.
Case | Index | Noisy | LRMR [48] | LRTV [20] | NMoG [11] | TDTV [39] | HSID-CNN [46] | MemNet [33] | Ours
1    | PSNR  | 18.25 | 32.80     | 33.62     | 34.51     | 38.14     | 38.40         | 38.94       | 42.79
     | SSIM  | 0.168 | 0.719     | 0.905     | 0.812     | 0.944     | 0.947         | 0.949       | 0.978
     | SAM   | 0.898 | 0.185     | 0.077     | 0.187     | 0.075     | 0.095         | 0.091       | 0.052
2    | PSNR  | 17.80 | 32.62     | 33.49     | 33.87     | 37.67     | 37.77         | 38.57       | 42.35
     | SSIM  | 0.159 | 0.717     | 0.905     | 0.799     | 0.940     | 0.942         | 0.945       | 0.976
     | SAM   | 0.910 | 0.187     | 0.078     | 0.265     | 0.081     | 0.104         | 0.095       | 0.055
3    | PSNR  | 17.61 | 31.83     | 32.37     | 32.87     | 36.15     | 37.65         | 38.15       | 42.23
     | SSIM  | 0.155 | 0.709     | 0.895     | 0.797     | 0.930     | 0.940         | 0.945       | 0.976
     | SAM   | 0.917 | 0.227     | 0.115     | 0.276     | 0.099     | 0.102         | 0.096       | 0.056
4    | PSNR  | 14.80 | 29.70     | 31.56     | 28.60     | 36.67     | 35.00         | 35.93       | 39.23
     | SSIM  | 0.114 | 0.623     | 0.871     | 0.652     | 0.935     | 0.899         | 0.907       | 0.945
     | SAM   | 0.926 | 0.311     | 0.242     | 0.486     | 0.094     | 0.174         | 0.126       | 0.109
5    | PSNR  | 14.08 | 28.68     | 30.47     | 27.31     | 34.77     | 34.05         | 35.16       | 38.25
     | SSIM  | 0.099 | 0.608     | 0.858     | 0.632     | 0.919     | 0.888         | 0.903       | 0.938
     | SAM   | 0.944 | 0.353     | 0.287     | 0.513     | 0.113     | 0.181         | 0.130       | 0.107
[Figure 10 panels: (a) Noisy, (b) BM4D, (c) TDL, (d) ITSReg, (e) LLRT, (f) LRMR, (g) LRTV, (h) NMoG, (i) TDTV, (j) HSID-CNN, (k) Ours.]
Fig. 10: Real-world unknown noise removal results at the 2nd band of the image on the AVIRIS Indian Pines dataset. (Best viewed on
screen with zoom)
[Figure 11 panels: (a) Noisy, (b) BM4D, (c) TDL, (d) ITSReg, (e) LLRT, (f) LRMR, (g) LRTV, (h) NMoG, (i) TDTV, (j) HSID-CNN, (k) Ours.]
Fig. 11: Real-world unknown noise removal results at the 107th band of the image on the HYDICE Urban dataset. (Best viewed on screen
with zoom)
TABLE V: Quantitative results of different methods in mixture noise case on Pavia University dataset. ”Ours-S” is our trained-
from-scratch model which is only trained on Pavia Centre dataset; ”Ours-P” denotes our pretrained model which is only trained
on ICVL dataset; ”Ours-F” indicates our fine-tuned model which is pretrained on ICVL dataset, and then is fine-tuned on
Pavia Centre dataset.
Index | Noisy | LRMR [48] | LRTV [20] | NMoG [11] | TDTV [39] | HSID-CNN [46] | Ours-S | Ours-P | Ours-F
PSNR  | 13.54 | 26.35     | 25.93     | 28.90     | 30.06     | 30.14         | 29.64  | 31.50  | 34.32
SSIM  | 0.161 | 0.660     | 0.676     | 0.781     | 0.819     | 0.805         | 0.892  | 0.866  | 0.925
SAM   | 0.896 | 0.406     | 0.359     | 0.388     | 0.239     | 0.142         | 0.166  | 0.127  | 0.093
even compared with the traditional method TDTV (29.64 vs.
30.06).
Nevertheless, our method utilizes QRU3D, which allows
it to be naturally applied to input data with various numbers
of bands. On the basis of this flexibility, we directly apply
our model pretrained on the ICVL dataset (in the complex noise
case) to Pavia University. Although Pavia University is
recorded with a spectral curve totally distinct from the ICVL
dataset, this model, called Ours-P, performs much better than
all compared methods,4 which strongly verifies the robustness
4 The result of HSID-CNN is also obtained with its model pretrained on the
ICVL dataset under the complex noise case. The learned MemNet cannot be
applied to data with a different number of bands, so its results are not provided in
Table V.
TABLE VI: Ablations on ICVL HSI Gaussian denoising
(under noise level σ = 50). We evaluate the results by
PSNR (dB), running time (sec) and the number of parameters
(Params) of these networks. All running times are measured on
an NVIDIA GTX 1080Ti by processing an HSI of size 512
× 512 × 31. The direction of the network is denoted by initials, i.e.,
U: unidirectional; B: bidirectional; A: alternating directional.
Our benchmark network is indicated in boldface. The results
of MemNet are also provided as an additional reference.
Model   | PSNR (dB) | Time (s) | Params (#)
MemNet  | 39.76     | 0.88     | 2.94M
QRU2D   | 38.63     | 0.60     | 0.29M
WQRU2D  | 39.82     | 1.16     | 0.88M
C3D     | 36.83     | 0.56     | 0.43M
WC3D    | 40.00     | 0.93     | 1.72M
QRU3D   | 40.23     | 0.74     | 0.86M
U       | 40.07     | 0.75     | 0.86M
B       | 40.26     | 1.26     | 1.72M
A       | 40.23     | 0.74     | 0.86M
of our method.
Furthermore, we employ small pieces of samples from Pavia
Centre to fine-tune the model learned only from the ICVL dataset.
This learned model (Ours-F in Table V) significantly boosts
the performance. A visual comparison is provided in Figure
9. Interestingly, Gaussian-like residuals are still visible for
the Ours-S model, while the Ours-P model suffers from stripes. The
Ours-F model combines the strengths of the two models, yielding
a clear and clean result. This seems to indicate that the knowledge
from the ICVL dataset is complementary to that from the Pavia Centre
dataset, so that the transfer learning enabled by this flexibility
brings great benefits in performance.
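The transfer step described here amounts to loading the ICVL-pretrained weights and running a short fine-tuning pass on Pavia Centre patches. The snippet below is only a sketch: the checkpoint path, the `pavia_patch_loader` helper, the network configuration, and the fine-tuning learning rate are all hypothetical.

```python
import torch

model = AlternatingQRNN3D(channels=16, num_layers=12)          # hypothetical configuration
model.load_state_dict(torch.load("qrnn3d_icvl_complex.pth"))    # hypothetical checkpoint path
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)        # assumed fine-tuning rate
criterion = torch.nn.MSELoss()

for noisy, clean in pavia_patch_loader(batch_size=16):           # hypothetical data helper
    loss = criterion(model(noisy), clean)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```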
Real-world Noisy Data. We also verify our model on the real-world
noisy HSIs Indian Pines and Urban, which have no corresponding
ground truth. It can be observed in Figure 10 and Figure
11 that severe atmospheric and water absorption obstructs the
view of the real scene, severely degrading the quality of the
images. The Gaussian denoising methods, e.g., BM4D and TDL,
cannot accurately estimate the underlying clean image due to
the non-Gaussian noise structure. Our QRNN3D successfully
tackles this unknown noise and produces sharper and clearer
results than the others, consistently demonstrating the robustness
and flexibility of our model.
V. DISCUSSION AND ANALYSIS
In this section, we provide a broad discussion and analysis
of QRNN3D to facilitate understanding of where its great
performance comes from. We first demonstrate the efficacy of
our incremental training policy, then analyze the functionality
of each network component in QRNN3D (i.e., 3D convolution,
quasi-recurrent pooling, and the alternating directional structure). The
selection of network hyper-parameters follows. The visualization
method (and results) for the GCS knowledge in QRU3D
are presented at the end.
[Figure 12 plots: average training loss (MSE) and validation PSNR over epochs 0–100, comparing training from scratch with incremental training.]
Fig. 12: Average training loss (Left) and Validation PSNR
(Right) of QRNN3D for complex noise removal. We show
the results of the model trained from scratch, and the one
that reuses the pretrained parameters in Gaussian denoising
(incremental training).
A. Efficacy of Incremental Training Policy
The key idea of our training policy lies in the fact that
knowledge can be learned efficiently in an easy-to-difficult
way [1]. Our training policy enables reusing prior learned
knowledge (pretrained parameters), which significantly stabilizes
and accelerates the whole training process. As an
example, we show the optimization curves with and without
reusing the pretrained parameters when training the model in
the complex noise case. As shown in Figure 12, training from
scratch renders the optimization slow and unstable, and it converges
to a poor local minimum, in contrast to training with a good
initialization in our incremental learning policy.
B. Component Analysis in QRNN3D
To thoroughly verify the functionality of each component
in our QRNN3D, comprehensive ablation experiments are
conducted on HSI Gaussian denoising task on ICVL dataset.
We focus on the components associated with HSI modeling
and domain knowledge embedding, and study the best trade-
off between performance and computational burden. The eval-
uation measures include PSNR, running time and total number
of parameters of network.
We choose our encoder-decoder QRNN3D as the bench-
mark. For a fair comparison, the same network architecture is used
except for the modification of the investigated component. Ablation
results are exhibited in Table VI and analyzed in the
following.
Subcomponents Investigation. Table VI investigates the
effect of subcomponents (i.e. 3D convolution and quasi-
recurrent pooling function) in QRU3D. QRU3D is the basic
building block of our QRNN3D. In the experiments, four
variants of this basic block are tested, i.e., QRU2D, WQRU2D,
C3D and WC3D.
QRU2D is instantiated by replacing the 3D convolution with
2D convolution (implemented by simply setting the kernel size
to 3×3×1). A drastic performance loss (i.e., -1.6 dB) can be
observed in Table VI, meaning that ignoring the structural spectral
correlation severely impacts the model capacity.
WQRU2D is a wider QRU2D model whose
number of parameters is comparable to that of QRU3D. Nevertheless,
it can be observed that the QRU3D still outperforms the
WQRU2D, even with less computation cost, which suggests
[Figure 13 panels: (a) the captured GCS over band pairs (i, j), with the relative region marked; (b) the number of relative bands for the output of each band, split into forward and backward dependencies; (c) the empirical distribution of the number of relative bands.]
Fig. 13: (a) The captured GCS in a bidirectional QRU3D layer. (b) The number of relative bands for the output of each band.
Band $i$ is defined as a "relative band" for band $j$ if it produces at least a 10% perturbation to the output when discarded (i.e.,
$\mathrm{GCS}_{ij} \ge 0.1\,\|\mathbf{1}\|_F$, where $\mathbf{1}$ has the same size as $\mathbf{h}_j$ with all entries equal to 1). Forward/Backward denotes the direction
of dependency (i.e., $i < j$ for the forward direction). (c) The empirical distribution of the number of relative bands.
the higher efficiency of 3D convolution over the 2D approach
for HSI modeling.
C3D is constructed by removing the quasi-recurrent pooling
(and the associated neural gates), yielding a plain residual encoder-decoder
3D convolutional neural network. We find that the lack of a
mechanism to model the GCS degrades the performance
by a large margin (-3.4 dB).
WC3D is a wider C3D model with more parameters
(four times as many as the C3D model). It can be seen that the
PSNR of QRU3D is 40.23 dB, higher than the WC3D's 40.00
dB. This suggests that the improvement of quasi-recurrent
pooling is not merely due to adding width to the C3D model.
Besides, the QRU3D has only 50% of the parameters and 80% of the
running time of the WC3D model and is also narrower. This
comparison shows that the improvement from quasi-recurrent
pooling is complementary to going wider in standard ways.
Direction of Network. Table VI also shows the results of
different directional structures, denoted by initials (e.g., U for
unidirectional, etc.). Without considering the backward spectral
dependency, the unidirectional architecture performs worst.
After eliminating the causal dependency, both the alternating
directional and bidirectional architectures significantly exceed
the unidirectional one and achieve similar performance (40.26
vs. 40.23). Nevertheless, the bidirectional version requires
a much larger memory footprint than our alternating directional
structure, indicating that the alternating directional structure can
be used as a lightweight alternative to the typical bidirectional
one.
C. Network Hyperparameter Selection
Our principle for network hyper-parameter selection is to
make the network compact yet effective. Table VII shows the results of
hyper-parameter selection on the Gaussian denoising task through
a small grid search, where we select the depth and width of our
QRNN3D considering the best tradeoff between performance
and computational overhead.
Nonetheless, we note that the major goal of this work is to introduce
a novel building block, specially tailored to model HSIs.
TABLE VII: Network hyper-parameter selection on ICVL HSI
Gaussian denoising (under noise level σ = 50) through a small
grid search. We evaluate the results by PSNR (dB), running
time (sec) and the number of parameters (Params) of these
networks. The selected parameters are indicated in boldface.
Depth | Width | PSNR (dB) | Time (s) | Params (#)
10    | 16    | 39.85     | 0.68     | 0.42M
12    | 16    | 40.23     | 0.74     | 0.86M
14    | 16    | 39.52     | 0.80     | 1.30M
12    | 12    | 39.82     | 0.62     | 0.48M
12    | 16    | 40.23     | 0.74     | 0.86M
12    | 20    | 40.01     | 1.18     | 1.34M
Such a building block can be naturally inserted into any network
topology, not restricted to the encoder-decoder network used in
this paper. We mainly show the effectiveness of our proposed
building block and do not pursue higher performance via an
exhaustive search of other configurations. We have demonstrated
state-of-the-art performance of our QRNN3D without heavy
engineering effort on network hyper-parameter selection. Our
current hyper-parameter setting might not be perfect, and the
performance could potentially be boosted by parameter tuning,
though this is not a major focus of this paper.
D. Visualizing GCS Knowledge
To visualize the captured GCS knowledge in QRNN3D, we
first unfold Equation (3) and obtain
$\mathbf{h}_j = \sum_{i=1}^{j} \Phi_j(\mathbf{z}_i), \quad \forall i, j \in [1, B], \; i \le j$, (4)
where $\Phi_j(\mathbf{z}_i) = \mathbf{f}_j \odot \mathbf{f}_{j-1} \odot \cdots \odot \mathbf{f}_{i+1} \odot (1 - \mathbf{f}_i) \odot \mathbf{z}_i$.
We define $\mathrm{GCS}_{ij}$ as the degree of $\mathbf{z}_i$'s contribution to
$\mathbf{h}_j$ under the Frobenius norm measure, i.e.,
$\mathrm{GCS}_{ij} = \|\Phi_j(\mathbf{z}_i) \oslash \mathbf{h}_j\|_F$, (5)
where $\oslash$ denotes element-wise division. It also implies
band $i$'s effect on band $j$. The captured GCS in each QRU3D
layer can be calculated through a single inference pass by
using Equation (5). To completely visualize the GCS,5 we choose
the first bidirectional QRU3D for this analysis.6 Figure 13(a)
exhibits the captured GCS of a randomly selected HSI, showing
that the output of each band is highly affected by the whole
spectrum. Figure 13(b) illustrates the number of relative bands
for the output of each band. It can be seen that the 15th to 17th bands
($\mathbf{h}_{15}$ to $\mathbf{h}_{17}$) are deeply correlated with almost all bands ($\mathbf{Z}$). Figure
13(c) summarizes these statistics over all testing images on ICVL.
It shows that a randomly selected band is typically
related to at least 15 bands (out of 31 in total), meaning that the GCS
is effectively utilized by our model and that our method can
automatically determine the most relative bands across the global
spectrum.
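The quantity in Equations (4)–(5) can be accumulated during a single forward pass over the bands, as in the sketch below; it assumes access to the per-band candidates and gates of a forward QRU3D (e.g., from the earlier sketch) and returns an upper-triangular GCS matrix for a single sample.

```python
import torch

def forward_gcs(z, f, eps=1e-8):
    """Compute GCS_ij (Eq. 5) for a forward QRU3D.

    z, f: candidate and gate tensors of shape (C, B, H, W) for one sample.
    Returns a (B, B) matrix whose entry (i, j), i <= j, measures band i's
    contribution to the hidden state h_j."""
    B = z.shape[1]
    phi = []                               # phi[i] holds Phi_j(z_i), updated as j grows
    h = None
    gcs = torch.zeros(B, B)
    for j in range(B):
        zj, fj = z[:, j], f[:, j]
        phi = [fj * p for p in phi]        # multiply existing terms by f_j (Eq. 4)
        phi.append((1 - fj) * zj)          # new term (1 - f_j) * z_j for i = j
        h = zj * (1 - fj) if h is None else fj * h + (1 - fj) * zj   # Eq. (3)
        for i in range(j + 1):
            # Eq. (5): Frobenius norm of the element-wise ratio Phi_j(z_i) / h_j;
            # eps is only a small guard against division by zero.
            gcs[i, j] = torch.norm(phi[i] / (h + eps)).item()
    return gcs
```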
VI. CONCLUSIONS
In this paper, we have proposed an alternating directional
3D quasi-recurrent neural network for hyperspectral image
denoising. Our main contribution is the novel use of 3D
convolution subcomponent, quasi-recurrent pooling function,
and alternating directional scheme for efficient spatio-spectral
dependency modeling. We have applied our model to resolve
HSI denoising beyond the Gaussian case, especially in the very
challenging real-world complex noise case, and achieved better
performance and faster speed. We also show that our model
pretrained on the ICVL dataset can be directly utilized to tackle
remotely sensed images, which is infeasible for most existing
DL approaches to HSI modeling.
In addition, the visualized results for the global correlation
along spectrum (GCS) in our 3D quasi-recurrent unit
(QRU3D) further experimentally confirm that the GCS is effectively
exploited by our model. It is also worth investigating the
proposed QRU3D in other image sequence modeling tasks in
the future.
REFERENCES
[1] M. Ahissar and S. Hochstein. The reverse hierarchy theory of visual
perceptual learning. Trends in Cognitive Sciences, 8(10):457–464, 2004.
[2] N. Akhtar and A. Mian. Nonparametric coupled Bayesian dictionary
and classifier learning for hyperspectral classification. IEEE Trans-
actions on Neural Networks and Learning Systems, 29(9):4038–4050,
2018.
[3] B. Arad and O. Ben-Shahar. Sparse recovery of hyperspectral signal
from natural rgb images. In European Conference on Computer Vision,
pages 19–34. Springer, 2016.
[4] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by
jointly learning to align and translate. International Conference on
Learning Representations (ICLR), 2015.
[5] J. Bradbury, S. Merity, C. Xiong, and R. Socher. Quasi-recurrent
neural networks. International Conference on Learning Representations
(ICLR), 2017.
[6] G. Camps-Valls, D. Tuia, L. Bruzzone, and J. A. Benediktsson. Ad-
vances in hyperspectral image classification: Earth monitoring with sta-
tistical learning methods. IEEE Signal Processing Magazine, 31(1):45–
54, 2014.
[7] Y. Chang, L. Yan, H. Fang, S. Zhong, and W. Liao. Hsi-denet:
Hyperspectral image restoration via convolutional neural network. IEEE
Transactions on Geoscience and Remote Sensing, pages 1–16, 2018.
5 In a forward (backward) QRU3D, the captured GCS is an upper (lower)
triangular matrix.
6 The body of QRNN3D is equipped with the alternating directional
structure, while in the head and tail, the bidirectional structure is
employed to avoid directional bias.
[8] Y. Chang, L. Yan, H. Fang, S. Zhong, and Z. Zhang. Weighted low-
rank tensor recovery for hyperspectral image restoration. arXiv preprint
arXiv:1709.00192, 2017.
[9] Y. Chang, L. Yan, and S. Zhong. Hyper-laplacian regularized unidi-
rectional low-rank tensor recovery for multispectral image denoising.
In The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pages 4260–4268, 2017.
[10] C. Chen, Z. Xiong, X. Tian, and F. Wu. Deep boosting for image
denoising. In The European Conference on Computer Vision (ECCV),
September 2018.
[11] Y. Chen, X. Cao, Q. Zhao, D. Meng, and Z. Xu. Denoising hyperspectral
image with non-iid noise structure. arXiv preprint arXiv:1702.00098,
2017.
[12] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares,
H. Schwenk, and Y. Bengio. Learning phrase representations using
rnn encoder–decoder for statistical machine translation. In Proceedings
of the 2014 Conference on Empirical Methods in Natural Language
Processing (EMNLP), pages 1724–1734, 2014.
[13] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by
sparse 3-d transform-domain collaborative filtering. IEEE Transactions
on Image Processing, 16(8):2080–2095, 2007.
[14] W. Dong, G. Li, G. Shi, X. Li, and Y. Ma. Low-rank tensor approx-
imation with laplacian scale mixture modeling for multiframe image
denoising. In Proceedings of the IEEE International Conference on
Computer Vision (ICCV), pages 442–449, 2015.
[15] W. Dong, H. Wang, F. Wu, G. Shi, and X. Li. Deep spatial-
spectral representation learning for hyperspectral image denoising. IEEE
Transactions on Computational Imaging, pages 1–1, 2019.
[16] Y. Fu, A. Lam, I. Sato, and Y. Sato. Adaptive spatial-spectral dictionary
learning for hyperspectral image restoration. International Journal of
Computer Vision (IJCV), 122(2):228–245, 2017.
[17] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers:
Surpassing human-level performance on imagenet classification. In The
IEEE International Conference on Computer Vision (ICCV), December
2015.
[18] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image
recognition. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pages 770–778, 2016.
[19] W. He, Q. Yao, C. Li, N. Yokoya, and Q. Zhao. Non-local meets
global: An integrated paradigm for hyperspectral denoising. In The
IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
June 2019.
[20] W. He, H. Zhang, L. Zhang, and H. Shen. Total-variation-regularized
low-rank matrix factorization for hyperspectral image restoration. IEEE
Transactions on Geoscience and Remote Sensing, 54(1):178–188, 2016.
[21] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural
Computation, 9(8):1735–1780, 1997.
[22] Y. Huang, W. Wang, and L. Wang. Bidirectional recurrent convolutional
networks for multi-frame super-resolution. In Advances in Neural
Information Processing Systems (NIPS), pages 235–243, 2015.
[23] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network
training by reducing internal covariate shift. In International Conference
on Machine Learning (ICML), pages 448–456, 2015.
[24] S. Ji, W. Xu, M. Yang, and K. Yu. 3d convolutional neural networks
for human action recognition. IEEE Transactions on Pattern Analysis
and Machine Intelligence (PAMI), 35(1):221–231, 2013.
[25] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization.
arXiv preprint arXiv:1412.6980, 2014.
[26] S. Lefkimmiatis. Non-local color image denoising with convolutional
neural networks. In The IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), July 2017.
[27] T. Lillesand, R. W. Kiefer, and J. Chipman. Remote sensing and image
interpretation. John Wiley & Sons, 2014.
[28] M. Maggioni, V. Katkovnik, K. Egiazarian, and A. Foi. Nonlocal
transform-domain filter for volumetric data denoising and reconstruction.
IEEE Transactions on Image Processing, 22(1):119–133, 2013.
[29] X. Mao, C. Shen, and Y.-B. Yang. Image restoration using very deep
convolutional encoder-decoder networks with symmetric skip connec-
tions. In Advances in Neural Information Processing Systems (NIPS),
pages 2802–2810, 2016.
[30] Y. Peng, D. Meng, Z. Xu, C. Gao, Y. Yang, and B. Zhang. Decomposable
nonlocal tensor dictionary learning for multispectral image denoising. In
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pages 2949–2956, 2014.
[31] Z. Ping and R. Wang. Jointly learning the hybrid crf and mlr
model for simultaneous denoising and classification of hyperspectral
imagery. IEEE Transactions on Neural Networks and Learning Systems,
25(7):1319–1334, 2014.
[32] M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks.
IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.
[33] Y. Tai, J. Yang, X. Liu, and C. Xu. Memnet: A persistent memory
network for image restoration. In The IEEE International Conference
on Computer Vision (ICCV), Oct 2017.
[34] P. S. Thenkabail and J. G. Lyon. Hyperspectral remote sensing of
vegetation. CRC Press, 2016.
[35] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning
spatiotemporal features with 3d convolutional networks. In Proceedings
of the IEEE International Conference on Computer Vision (ICCV), pages
4489–4497, 2015.
[36] M. Uzair, A. Mahmood, and A. Mian. Hyperspectral face recognition
with spatiospectral information fusion and pls regression. IEEE Trans-
actions on Image Processing, 24(3):1127–1137, 2015.
[37] H. Van Nguyen, A. Banerjee, and R. Chellappa. Tracking via object
reflectance using a hyperspectral video camera. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), pages 44–51, 2010.
[38] Q. Wang, J. Lin, and Y. Yuan. Salient band selection for hyperspectral
image classification via manifold ranking. IEEE Transactions on Neural
Networks and Learning Systems, 27(6):1279–1289, 2017.
[39] Y. Wang, J. Peng, Q. Zhao, Y. Leung, X.-L. Zhao, and D. Meng.
Hyperspectral image restoration via total variation regularized low-rank
tensor decomposition. IEEE Journal of Selected Topics in Applied Earth
Observations and Remote Sensing, 2017.
[40] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image
quality assessment: from error visibility to structural similarity. IEEE
Transactions on Image Processing, 13(4):600–612, 2004.
[41] K. Wei and Y. Fu. Low-rank bayesian tensor factorization for hyper-
spectral image denoising. Neurocomputing, 331:412–423, 2019.
[42] Q. Xie, Q. Zhao, D. Meng, Z. Xu, S. Gu, W. Zuo, and L. Zhang. Mul-
tispectral images denoising by intrinsic tensor sparsity regularization.
In The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pages 1692–1700, 2016.
[43] Y. Xie, Y. Qu, D. Tao, W. Wu, Q. Yuan, and W. Zhang. Hyperspectral
image restoration via iteratively regularized weighted schatten p-norm
minimization. IEEE Transactions on Geoscience and Remote Sensing,
54(8):4642–4659, 2016.
[44] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and
W.-c. Woo. Convolutional lstm network: A machine learning approach for
precipitation nowcasting. In Advances in Neural Information Processing
Systems (NIPS), pages 802–810, 2015.
[45] S. Yang, Z. Feng, M. Wang, and K. Zhang. Self-paced learning-
based probability subspace projection for hyperspectral image classi-
fication. IEEE Transactions on Neural Networks and Learning Systems,
PP(99):1–6, 2018.
[46] Q. Yuan, Q. Zhang, J. Li, H. Shen, and L. Zhang. Hyperspectral
image denoising employing a spatial-spectral deep residual convolutional
neural network. IEEE Transactions on Geoscience and Remote Sensing,
57(2):1205–1218, 2019.
[47] R. H. Yuhas, J. W. Boardman, and A. F. Goetz. Determination of semi-
arid landscape endmembers and seasonal trends using convex geometry
spectral unmixing techniques. In Summaries of the 4th Annual JPL
Airborne Geoscience Workshop, 1993.
[48] H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan. Hyperspectral
image restoration using low-rank matrix recovery. IEEE Transactions
on Geoscience and Remote Sensing, 52(8):4729–4743, 2014.
[49] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a gaussian
denoiser: Residual learning of deep cnn for image denoising. IEEE
Transactions on Image Processing, 2017.
[50] L. Zhang, W. Wei, Y. Zhang, C. Shen, A. van den Hengel, and Q. Shi.
Cluster sparsity field for hyperspectral imagery denoising. In European
Conference on Computer Vision (ECCV), pages 631–647. Springer,
2016.
[51] Y. Zhang, K. Li, K. Li, B. Zhong, and Y. Fu. Residual non-local attention
networks for image restoration. In International Conference on Learning
Representations, 2019.
[52] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu. Residual dense
network for image restoration. arXiv preprint arXiv:1812.10477, 2018.