IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 60, 2022 4304118
PMDRnet: A Progressive Multiscale Deformable
Residual Network for Multi-Image
Super-Resolution of AMSR2 Arctic
Sea Ice Images
Xiaomin Liu, Tiantian Feng, Xiaofan Shen, and Rongxing Li, Senior Member, IEEE
Abstract—The extent of the area covered by polar sea ice is an important indicator of global climate change. Continuous monitoring of Arctic sea ice concentration (SIC) primarily relies on passive microwave images. However, passive microwave images have coarse spatial resolution, resulting in SIC products with significant blurring at the ice–water divides. In this article, a novel multi-image super-resolution (MISR) network called the progressive multiscale deformable residual network (PMDRnet) is proposed to improve the spatial resolution of sea ice passive microwave images according to the characteristics of both passive microwave images and sea ice motions. To achieve image alignment under complex and large Arctic sea ice motions, we design a novel alignment module that includes a progressive alignment strategy and a multiscale deformable convolution alignment unit. In addition, a temporal attention mechanism is used to adaptively fuse the effective spatiotemporal information across the image sequence. A sea-ice-related loss function is designed to provide more detailed sea ice information to the network, improving super-resolution performance and further yielding finer Arctic SIC results. Experimental results demonstrate that PMDRnet significantly outperforms current state-of-the-art MISR methods and can generate super-resolved SIC products with finer texture features and much sharper sea ice edges. The code and datasets of PMDRnet are available at https://doi.org/10.5061/dryad.k3j9kd590.
Index Terms—Arctic sea ice, deep learning (DL), deformable convolution (DConv), multi-image super-resolution (MISR), passive microwave image, temporal attention.
I. INTRODUCTION
Polar sea ice is a critical parameter of cryosphere and polar environment changes and is also of great significance for ensuring the safety of polar shipping and other maritime activities [1], [2]. Satellite observations over the past four decades show that the Arctic sea ice extent has rapidly decreased and that the Arctic sea ice thickness is gradually thinning [3], [4], which means that the Arctic barrier against global warming is gradually weakening. Sea ice concentration (SIC) is the percentage of sea ice in a given ocean area and is one of the most intuitive parameters indicating changes in sea ice: several sea ice parameters, such as sea ice area, can be derived from SIC, and some important sea ice features, such as polynyas and leads, can be identified from SIC [5].

Manuscript received October 26, 2021; revised January 29, 2022; accepted February 1, 2022. Date of publication February 15, 2022; date of current version March 24, 2022. This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0603100 and in part by the National Science Foundation of China under Grant 41801335 and Grant 41941006. (Corresponding author: Tiantian Feng.)
The authors are with the Center for Spatial Information Science and Sustainable Development Applications, College of Surveying and Geo-Informatics, Tongji University, Shanghai 200092, China (e-mail: 14xiaomin@tongji.edu.cn; fengtiantian@tongji.edu.cn; lunashen0920@foxmail.com; rli@tongji.edu.cn).
Digital Object Identifier 10.1109/TGRS.2022.3151623
SIC can be calculated from different types of remote sensing images with corresponding image processing methods, such as pixel-level classification of synthetic aperture radar (SAR) images [6], [7], the difference in albedo and surface temperature between sea water and sea ice in optical and thermal infrared images [8], [9], and the differences in brightness temperature at different channels between sea ice and sea water in passive microwave images [10]–[12]. Although SAR images and optical and thermal infrared images have high spatial resolution, they have limitations: SAR images offer small coverage and long revisit periods, and optical and thermal infrared images suffer low temporal sampling due to occlusion by clouds. Conversely, passive microwave images can achieve near-complete daily observation of the entire Arctic with the advantages of strong surface penetration, all-weather operation, wide coverage, and high temporal resolution, making them an important data source for the continuous monitoring of Arctic SIC [13]. However, limited by the relatively coarse spatial resolution of passive microwave images, the Arctic SIC products derived from them show large blurring at the ice–water divides. These blurring effects have prevented finer sea ice dynamic monitoring, such as tracking changes of the sea ice edge, the evolution of polynyas, and the observation of narrow lead structures [5], [14]. For example, according to a model study of the atmosphere–ocean heat exchange caused by the opening of leads, a 1% underestimate of SIC can lead to an overestimate of the surface air temperature by almost 3.5 K; hence, resolving narrow leads is of great importance but is limited by coarse SIC [15]. Therefore, if the spatial resolution of passive microwave images can be improved, the blurring at the ice–water divides of SIC can be reduced, which is important for improving understanding of the evolution of Arctic sea ice and benefits fine-scale applications of SIC.
1558-0644 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: TONGJI UNIVERSITY. Downloaded on June 27,2022 at 07:40:03 UTC from IEEE Xplore. Restrictions apply.
The improvement of spatial resolution is also known as image super-resolution (SR) [16]. Traditional SR methods improve the spatial resolution of passive microwave images via a linear combination of brightness-temperature observations in the overlap areas of footprints, based on the antenna pattern of the radiometer [17]. For example, the Backus–Gilbert (BG) method [18], [19] and the radiometer form of the scatterometer image reconstruction (rSIR) method [20] have successfully improved the spatial resolution of passive microwave images by two to four times, including Scanning Multichannel Microwave Radiometer (SMMR), Special Sensor Microwave/Imager (SSM/I), and Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) images [21], [22]. Prior knowledge of the radiometer antenna pattern characteristics is critical to the SR results. However, the exact radiometer antenna pattern is difficult to obtain; it is more often approximated by mathematical models, such as Gaussian functions, which introduce errors and distort the recovered high-resolution (HR) images.
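To make the linear-combination idea concrete, here is a small 1-D numpy sketch. It is not the BG or rSIR algorithm itself, and every value in it is invented for illustration: overlapping Gaussian "footprints" produce blurred observations of a sharp brightness-temperature field, and a Tikhonov-regularized least-squares inversion of the (assumed known) footprint matrix recovers a sharper estimate than the raw observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fine-scale 1-D brightness-temperature field (a sharp ice/water pattern).
n_fine = 64
tb_true = 180.0 + 80.0 * (np.sin(np.linspace(0, 3 * np.pi, n_fine)) > 0)

# Each observation is a Gaussian-weighted average (an idealized antenna
# footprint) centered every 2 cells; neighboring footprints overlap heavily.
centers = np.arange(0, n_fine, 2)
x = np.arange(n_fine)
A = np.exp(-0.5 * ((x[None, :] - centers[:, None]) / 3.0) ** 2)
A /= A.sum(axis=1, keepdims=True)  # each row is a weighted average
tb_obs = A @ tb_true + rng.normal(0, 0.1, centers.size)  # 0.1-K noise

# Recover the fine field as a linear combination of the observations via
# Tikhonov-regularized least squares (a crude stand-in for the BG inversion).
lam = 1e-2
tb_rec = np.linalg.solve(A.T @ A + lam * np.eye(n_fine), A.T @ tb_obs)

# Compare errors: nearest-neighbor upsampled observations vs. the inversion.
rmse_obs = np.sqrt(np.mean((np.repeat(tb_obs, 2)[:n_fine] - tb_true) ** 2))
rmse_rec = np.sqrt(np.mean((tb_rec - tb_true) ** 2))
```

The inversion sharpens the ice–water transitions that the footprint averaging smeared out, which is exactly the effect the BG and rSIR methods exploit; errors in the assumed footprint matrix A are what distort the recovered HR field.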
With the rapid development of deep learning (DL) technology in recent years, DL-based SR (DL-SR) techniques have been widely used and developed [23]; they adaptively learn the degradation relationship between HR and low-resolution (LR) image pairs in an end-to-end way without any prior knowledge or assumptions. According to the number of input LR images, DL-SR methods can be divided into two categories: DL-based single-image SR (DL-SISR) and DL-based multi-image SR (DL-MISR) [24]. DL-SISR obtains the HR version of the input degraded LR image by establishing the mapping relationship between a single LR image and the corresponding HR image. DL-MISR converts an LR image sequence of the same scene into the corresponding HR image, making use of the complementary information between multitemporal LR images, including intraimage spatial relations and interimage temporal relations. At present, there are many DL-MISR algorithms designed for natural (RGB) images and several designed for HR optical satellite images with single-channel input. However, there is a lack of DL-MISR algorithms that consider the characteristics of passive microwave images in the Arctic sea ice region. First, the spatial resolution of passive microwave images is much lower than that of natural images and HR optical satellite images, making feature extraction more difficult. Second, passive microwave images have multichannel measurements at multiple frequencies and polarizations, and sea ice features are difficult to invert from single-channel passive microwave images; a single-channel input network therefore lacks sufficient sea ice information, which requires combining multichannel passive microwave images as input data. Third, there are various sea ice motion scenes in the Arctic region that differ greatly from those at low and midlatitudes. For example, there are complex sea ice motions, even with geometric changes, caused by sea ice dynamic processes (e.g., ridging, formation of leads, fragmentation, and aggregation) and thermodynamic processes (e.g., rapid melting in summer and rapid freezing in winter), making accurate alignment more challenging, in particular for large motions. Finally, the occlusion by atmospheric cloud liquid water and water vapor in passive microwave high-frequency (e.g., 89 GHz) images may provide misleading information for SR.
In this article, according to the characteristics of both passive microwave images and sea ice motions, a novel DL-MISR algorithm is proposed to improve the spatial resolution of sea ice passive microwave images. The performance of the proposed algorithm is compared to state-of-the-art SR algorithms. To verify the effectiveness of the proposed SR method, the fine-scale Arctic SIC derived from the SR results is also compared to that derived from HR optical images [i.e., Moderate Resolution Imaging Spectroradiometer (MODIS) images].
The primary contributions of this article can be summarized as follows.
1) A progressive multiscale deformable residual network (PMDRnet) based on a deep residual convolutional network is designed for Arctic sea ice passive microwave images. The SR performance of PMDRnet is shown to outperform those of other state-of-the-art methods.
2) To manage complex and large Arctic sea ice motions, a progressive alignment strategy and a multiscale deformable convolution (DConv) alignment unit are designed. In addition, the adaptive fusion of multitemporal aligned features is achieved using a temporal attention mechanism in the network.
3) To improve SR performance and to achieve better inversion results for Arctic SIC, a sea-ice-related loss function is designed based on the polarization difference of the brightness temperatures in multichannel AMSR2 images.
The remainder of this article is organized as follows. Section II briefly introduces the related work on DL-MISR. Section III describes the data used in this study. Section IV provides a detailed description of the proposed DL-MISR framework for passive microwave data. Section V presents the experimental results and discussions. Conclusions are drawn in Section VI.
II. RELATED WORK
Generally, DL-MISR algorithms include alignment, fusion, and reconstruction processes. Building on DL-SISR models, DL-MISR models primarily focus on how to make full use of the spatial–temporal correlation across the image sequence, i.e., alignment for establishing accurate correspondences between the neighboring images and the reference image, and fusion for effectively aggregating spatial–temporal information from the aligned images for reconstruction [25]. Most DL-MISR models perform alignment for interimage information acquisition by explicitly estimating the optical flow fields between the reference image and its neighboring images and then warping the neighboring images using the estimated motion fields, such as VSRnet [26], VESPCN [27], FRVSR [28], TecoGAN [29], RBPN [30], and SOFVSR [31]. However, optical flow-based alignment methods have difficulty obtaining accurate motion estimation and warping, particularly for image sequences with various large motions [32]–[34]. Such inaccuracies in motion estimation and motion compensation lead to distortions or errors in the SR results and degrade SR performance.
Instead of performing alignment by explicitly computing and compensating for motion between input images, DUF [32] implicitly uses the motion information to generate dynamic upsampling filters, which are computed from the local spatial–temporal information learned by 3-D convolution to avoid explicit motion compensation. Exploiting the unique advantage of recurrent neural networks (RNNs), which have powerful capabilities in processing time-series images, RNN-based MISR [35]–[37] is designed to acquire spatiotemporal information without explicit alignment; it is suitable for modeling motions without subtle or significant changes because the recurrent feedback connections use temporal smoothness across the image sequence to improve performance [30]. Recently, based on DConv [38], [39], TDAN [33] proposed a temporally deformable alignment network to adaptively align the reference image and the neighboring images at the feature level without explicit motion estimation or image warping; it dynamically predicts the offsets of the sampling convolution kernel from the reference and neighboring images and can accommodate motions with geometric changes. Inspired by TDAN, EDVR [34] uses DConv to perform alignment with a pyramid structure, aligning from coarse to fine to manage large and complex motions.
Another critical part of DL-MISR models is the fusion process, in particular when there are occlusions and/or inaccurate alignments. The simplest strategy directly concatenates the aligned image sequence and performs fusion via a convolution layer [27], [32], [33]. The RNN-based MISR framework feeds in all temporal images and gradually fuses multiple images [35], [36]. Instead of processing many images simultaneously in the network, RBPN [30] treats each image as a separate source of information, which is combined in an iterative refinement framework. The model in [40] proposed adaptive dynamic fusion on different time scales to fully consider temporal dependency. Considering the difference in temporal–spatial characteristics of input image sequences, the temporal–spatial attention mechanism in [34] allows different emphases on different temporal and spatial locations. Similarly, TGA [41] first builds image groups based on different time intervals and then uses temporal group attention to adaptively borrow complementary information within and between groups.
To date, DL-MISR methods have been shown to be effective for natural images, and attempts have been made to apply DL-MISR methods to remote sensing satellite images. For example, the European Space Agency (ESA) MISR challenge was held on multitemporal PROBA-V satellite images, including the RED and near-infrared (NIR) spectral bands. DeepSUM [24] divides the MISR process into three single convolutional neural networks (CNNs) and then uses spatial–temporal correlation to obtain HR images from unregistered multitemporal satellite images. In HighRes-net [42], an end-to-end deep neural network aligns and fuses LR images in a recursive way. RAMS [43] introduced feature and temporal attention mechanisms with 3-D convolutions to super-resolve remotely sensed multitemporal LR images. In addition, Xiao et al. [44] proposed a DL network for Jilin-1 satellite video SR using multiscale DConv alignment and temporal grouping projection. However, these methods were specially designed for SR of HR optical satellite images.
Motivated by the advantages of these methods, we propose a novel alignment strategy and an improved DConv as the basic operation for alignment, employ an attention mechanism in the fusion module, and design a sea-ice-related loss function based on multichannel AMSR2 images, all within an end-to-end trainable residual network, to effectively enhance the spatial resolution of passive microwave images and further achieve finer monitoring of the Arctic SIC.
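The sea-ice-related loss itself is defined later in the paper and is not reproduced in this excerpt; the following is only a plausible numpy sketch of the idea stated above, i.e., penalizing errors in the 89-GHz polarization difference alongside a standard reconstruction term. The function names and the weighting term `lam` are hypothetical.

```python
import numpy as np

def polarization_difference(img):
    """img: (2, H, W) array; channel 0 = 89-GHz H-pol, channel 1 = V-pol
    (a hypothetical channel ordering for this sketch)."""
    return img[1] - img[0]

def sea_ice_related_loss(sr, hr, lam=0.5):
    """L1 reconstruction loss plus an L1 penalty on the 89-GHz polarization
    difference, the quantity the ASI algorithm inverts to SIC.
    `lam` is a hypothetical weighting term, not a value from the paper."""
    rec = np.mean(np.abs(sr - hr))
    pd = np.mean(np.abs(polarization_difference(sr)
                        - polarization_difference(hr)))
    return rec + lam * pd
```

The second term pushes the network to get the ice-relevant quantity (the V/H difference) right, not just the per-channel brightness temperatures, which is the motivation given for the sea-ice-related loss.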
III. DATA
AMSR2 is onboard the Global Change Observation Mission 1st-Water (GCOM-W1) satellite launched by the Japan Aerospace Exploration Agency (JAXA) and is designed to provide passive microwave earth observations with the highest spatial resolution to estimate a variety of geophysical parameters, particularly those connected to water, such as total precipitable water, cloud liquid water, and SIC [45]. The antenna of AMSR2 rotates once every 1.5 s and obtains data over a 1450-km swath. This conical scan mechanism allows AMSR2 to acquire a set of daytime and nighttime data with more than 99% coverage of the earth every two days [45]. AMSR2 has seven observation frequencies, including the 6.925-, 7.3-, 10.65-, 18.7-, 23.8-, 36.5-, and 89-GHz channels, as shown in Table I. In particular, the AMSR2 swath data contain two independent swaths for the 89-GHz channels (the A and B scans), which are acquired by two independent antennas. Both A and B scans have an along-scan sampling interval of 5 km and an along-track sampling interval of 10 km. In addition, the A and B scans are shifted along track by 5 km (or 15 km) and can be interleaved to obtain 5-km along-track sampling, resulting in swaths with 5-km scan-line spacing and footprints in each scan line spaced 5 km apart [46]. Therefore, the AMSR2 swath data at 89 GHz have the highest spatial resolution among all observation frequencies. With the Arctic Radiation and Turbulence Interaction Study (ARTIST) Sea Ice (ASI) algorithm [12], the AMSR2-measured SIC can be calculated from the polarization difference of the brightness temperatures at the 89-GHz channels, which benefits from the high spatial resolution of these channels and needs no additional data sources as input to achieve performance similar to other SIC algorithms.
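For orientation, the ASI retrieval mentioned above maps the 89-GHz polarization difference to SIC between an open-water and a consolidated-ice tie point. The sketch below uses linear interpolation between tie-point values commonly quoted in the ASI literature, whereas the actual ASI algorithm [12] uses a third-order polynomial between its tie points, so treat both the functional form and the constants here as illustrative assumptions.

```python
import numpy as np

# Tie points for the 89-GHz polarization difference P = Tb(89V) - Tb(89H),
# in kelvin: P0 for open water, P1 for 100% ice. These are values commonly
# quoted for ASI; treat them as assumptions in this sketch.
P0, P1 = 47.0, 11.7

def sic_linear(tb_v, tb_h):
    """Simplified SIC retrieval: linearly interpolate the 89-GHz
    polarization difference between the two tie points and clip to [0, 1].
    The real ASI algorithm uses a third-order polynomial instead."""
    p = tb_v - tb_h
    sic = (P0 - p) / (P0 - P1)
    return np.clip(sic, 0.0, 1.0)
```

A small polarization difference (near P1) indicates consolidated ice, a large one (near P0) open water; this is why sharper super-resolved 89-GHz V/H fields translate directly into sharper SIC edges.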
TABLE I
MAIN SPECIFICATIONS OF AMSR2
In this article, the level 1B brightness temperature swath data of horizontal and vertical polarization at the 89-GHz channels are used. For better data preprocessing and feature extraction, all AMSR2 swath data at the 89-GHz channels for one day are gridded to obtain daily average passive microwave brightness temperature images on the polar stereographic grids of the National Snow and Ice Data Center (NSIDC) [47], which are used to create the training and test sets of the network. To generate HR images and products that can be embedded and to reduce errors during the gridding process, a grid resolution of 6.25 km is chosen. The coverage of the AMSR2 image is drawn in Fig. 1 and includes the entire Arctic region. Each image is 1060 × 1060 pixels and is clipped into 28 HR patches of size 256 × 256 pixels in an overlapping manner. All LR patches are obtained by bicubic interpolation with a scaling factor of 4 to downsample the HR patches, so the LR patches have a spatial size of 64 × 64. The images are dual-channel with a bit depth of 16 bits, comprising horizontal and vertical polarization at the 89-GHz channels. The image acquisition time ranges from 2013 to 2019, and high-quality images are selected after discarding images with missing or incorrect information. There are a total of 33 600 image sequences, of which 30 660 are for training and 2940 are for testing. We also use data augmentation by rotation and flipping to expand the training set and improve model generalizability.
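The patch preparation described above can be sketched as follows. The overlap stride is a hypothetical choice (the paper's exact overlap layout for its 28 patches is not specified here), and block-mean pooling stands in for the bicubic downsampling to keep the sketch dependency-free.

```python
import numpy as np

def make_lr(hr, scale=4):
    """Downsample a (C, H, W) HR patch by `scale`. Block-mean pooling is a
    simple stand-in for the bicubic interpolation used in the paper."""
    c, h, w = hr.shape
    return hr.reshape(c, h // scale, scale, w // scale, scale).mean(axis=(2, 4))

def extract_patches(img, patch=256, stride=200):
    """Overlapping HR patches from a (2, H, W) dual-channel image.
    The stride of 200 px is a hypothetical choice giving overlapping
    coverage, not the paper's exact layout."""
    _, h, w = img.shape
    return [img[:, i:i + patch, j:j + patch]
            for i in range(0, h - patch + 1, stride)
            for j in range(0, w - patch + 1, stride)]

# Dual-channel (89-GHz H-pol and V-pol) stand-in for one gridded daily image.
img = np.random.rand(2, 1060, 1060).astype(np.float32)
hr_patches = extract_patches(img)
lr_patches = [make_lr(p) for p in hr_patches]
```

Rotation and flip augmentation would then be applied per patch pair, e.g., with `np.rot90(p, k, axes=(1, 2))` and `np.flip(p, axis=2)`, identically on the HR patch and its LR counterpart.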
To demonstrate the effect of the alignment and fusion modules, as well as the applicability of the proposed model to Arctic SIC, a MODIS image sequence over Baffin Bay is used, which basically corresponds to the AMSR2 swath image acquisition times and was acquired under primarily clear-sky conditions. In particular, the reflectance data from band 1 (620–672 nm) with a spatial resolution of 250 m are used, preprocessed with ENVI 5.3 software. The extent of the MODIS data is shown as Region 1 in Fig. 1, which encloses the three selected subregions A–C.
IV. METHODS
A. Network Architecture
The motivation of DL-MISR is to obtain the target HR image I^HR_t by reconstructing 2n+1 LR images I^LR_[t-n:t+n] of the same scene with complementary spatiotemporal information, where the middle image I^LR_t is chosen as the reference image to be super-resolved and the other images {I^LR_{t-n}, ..., I^LR_{t-1}, I^LR_{t+1}, ..., I^LR_{t+n}} are chosen as neighboring images to provide supporting information for the reference image. The MISR process reverses the degradation of the target HR image, which can be formulated as follows:

I^HR_t = f(I^LR_[t-n:t+n]; θ)   (1)

where f denotes the MISR process representing the mapping relationship between I^LR_[t-n:t+n] and I^HR_t, and θ denotes the weight parameters. The loss function L is designed to measure the reconstruction errors between the predicted HR image I^SR_t and the target HR image I^HR. The optimal value θ* is trained by the network under the guidance of the loss function L and large amounts of training data through the following equation:

θ* = arg min_θ L(I^HR, I^SR).   (2)

Fig. 1. Areas of AMSR2 data used in the study. The gray area is land mask and missing data, the blue area is sea water, and the light blue area is sea ice, where SIC is above 15% on May 15, 2019, downloaded from the University of Bremen, Bremen, Germany, at https://seaice.uni-bremen.de. The red rectangular regions 1–3 enclose the three selected areas covered with AMSR2 images to compare PMDRnet with other state-of-the-art methods. Region 1 encloses the selected three black rectangle subregions A–C that are covered with MODIS images and AMSR2 images for qualitative and quantitative comparison of SIC products derived from SR images.
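Equations (1) and (2) can be illustrated with a toy numpy example, where the learned mapping f is replaced by a hypothetical one-parameter stand-in (average the frames, upsample, scale by θ) and the argmin of (2) is found by brute-force search instead of gradient-based training:

```python
import numpy as np

def f_dummy(lr_seq, theta):
    """Stand-in for the MISR mapping f of eq. (1): average the 2n+1 LR
    frames and nearest-neighbor upsample by 4. `theta` is a single scalar
    gain playing the role of the network weights."""
    fused = theta * lr_seq.mean(axis=0)          # (C, h, w)
    return fused.repeat(4, axis=1).repeat(4, axis=2)

def loss_l1(hr, sr):
    """L1 reconstruction loss between target HR and predicted SR image."""
    return np.mean(np.abs(hr - sr))

n = 2
lr_seq = np.ones((2 * n + 1, 2, 64, 64))         # 5 dual-channel LR frames
hr = np.ones((2, 256, 256))                      # target HR image

# Coarse 1-D search over theta in place of the gradient descent of eq. (2).
thetas = np.linspace(0.5, 1.5, 101)
best = min(thetas, key=lambda t: loss_l1(hr, f_dummy(lr_seq, t)))
```

With these constant images the loss is minimized at θ = 1, mirroring how (2) selects the weights that make f invert the degradation; the real network replaces the scalar search with backpropagation over millions of parameters.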
Fig. 2. Overview of the proposed DL-MISR framework. In the framework, we take five images as an image sequence (n = 2). The input and output of the network are dual-channel data, including the horizontal polarization images (images with blue borders) and vertical polarization images (images with red borders) at the 89-GHz channels.
For the network design of the proposed PMDRnet, a global residual learning strategy [48] is used to learn the residuals of the missing high-frequency details between the input LR image sequence and the corresponding HR image. The structure of the network is shown in Fig. 2 and includes three primary subnetworks: the progressive multiscale deformable alignment network, the adaptive fusion network, and the SR reconstruction network. First, the features of the input LR image sequence, including the horizontal and vertical polarization images at the 89-GHz channels, are extracted; then, the features of the neighboring images are aligned to the reference image features via the alignment network. The aligned feature sequence is merged into a single feature map with complementary spatial–temporal information in the subsequent adaptive fusion network. The SR reconstruction network is used to predict deep features from the fused features and perform the upsampling operation. Finally, the target HR image is obtained by adding the residual information predicted by the network to the upsampled LR reference image.
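The four stages above can be sketched as a forward pass with the three subnetworks passed in as callables. Everything below is illustrative scaffolding (identity alignment, mean fusion, zero residual), not the actual PMDRnet layers:

```python
import numpy as np

def upsample4(x):
    """Nearest-neighbor x4 upsampling, a stand-in for the learned upsampler."""
    return x.repeat(4, axis=-2).repeat(4, axis=-1)

def pmdrnet_forward(lr_seq, align, fuse, reconstruct):
    """Skeleton of the PMDRnet pipeline: extract features, align neighbors
    to the reference (middle) frame, fuse, reconstruct residual detail, and
    add it to the upsampled LR reference (global residual learning)."""
    t = len(lr_seq) // 2                          # middle frame = reference
    feats = [frame.copy() for frame in lr_seq]    # stand-in feature extractor
    aligned = align(feats, ref_index=t)
    fused = fuse(aligned)
    residual = reconstruct(fused)                 # upsampled high-freq detail
    return upsample4(lr_seq[t]) + residual

# Dummy subnetworks: alignment is the identity, fusion averages the features,
# and reconstruction predicts a zero residual.
align = lambda feats, ref_index: feats
fuse = lambda feats: np.mean(feats, axis=0)
reconstruct = lambda fused: np.zeros((fused.shape[0],
                                      fused.shape[1] * 4, fused.shape[2] * 4))

lr_seq = np.random.rand(5, 2, 64, 64)             # n = 2, dual-channel frames
sr = pmdrnet_forward(lr_seq, align, fuse, reconstruct)
```

With a zero residual, the output collapses to the upsampled reference frame, which makes the global residual design explicit: the three subnetworks only have to learn the high-frequency detail missing from that baseline.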
B. Progressive Multiscale Deformable Alignment Network
In the Arctic region, sea ice motions have various magnitudes, from small motions within a pixel to large motions of up to several pixels. It remains difficult to achieve accurate alignment for complex and large motions with geometric changes. To manage these problems and improve the performance of image alignment in the presence of sea ice motions, a progressive multiscale deformable alignment network that combines a progressive alignment strategy and a multiscale DConv alignment unit is proposed to achieve progressive alignment in a cascaded manner and refine the final alignment results.
1) Progressive Alignment Strategy: Compared with the previous alignment strategy, which aligns all neighboring images to a fixed reference image (usually the middle image), we achieve progressive alignment with a dynamic reference image and a cascading operation, as shown in Fig. 2. To perform the alignment at the feature level, feature extraction is first applied to the input image sequence, consisting of one convolutional layer and cascaded residual blocks described in detail in Section IV-D, whose outputs are denoted as F^0_[t-n:t+n]. The neighboring image that has just been aligned by the alignment unit is used as the reference image for the next neighboring image to be aligned, and the reference image of the alignment operation in each cascade begins from the middle image. For example, in the alignment of the first cascade, the images closest to the middle, F^0_{t-1} and F^0_{t+1}, are first aligned to the middle image F^0_t, producing F^1_{t-1} and F^1_{t+1}, respectively. Then, the aligned images F^1_{t-1} and F^1_{t+1} are used as the reference images for the next two images, F^0_{t-2} and F^0_{t+2}, whose aligned versions are denoted as F^1_{t-2} and F^1_{t+2}. The alignment operation continues until all images in the sequence are aligned. In addition, the aligned image sequence can be used as the input of the next cascade to achieve progressive alignment and produce the refined alignment results F^a_[t-n:t+n], where a denotes the number of cascades. This progressive alignment strategy is equivalent to decomposing the complex and large motions in the sequence into the motions between every two neighboring images per cascade. Alignment is much easier to perform when only the motion between every two neighboring images per cascade is considered, which helps when managing complex and large sea ice motions and improves alignment accuracy. All images are thus aligned, indirectly, to the middle target image features F^a_t.
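The ordering implied by this strategy can be written down explicitly. The helper below (hypothetical, with frame indices relative to the middle frame at 0) enumerates which frame is aligned to which reference in each cascade:

```python
def progressive_alignment_schedule(n, cascades):
    """Enumerate (cascade, frame_to_align, reference_frame) triples for the
    progressive alignment, indices relative to the middle frame 0. In each
    cascade, frames at offset +-1 align to 0, then frames at offset +-2
    align to the freshly aligned +-1, and so on, so every alignment pair
    only spans one frame of motion."""
    schedule = []
    for c in range(cascades):
        for step in range(1, n + 1):
            for sign in (-1, 1):
                schedule.append((c, sign * step, sign * (step - 1)))
    return schedule
```

For n = 2 this reproduces the order described above: t-1 and t+1 align to t, then t-2 and t+2 align to the just-aligned t-1 and t+1; running further cascades repeats the pass over the already-refined features.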
2) Multiscale DConv Alignment Unit: The alignment unit is designed based on DConv and dilated convolution for the complex, deformed, and large sea ice motions in the Arctic. DConv can augment the spatial sampling locations with learned offsets, enhancing the geometric transformation modeling capability of original CNNs, which are limited to a fixed sampling grid for all kernels in a layer [38]. In addition, DConv alignment uses multiple offsets at each feature location, which gather information from the local neighborhood and complement each other, resulting in more accurate alignment and better performance than a single offset, particularly for reducing warping errors caused by large motions [49]. Therefore, DConv may accommodate motions with geometric changes and can be used to perform precise alignment with multiple complementary offsets in an implicit manner.
Fig. 3. Multiscale DConv alignment unit for reference image features F_ref and image features to be aligned F_unal.
For a standard convolution in CNNs, the k-th regular sampling location of an N × N convolution kernel is denoted as P_k, k ∈ [1 : N²], with P_k ∈ {(-1,-1), (-1,0), ..., (1,1)}, k ∈ [1 : 9] when N = 3. When aligning the unaligned image features F_unal to the reference image features F_ref, the DConv operation can be formulated mathematically as (3) for each position P on the output aligned image F_al, where O_k denotes the learned offset, ω_k the weight, and m_k the modulation coefficient:

F_al(P) = Σ_{k=1}^{N²} ω_k · F_unal(P + P_k + O_k) · m_k.   (3)
However, the performance of the DConv alignment method may still be limited when managing motions on the same scale across sequences because the receptive field is finite. When the offsets are estimated with the same small receptive field, it is difficult to fully use the characteristics of each input and capture useful context information to obtain the optimal offset estimation. In this article, we propose an improved DConv alignment unit based on the fact that dilated convolutions can expand the receptive field in offset estimation without loss of information and under the same computational conditions [50]; this aggregates multiscale offset information and benefits the detection of various types of motions, particularly large motions.
The proposed multiscale DConv alignment unit for multiscale offset estimation is shown in Fig. 3. Before estimating the offsets, the difference between the unaligned image features $F_{unal}$ and the reference image features $F_{ref}$ is first computed using a concatenation operation and a 3 × 3 convolution layer, followed by a leaky rectified linear unit (ReLU) activation function. Then, offsets at different scales $O^{m}_{F_{unal\to ref}}$ are estimated on independent branches with different dilation rates $m$ and are integrated via a concatenation operation and a 1 × 1 convolution layer followed by the leaky ReLU activation function to obtain the multiscale offset $O^{fused}_{F_{unal\to ref}}$ as (4), where $[\cdot,\cdot]$ and $\mathrm{Conv}$ denote the concatenation and convolution operations, respectively:
$$O^{fused}_{F_{unal\to ref}}=\mathrm{Conv}\big(\big[O^{1}_{F_{unal\to ref}},O^{2}_{F_{unal\to ref}},\ldots,O^{m}_{F_{unal\to ref}}\big]\big).\tag{4}$$
Finally, with the multiscale offsets $O^{fused}_{F_{unal\to ref}}$ and the unaligned image features $F_{unal}$, the aligned image features $F_{al}$ are produced by a 3 × 3 DConv layer followed by a leaky ReLU activation function as follows:
$$F_{al}=\mathrm{DConv}\big(F_{unal},O^{fused}_{F_{unal\to ref}}\big).\tag{5}$$
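The receptive-field argument behind the dilated branches can be checked with the standard recurrence r ← r + (k − 1) · d · jump. A small helper (our own, assuming stride-1 layers by default) shows how dilation rates 1, 2, and 3 enlarge the receptive field of a single 3 × 3 offset-estimation layer at no extra parameter cost:

```python
def receptive_field(kernel_sizes, dilations, strides=None):
    """Receptive field of a stack of conv layers.

    Per layer: r += (k - 1) * d * jump, then jump *= stride.
    """
    strides = strides or [1] * len(kernel_sizes)
    r, jump = 1, 1
    for k, d, s in zip(kernel_sizes, dilations, strides):
        r += (k - 1) * d * jump
        jump *= s
    return r

# One 3x3 offset-estimation layer per branch, dilation rates 1, 2, 3:
branch_rf = [receptive_field([3], [d]) for d in (1, 2, 3)]
```

A dilation-3 branch thus "sees" a 7 × 7 neighborhood with the same nine weights, which is what allows larger displacements to influence the offset estimate.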
C. Adaptive Fusion Module
When fusing aligned features, several conditions must be considered to efficiently fuse the dominant features across sequential images. First, the alignment network may still produce inaccurate alignment results. Second, occlusion must be considered, such as the influence of atmospheric cloud liquid water and water vapor on passive microwave high-frequency (e.g., 89 GHz) images. Inspired by [34], [41], and [44], an attention mechanism is used in the fusion module for efficient feature fusion, which guides the model to obtain fused features containing only beneficial complementary spatiotemporal information.
To achieve adaptive fusion, it is critical to obtain the information similarity of each image in the sequence (i.e., the temporal attention map $W_{t+i}$, $i \in [-n, n]$). The more similar the information between an image in the sequence and the target image, the more attention is paid to it. As shown in Fig. 4, a 3 × 3 convolutional layer followed by a leaky ReLU activation function is first applied on each aligned image $F^{a}_{t+i}$, and then the concatenation operation is performed between each image in the sequence and the target image. After the dot product operation, the information similarity at each position across the image sequence is computed along the temporal axis using the tanh function [51] as follows:
$$W_{t+i}=\tanh\big(\mathrm{Conv}(F^{a}_{t+i})^{T}\,\mathrm{Conv}(F^{a}_{t})\big).\tag{6}$$
Then, the attention-weighted features $\tilde{F}_{t+i}$ for each image are calculated by multiplying the attention maps with the corresponding original aligned features as (7), where $\odot$ denotes element-wise multiplication:
$$\tilde{F}_{t+i}=W_{t+i}\odot F^{a}_{t+i}.\tag{7}$$
The last step is to fully fuse the attention-weighted features of each image via a concatenation operation and a 3 × 3 convolutional layer followed by a leaky ReLU activation function to obtain the fused features $F_{fused}$ as follows:
$$F_{fused}=\mathrm{Conv}\big(\big[\tilde{F}_{t-n},\ldots,\tilde{F}_{t-1},\tilde{F}_{t},\tilde{F}_{t+1},\ldots,\tilde{F}_{t+n}\big]\big).\tag{8}$$

Fig. 4. Adaptive fusion module with the temporal attention mechanism when there are three images in an image sequence ($n = 1$).

Authorized licensed use limited to: TONGJI UNIVERSITY. Downloaded on June 27, 2022 at 07:40:03 UTC from IEEE Xplore. Restrictions apply.
D. SR Reconstruction Network
To predict deep features within the fused features, local
residual learning is used to perform nonlinear mapping, which
is composed of cascaded residual blocks. Different from the
initial residual block [52], we use the residual block without
batch normalization (BN) proposed by Lim et al. [53], which
consists of two 2-D convolution layers and an ReLU activation
function [54] in the middle and is implemented by shortcut
connection. The BN will tend to lose the scale information of
the image and reduce the range flexibility of networks, and
other models also confirm that removing the BN is helpful to
develop a much larger model and increase the SR performance.
Generally, the feature learning ability will be enhanced when
more residual blocks are cascaded [52], but model complexity
and computational cost will increase, which is a tradeoff
between performance and speed.
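A minimal sketch of the BN-free residual block described above (our own formulation; the two 2-D convolutions are passed in as callables rather than learned layers):

```python
import numpy as np

def residual_block(x, conv1, conv2):
    """EDSR-style residual block without batch normalization:
    out = x + conv2(ReLU(conv1(x))), with the identity shortcut carrying
    the input scale straight through to the output.
    """
    return x + conv2(np.maximum(conv1(x), 0.0))
```

Because nothing renormalizes the activations, the block preserves the absolute scale of the input (here, brightness temperatures), which is the property the text attributes to removing BN.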
The subpixel convolution layer [55] is used for upsampling. It is an end-to-end learnable upsampling layer that obtains multiple feature maps by applying convolutions to the LR features and then reconstructs the HR image through periodic shuffling. In addition, the upsampling layer is placed at the end of the network to avoid a large number of calculations in high-dimensional space.
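The channel-to-space rearrangement behind the subpixel layer can be sketched as follows; the array-shape convention here is our own and is intended to match PyTorch's `PixelShuffle`:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel (pixel-shuffle) upsampling: rearrange (C*r^2, H, W)
    features into (C, H*r, W*r), as in ESPCN-style layers [55]."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into the r x r subgrid
    x = x.transpose(0, 3, 1, 4, 2)    # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)
```

Each group of r² LR feature channels thus fills one r × r block of HR pixels, so the expensive convolutions all run at LR resolution and only this cheap rearrangement runs at HR resolution.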
E. Loss Function
1) Multiple Loss Function Terms: To obtain better SR performance and further improve the Arctic SIC results, we design a loss function that consists of an SR loss term $L_{sr}$ and a sea ice-related loss term $L_{ice}$ based on the polarization difference of the brightness temperatures in multichannel AMSR2 images. $L_{sr}$ evaluates the pixel-level reconstruction error between the predicted image and the target image for the horizontal and vertical polarizations of the 89-GHz channels, denoted as $I^{SR}_{89h}$ and $I^{HR}_{89h}$, and $I^{SR}_{89v}$ and $I^{HR}_{89v}$, respectively. $L_{ice}$ calculates the error between the predicted polarization difference, obtained by subtracting $I^{SR}_{89h}$ from $I^{SR}_{89v}$, and the target polarization difference, obtained by subtracting $I^{HR}_{89h}$ from $I^{HR}_{89v}$; these differences are denoted as $P^{SR}_{89}$ and $P^{HR}_{89}$, and the term intensifies the sea ice information in the proposed network. Therefore, the network uses a dual-channel data input to build the multifrequency features of sea ice, and the final loss function $L$ weights the multiple loss terms with $\lambda \in [0,1]$ as follows:
$$L=\lambda L_{ice}+(1-\lambda)L_{sr}\tag{9}$$
where the Charbonnier loss, a type of pixel loss, is used to calculate $L_{ice}$ and $L_{sr}$ as (10) and (11); it manages outliers well and has the advantages of robustness, fast convergence, and good SR performance [56], [57]. In addition, minimizing the pixel loss directly maximizes the peak signal-to-noise ratio (PSNR), which is one of the most widely used evaluation criteria of reconstruction quality and is highly correlated with pixel-wise differences [58]:
$$L_{ice}=\frac{1}{rwh}\sum_{1}^{r}\sum_{1}^{w}\sum_{1}^{h}\sqrt{\big(P^{SR}_{89}-P^{HR}_{89}\big)^{2}+\varepsilon^{2}}\tag{10}$$
$$L_{sr}=\frac{1}{rwh}\sum_{1}^{r}\sum_{1}^{w}\sum_{1}^{h}\Big(\sqrt{\big(I^{SR}_{89v}-I^{HR}_{89v}\big)^{2}+\varepsilon^{2}}+\sqrt{\big(I^{SR}_{89h}-I^{HR}_{89h}\big)^{2}+\varepsilon^{2}}\Big)\tag{11}$$
where $r$ is the number of training samples, $h$ and $w$ are the height and width of the evaluated image, respectively, and $\varepsilon$ is a constant for numerical stability (e.g., $10^{-3}$).
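Equations (9)–(11) can be sketched as follows; this is a simplified reference with our own function names, applied to single images rather than batches, and the default λ is only illustrative:

```python
import numpy as np

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier pixel loss: mean of sqrt((x - y)^2 + eps^2)."""
    return np.mean(np.sqrt((pred - target) ** 2 + eps ** 2))

def total_loss(sr_89h, sr_89v, hr_89h, hr_89v, lam=0.27):
    """Eq. (9): L = lam * L_ice + (1 - lam) * L_sr.

    L_ice (Eq. (10)) compares the 89-GHz polarization differences
    P_89 = I_89v - I_89h; L_sr (Eq. (11)) compares the per-channel
    reconstructions.
    """
    l_ice = charbonnier(sr_89v - sr_89h, hr_89v - hr_89h)
    l_sr = charbonnier(sr_89v, hr_89v) + charbonnier(sr_89h, hr_89h)
    return lam * l_ice + (1 - lam) * l_sr
```

Note that even a perfect prediction yields a residual loss of ε per term, which is the price of the smooth square root near zero.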
2) Bayesian Optimization Method: As one of the hyperparameters, the optimal value $\lambda^{*}$ is determined by the Bayesian optimization (BO) algorithm [59] to automatically weight the multiple loss function terms. The BO method assumes that there is a functional relationship between the hyperparameter $\lambda$ and the objective function $g$ to be optimized, and we must find the optimal hyperparameter value $\lambda^{*}$ as follows:
$$\lambda^{*}=\arg\min_{\lambda}\,g(\lambda).\tag{12}$$
First, it is necessary to design the objective function $g$, which is the reconstruction error of the model with hyperparameter $\lambda$ on the test set, including the brightness temperature images of horizontal and vertical polarization at the 89-GHz channels. The mean square error (MSE) is used to measure this error. The dataset $D$ of previous evaluations consists of pairs $\{(\lambda_1,y_1),(\lambda_2,y_2),\ldots,(\lambda_k,y_k)\}$, where $y$ is the MSE value and $k$ is the number of iterations. The posterior probability distribution $p$ is established through Gaussian regression $G$, which gives the mean and variance at each value of $\lambda$. By constructing the acquisition function $F$ from the mean and variance, the next hyperparameter value $\lambda_{k+1}$ is selected, where the upper confidence bound (UCB) method [60] is used as the acquisition function. Finally, the dataset $D(\lambda, y)$ is updated by adding the new pair $(\lambda_{k+1},y_{k+1})$, and the iterative process continues until convergence.
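One BO step with a Gaussian-process surrogate and a UCB-style acquisition can be sketched as follows; since the objective is minimized, the lower confidence bound μ − κσ is used. The RBF kernel, length scale, and κ here are our own illustrative choices, not the paper's settings.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """RBF kernel matrix between two 1-D arrays of lambda values."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(lam_obs, y_obs, lam_grid, noise=1e-6):
    """GP posterior mean and variance of g on a grid of candidate lambdas."""
    K = rbf(lam_obs, lam_obs) + noise * np.eye(len(lam_obs))
    Ks = rbf(lam_grid, lam_obs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y_obs
    var = 1.0 - np.sum((Ks @ Kinv) * Ks, axis=1)
    return mu, np.maximum(var, 0.0)

def next_lambda(lam_obs, y_obs, lam_grid, kappa=2.0):
    """Acquisition step: pick the grid point minimizing mu - kappa * sigma,
    which trades off exploiting low predicted MSE against exploring
    high-uncertainty lambdas."""
    mu, var = gp_posterior(np.asarray(lam_obs), np.asarray(y_obs), lam_grid)
    return lam_grid[np.argmin(mu - kappa * np.sqrt(var))]
```

In a full loop, the selected λ would be evaluated by training or validating the network, the pair (λ, MSE) appended to the dataset, and the step repeated until the budget (16 iterations in the paper) is exhausted.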
V. E XPERIMENTAL RESULTS AND DISCUSSION
In this section, the optimal hyperparameter λin the loss
function of the proposed PMDRnet is determined using the
BO algorithm. Then, an experimental evaluation of PMDRnet
is performed and compared with several representative algo-
rithms, including the traditional method selected as the base-
line (e.g., bicubic), the state-of-the-art DL-MISR methods for
RGB images (e.g., DUF [32], RBPN [30], EDVR [34], and
RSDN [37]), and the representative DL-MISR method for
remote sensing images (e.g., RAMS [43]). In addition, an
ablation study is performed to highlight the contribution given
by the novel alignment module and attention-based fusion
module to the overall network performance. Finally, we apply
the proposed network to the Arctic SIC and further assess the
SR performance.
A. Parameter Settings
Referring to the basic framework of [34], PMDRnet uses
five residual blocks to perform feature extraction, followed
by a reconstruction module with ten residual blocks, and
the cascade size of the above alignment operation is set
to 3. The number of filters in each layer is 64, and we use
reflection padding in all convolution layers to mitigate border
effects. Because the proposed network is a generic framework suitable for processing image sequences with different lengths and different upscaling factors, we select five images as the input ($n = 2$) and a 4× upscaling factor in the following experiments. To improve the convergence speed and prevent inputs of different scales from affecting the final result, the input data of the network are normalized as a preprocessing step, with the data range scaled to [0, 1].
The proposed network is implemented in the PyTorch framework and trained on four NVIDIA GTX 1080Ti GPUs. The parameters of the network are optimized using the Adam optimizer with $\beta_1 = 0.9$ and $\beta_2 = 0.999$, and the learning rate is set to $5 \times 10^{-4}$.
B. Evaluation Criterion
To evaluate the SR performance of the proposed network, two popular metrics for reconstruction quality are used: PSNR and the structural similarity (SSIM) index, which measure the similarity via the mean-squared error and the structural consistency, respectively, between the predicted image $I^{SR}$ and the target image $I^{HR}$. They are defined as follows:
$$\mathrm{PSNR}=10\log_{10}\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}(I^{SR},I^{HR})}\tag{13}$$
$$\mathrm{SSIM}=\frac{\big(2\varphi_{I^{SR}}\varphi_{I^{HR}}+C_{1}\big)\big(2\sigma_{I^{SR}I^{HR}}+C_{2}\big)}{\big(\varphi_{I^{SR}}^{2}+\varphi_{I^{HR}}^{2}+C_{1}\big)\big(\sigma_{I^{SR}}^{2}+\sigma_{I^{HR}}^{2}+C_{2}\big)}\tag{14}$$
where MAX represents the maximum brightness temperature value in the test set, $\mathrm{MSE}(I^{SR},I^{HR})$ is the mean-squared error between $I^{SR}$ and $I^{HR}$, $\varphi_{I^{SR}}$ and $\varphi_{I^{HR}}$ are the mean brightness temperature values of $I^{SR}$ and $I^{HR}$, $\sigma_{I^{SR}}$ and $\sigma_{I^{HR}}$ denote the standard deviations (STDs), and $\sigma_{I^{SR}I^{HR}}$ is the covariance. $C_{1}$ and $C_{2}$ are constants, typically set to 0.01 and 0.03, respectively, to stabilize the calculation.
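Equations (13) and (14) can be computed directly; the sketch below (our own helper names) evaluates SSIM with a single global window for brevity, whereas the usual SSIM uses local sliding windows:

```python
import numpy as np

def psnr(sr, hr, max_val):
    """Eq. (13): 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((sr - hr) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(sr, hr, c1=0.01, c2=0.03):
    """Eq. (14) computed over the whole image as a single window."""
    mu_x, mu_y = sr.mean(), hr.mean()
    var_x, var_y = sr.var(), hr.var()
    cov = ((sr - mu_x) * (hr - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

A constant brightness-temperature bias leaves SSIM nearly unchanged but lowers PSNR, which is why the two metrics are reported together.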
C. Experimental Results
1) Weighting Loss Function Terms: The experimental
results of searching for the optimal proportion between mul-
tiple loss function items using the BO algorithm are shown
in Fig. 5. According to the trend of the iterated values, the
value of the objective function is significantly reduced at the
beginning when adding the sea ice-related loss function term,
and then, increasing the proportion of the sea ice-related loss
function term within limits can improve the performance of
the proposed network. However, the value of the objective
function will increase significantly when λ>0.8, indicating
that the proportion of the sea ice-related loss function term
does not yield better results above a certain value. Based on the BO algorithm, the optimal value of $\lambda$ is concentrated at approximately 0.3, and the value of the objective function oscillates within a small range, as shown in the zoomed-in view in the middle of the graph in Fig. 5. The maximum number of iterations is set to 16, considering the balance between performance and computational cost. Finally, the optimal value ($\lambda = 0.27$) is determined in the experiment and used for the hyperparameter $\lambda$ in the subsequent experiments.
2) Comparison to Other State-of-the-Art Methods: Table II shows the quantitative results on the test set with a 4× scaling factor. The proposed PMDRnet significantly outperforms all the other state-of-the-art methods, with the highest PSNR and SSIM on the test set consisting of brightness temperature images of horizontal and vertical polarization at the 89-GHz channels, denoted as BT89h and BT89v, respectively. Compared with the bicubic algorithm, the improvements of PMDRnet are 2.91 dB and 0.0730 in terms of PSNR and SSIM at BT89h and 2.99 dB and 0.0655 at BT89v, respectively.

TABLE II
QUANTITATIVE COMPARISON RESULTS WITH A SCALING FACTOR OF 4. PSNR AND SSIM ARE CALCULATED ON THE TEST DATASET. THE RESULT IN BOLD SHOWS THE BEST-PERFORMING PSNR AND SSIM
In addition, PMDRnet outperforms the 3-D convolution-based
network DUF, flow-based network RBPN, DConv-based net-
work EDVR, RNN-based network RSDN, and remote sensing
image-based network RAMS on both PSNR and SSIM.
With the qualitative results, we primarily focus on the SR
image with various characteristics of sea ice, such as active
sea ice areas with a large amount of moving floating ice and
newly formed polynyas, as well as some stationary scenes,
such as the boundary between sea ice and land. AMSR2
SR images of vertical polarization at 89-GHz channels from
three nonoverlapping areas near Baffin Bay (Fig. 1, Region 1),
the Lincoln Sea (Fig. 1, Region 2), and the Beaufort Sea
(Fig. 1, Region 3) are chosen. The influence of weather on passive microwave images is mainly the effect of atmospheric cloud liquid water and water vapor on the 89-GHz brightness temperature images [12]. We take water vapor as an example and download the water vapor products of Regions 1–3 via https://seaice.uni-bremen.de/; their temporal resolution is one day. In addition, all available Sentinel-1 Level-1 images of the three regions, acquired at ground range detected medium (GRDM) resolution in the extra wide (EW) swath mode, are also shown for comparison. The Sentinel-1 images have a pixel spacing of 40 m and a resolution of 93 × 87 m and were downloaded via https://www.sentinel-hub.com/; the acquisition dates of both the water vapor products and the Sentinel-1 images are the same as those of the AMSR2 images.
Fig. 6 shows the active floating ice in the scene of
Region 1 and a zoomed-in view of the boundary between
sea ice and sea water for better observation in detail, and
the corresponding method and the quantitative results are dis-
played at the top of the image. Fig. 6 shows that the DL-MISR
methods perform markedly better than bicubic interpolation,
and PMDRnet produces superior SR results with the sharpest
sea ice edge information and results that are most similar to the
ground truth compared to other DL-MISR methods. Similarly,
from the SR results obtained on the newly formed polynyas
scene of Region 2 in Fig. 7, PMDRnet achieves the best spatial
reconstruction with fewer artifacts and distortions, which can
best characterize the correct shape and texture details of the
polynyas as well as some smaller scale leads. For the scene of Region 3 with the boundary between sea ice and land in Fig. 8, PMDRnet successfully recovers abundant edge features, whereas other methods, such as DUF and RBPN, fail and even generate incorrect textures and edges.
The average water vapor content of the regions in Figs. 6–8 is 8.41, 2.84, and 2.75 kg/m², respectively. According to both the quantitative and qualitative comparison results in Figs. 6–8, our PMDRnet shows the best SR performance compared with the other methods, even in Region 1 (Fig. 6), which has the largest water vapor content.

Fig. 5. Experimental results of searching for the optimal proportion between multiple loss function items using the BO algorithm. The zoomed-in view in the middle of the graph represents the range where the optimal value oscillates.
In addition, all SR methods inevitably increase the spatial resolution at the expense of an increased noise level [21]. With an upscaling factor of 4, both the quantitative and qualitative comparison results show that PMDRnet has the best overall SR performance among the above methods and achieves an optimal tradeoff between signal and noise.
3) Ablation Study: An ablation study is performed to verify
the effectiveness of the proposed alignment module, including
the progressive alignment strategy and multiscale DConv unit,
as well as the fusion module, on the overall network perfor-
mance. Results of the ablation study are tabulated in Table III. Model 1 is the baseline, which performs alignment with a fixed alignment strategy and the original alignment unit via three DConvs and directly fuses the input image sequence with a convolution layer. Model 2 replaces the fixed alignment strategy in Model 1 with the progressive alignment strategy. With the proposed progressive alignment strategy, Model 2 outperforms Model 1 by 0.21 dB and 0.0049 at BT89h and by 0.22 dB and 0.0044 at BT89v; this marked improvement confirms that the proposed progressive alignment strategy reduces reconstruction errors and improves SR performance by accounting for the inaccurate alignment caused by large sea ice motions.
Fig. 6. Qualitative comparison of Region 1 near Baffin Bay with a scaling factor of 4. The corresponding method and the quantitative results are displayed at the top of each image, and the result in bold shows the best-performing PSNR and SSIM. A zoomed-in view of the boundary between sea ice and sea water is provided for detailed observation. The HR image is the target image. The water vapor product, Sentinel-1 image, HR image, and LR image to be resolved were acquired on May 15, 2019.
TABLE III
RESULTS OF ABLATION STUDY. PSNR AND SSIM ARE CALCULATED ON THE TEST DATASET. THE RESULT IN BOLD SHOWS THE BEST-PERFORMING PSNR AND SSIM
Model 3 uses the multiscale DConv unit instead of the alignment unit used in Model 2. Model 3 outperforms Model 2 by 0.24 dB and 0.0047 at BT89h and by 0.28 dB and 0.0046 at BT89v, showing that the proposed multiscale DConv alignment unit, by adding multiscale offset information, improves the accuracy of the alignment and the final SR results.
Fig. 7. Qualitative comparison of Region 2 near the Lincoln Sea with a scaling factor of 4. The corresponding method and the quantitative results are displayed at the top of each image, and the result in bold shows the best-performing PSNR and SSIM. A zoomed-in view of the newly formed polynyas and smaller scale leads is provided for detailed observation. The HR image is the target image. The water vapor product, Sentinel-1 image, HR image, and LR image to be resolved were acquired on February 16, 2018.
Model 4 replaces the direct fusion method in Model 3 with the adaptive fusion module using the temporal attention mechanism. With the temporal attention mechanism, Model 4 achieves performance gains of 0.06 dB and 0.0014 at BT89h and 0.07 dB and 0.0013 at BT89v compared with Model 3, because only beneficial complementary spatiotemporal information derived from the aligned features is fused.
To visualize in more detail the critical processing results of the alignment module and fusion module in the proposed PMDRnet, an AMSR2 image sequence of Region 1, comprising five gridded swath images of 6.25-km spatial resolution acquired from May 14, 2019, to May 16, 2019, is considered as an example. Time-consistent MODIS images are also shown in the first row of Fig. 9 as references for better visual interpretation of the input AMSR2 images. The representative features before and after the alignment module, i.e., the 6th and 21st features, are indicated in the second and third rows of Fig. 9, respectively. As the time-consistent MODIS sequence images confirm, complex and large sea ice motions with geometric changes occur in this region. For example, a piece of floating ice, marked by a red box on image $I_t$ in the first row of Fig. 9, quickly flows downward from image $I_{t-2}$ to image $I_{t+2}$ and eventually splits into two pieces. These large motions with geometric changes of sea ice are also shown in the features of the AMSR2
image sequence in the second row of Fig. 9. The features of
neighboring images are successfully aligned to the features of
the middle image after the alignment operation. For example,
the red box marks the same position in the image sequence,
where the features are different before alignment and have
good consistency after alignment, as shown in the second and
third rows of Fig. 9. In particular, the fifth image $I_{t+2}$ in the sequence, with large geometric changes and rotation, achieves the same alignment effect as the other neighboring images with smaller geometric changes and no rotation.

Fig. 8. Qualitative comparison of Region 3 near the Beaufort Sea with a scaling factor of 4. The corresponding method and the quantitative results are displayed at the top of each image, and the result in bold shows the best-performing PSNR and SSIM. A zoomed-in view of the boundary between sea ice and land is provided for detailed observation. The HR image is the target image. The water vapor product, Sentinel-1 image, HR image, and LR image to be resolved were acquired on February 18, 2018.

To quantitatively evaluate the effect of the alignment operation, the optical
flow method, PWCNet [61], is used to calculate the flow
magnitude between the features of the neighboring images and
the reference image before and after the alignment operation.
The frequency distribution of the flow magnitude is shown in
Fig. 10, whose mean bias and STD are shown in Table IV. With
the proposed alignment operation, both the mean and STD of
the flow magnitudes decrease, and all flow magnitudes are
within two pixels after alignment. In particular, the maximum
flow magnitude, which is up to five pixels, is found between
images Itand It+2[Fig. 10(d)]. These offsets decrease to
within two pixels after alignment, and the mean and STD
decrease by 1.5 times and 4 times, respectively. In addition,
the minimum flow magnitude between image Itand It1is
approximately one pixel [Fig. 10(b)] and decreases to within
one pixel after alignment; the mean and STD also decrease
by 0.18 times and 2.4 times, respectively. These comparisons
highlight the effectiveness of the proposed alignment module for both small and large sea ice motions and show that it is undisturbed by geometric changes of sea ice.
The temporal attention maps $W_{[t-2:t+2]}$ for each image in the sequence are shown in the fourth row of Fig. 9. When the alignment is inaccurate, the corresponding weight of the affected neighboring image is smaller than that of the other images, which mitigates the fusion of invalid or erroneous spatiotemporal information. For example, focusing on the subregion in the red box, $W_t$ has the greatest weight, and $W_{t-1}$ and $W_{t+1}$ next to it are larger than $W_{t-2}$ and $W_{t+2}$, which are further from the reference image; this results from the inaccurate alignment under large sea ice motions, for which accurate alignment is quite challenging, particularly for images $I_{t-2}$ and $I_{t+2}$. Thus, the SR performance can be improved by assigning smaller weights in $W_{t-2}$ and $W_{t+2}$.

Fig. 9. Visual results of selected representative features from the novel alignment module and fusion module. The MODIS images at the top provide better visual interpretation; their acquisition times are consistent with those of the input AMSR2 images, as shown in the top left corner of each MODIS image. Some representative features extracted from the network are shown, including features before and after alignment and the temporal attention maps. The red box encloses the selected area covered by a large piece of floating ice, which quickly floats downward and breaks into two pieces. The time information of the MODIS images is labeled on the first row of images. The images in Rows 2–4 are the processing results of PMDRnet at different stages for the input LR image sequence, and their time information is labeled on the second row of images. All times are UTC. The color scale of the temporal attention maps represents the weights.

TABLE IV
MEAN BIAS AND STD OF FLOW MAGNITUDE BEFORE AND AFTER ALIGNMENT BETWEEN THE FOUR NEIGHBORING SUPPORTING IMAGES $\{I_{t-2}, I_{t-1}, I_{t+1}, I_{t+2}\}$ AND THE MIDDLE IMAGE $I_t$
4) SR Images for Arctic SIC Calculation: To further verify
the applicability of the SR results in terms of Arctic SIC,
we calculate the Arctic SIC using the original and SR AMSR2
images, which are denoted as LR-SIC and SR-SIC, respec-
tively. The HR MODIS-SIC generated by time-consistent
optical MODIS images is used to perform qualitative and
quantitative comparisons of LR-SIC and SR-SIC to show
whether the SR strategy benefits finer Arctic SIC results.
In addition, this process can reverse validate the performance
of the SR model, which is difficult to perform directly due
to the limitation of unavailable higher resolution passive
microwave images as ground truth images.
First, the original AMSR2 image sequence, including five images acquired within two days in Baffin Bay (Region 1 in Fig. 1), was input into PMDRnet, with the middle image as the target to be resolved. The SR result of the target image is then generated, whose spatial resolution is increased from 6.25 to 1.5625 km with a 4× upsampling factor. The SR-SIC and LR-SIC can then be derived with the ASI algorithm [12], which calculates the polarization difference of the brightness temperatures at 89 GHz and then uses fixed tie points for open water and pure ice.
TABLE V
MEAN ABSOLUTE BIAS, MEAN BIAS, AND STD BETWEEN MODIS-SIC AND LR-SIC AND BETWEEN MODIS-SIC AND SR-SIC, RESPECTIVELY, IN SUBREGIONS A, B, AND C IN FIG. 1
Fig. 10. Frequency distributions of flow magnitude before and after alignment between the middle image $I_t$ and the four neighboring supporting images: (a) image $I_{t-2}$, (b) image $I_{t-1}$, (c) image $I_{t+1}$, and (d) image $I_{t+2}$.
Moreover, based on the high-quality MODIS reflectance
image, three cloud-free subregions are selected as typical
regions for accuracy evaluation of SR-SIC, where SIC belongs
to different levels of low, middle, and high; these subregions
are denoted as A, B, and C. The positions of the three sub-
regions in the Arctic are shown in Fig. 1, and the cumulative frequency distribution of SIC derived from the MODIS image in
each subregion is shown in Fig. 11. Subregion A is a medium
SIC area with different sea ice edges and crushed floating
ice, including various sea ice motions in the time period of
the AMSR2 image sequence. Subregion B is a low SIC area
that includes much crushed floating ice and open water; in
particular, there are complex and large sea ice motions with
geometric changes. Subregion C is a high SIC area with
pack ice, including fewer small sea ice motions. A threshold
technique is used to obtain the MODIS-SIC of each subregion,
allowing each MODIS grid cell to be classified as either ice or
water by defining an ice–water reflectance threshold with the
Otsu algorithm [62] in each subregion. The 250-m reflectance
grids are reduced to 6.25-km grids, and then, each of the
6.25-km ice grid cells is counted to calculate the MODIS-SIC
over each registered AMSR2 6.25-km grid cell.
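The aggregation of the thresholded 250-m binary ice/water mask into 6.25-km MODIS-SIC cells described above amounts to block averaging; an illustrative sketch with a hypothetical helper name (factor = 25 for 250 m → 6.25 km):

```python
import numpy as np

def block_sic(ice_mask, factor):
    """Aggregate a fine binary ice (1) / water (0) mask into coarse SIC by
    averaging factor x factor blocks; any ragged border rows/columns that do
    not fill a whole block are discarded."""
    h, w = ice_mask.shape
    blocks = ice_mask[: h - h % factor, : w - w % factor].reshape(
        h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))  # fraction of ice pixels per coarse cell
```

Each coarse cell's value is then directly comparable to the AMSR2-derived SIC on the registered 6.25-km grid.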
Visual comparison with MODIS images shows that SR-SIC has more high-frequency detail than LR-SIC in all subregions, as shown in Fig. 12, providing much sharper sea ice edges and finer texture details, which is important for fine-scale sea ice dynamic monitoring and better navigation guidance.

Fig. 11. Cumulative frequency distribution of SIC derived from the MODIS image in subregions A–C in Fig. 1.
For example, in relatively low-concentration regions, such as subregions A and B of the SR-SIC, the boundary between sea ice and sea water is clearer with reference to the MODIS images, such as the boundary lines marked by solid arrows (BL1 and BL2) in Fig. 12. In particular, the SR-SIC in subregion A can reproduce the structures and sizes of cracks in the fast ice cover on the western edge of Greenland (dashed arrow CR), which are not or are barely visible in the LR-SIC. Also, SR-SIC can restore the shapes of floating ice and locate their positions more accurately than LR-SIC, such as the floating ice indicated by solid circles (FI1, FI2, and FI3). The floating ice FI3 is the same as that in the red box of Fig. 9, with large geometric changes, displacements, and rotations across the image sequence; its shape and the crack in the middle of the ice are successfully resolved, and the boundary of the sea water area beneath the floating ice is also restored correctly with reference to the MODIS image. Moreover, the details in ice–water mixing areas are clearly restored after the SR process, and their heterogeneity is reduced, as shown by the solid boxes (IW1, IW2, and IW3) in Fig. 12.
To perform a more detailed quantitative evaluation, SR-SIC with a 6.25-km spatial resolution was calculated by downsampling the original SR-SIC with a scaling factor of 4. The cumulative frequency distribution of the absolute bias between MODIS-SIC and LR-SIC, as well as between MODIS-SIC and SR-SIC, in the three subregions is shown in Fig. 13, and the corresponding mean absolute bias, mean bias, and STD are shown in Table V.

Fig. 12. MODIS images of subregions A–C in Fig. 1 (first column) and the corresponding SIC products derived from the LR images and SR images (second and third columns, respectively). Four types of representative features are marked: boundary lines between sea ice and sea water marked by solid arrows (BL1 and BL2), a crack in the fast ice region shown by the dashed arrow (CR), floating ice indicated by solid circles (FI1, FI2, and FI3), and ice–water mixing areas marked by solid boxes (IW1, IW2, and IW3). The MODIS images were acquired at 15:30 on May 15, 2019, and the AMSR2 images were acquired at 15:29 on May 15, 2019. All times are UTC.
Statistical errors show that the error in the high SIC region (e.g., subregion C) is small, whereas that in the relatively low SIC regions (e.g., subregions A and B) is large compared with MODIS-SIC. This occurs because the coarse spatial resolution of AMSR2 increases the blurring in areas with relatively low SIC, which contain a large amount of moving crushed floating ice. The results also show that these errors can be effectively reduced by the SR strategy in all three subregions, particularly in subregion A, which further demonstrates that the errors of LR-SIC can be reduced.

Fig. 13. Cumulative frequency distribution of the absolute bias of SIC between MODIS-SIC and LR-SIC and between MODIS-SIC and SR-SIC, denoted as LR-MODIS and SR-MODIS, respectively, in (a) subregion A, (b) subregion B, and (c) subregion C in Fig. 1.
VI. CONCLUSION
In this article, PMDRnet is developed based on a deep
residual convolutional network to enhance the spatial reso-
lution of passive microwave images of Arctic sea ice from
6.25 to 1.5625 km and to identify finer SIC. The novel progres-
sive alignment strategy and multiscale DConv alignment unit
achieve good alignment performance for complex and large
Arctic sea ice motions even with geometric changes, and the
temporal attention mechanism can compensate for inaccurate
alignment and occlusion problems (e.g., the influence of
atmospheric cloud liquid water and water vapor) by adaptively
fusing the effective spatiotemporal information across the image sequence.
The sea ice-related loss function can provide more detailed sea
ice information for the network to enhance the SR performance
and improve SR-SIC results. Experimental results demonstrate
that PMDRnet significantly outperforms other state-of-the-
art DL-MISR methods in terms of PSNR and SSIM. The SR-SIC results are also promising: both qualitative and quantitative comparisons show rich high-frequency information, such as finer texture features and much sharper sea ice edges, suggesting the good applicability of the SR results for Arctic SIC calculations. Future research should focus on adding more AMSR2 frequency images as network inputs and on improving PMDRnet by exploring its potential for retrieving other sea ice physical parameters, such as sea ice motion and sea ice thickness.
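The temporal attention fusion summarized above can be illustrated with a minimal NumPy sketch under stated assumptions: a simple negative squared-distance similarity stands in for the learned embedding similarity of the actual module, and the feature maps are synthetic. Frames that disagree with the reference (poorly aligned or contaminated, e.g., by cloud liquid water) receive low softmax weights:

```python
import numpy as np

def temporal_attention_fuse(ref_feat, neigh_feats):
    """Fuse aligned neighbor-frame features with a reference frame.

    ref_feat: (C, H, W); neigh_feats: (T, C, H, W), already aligned.
    Each neighbor gets a per-pixel weight from its similarity to the
    reference (negative squared distance over channels, softmax over T),
    so inconsistent frames contribute less to the fused features.
    """
    # Per-pixel similarity of each neighbor to the reference: (T, H, W).
    sims = -((neigh_feats - ref_feat[None]) ** 2).sum(axis=1)
    sims -= sims.max(axis=0, keepdims=True)        # stabilize the softmax
    weights = np.exp(sims)
    weights /= weights.sum(axis=0, keepdims=True)  # softmax across frames
    # Per-pixel weighted sum of neighbor features: (C, H, W).
    fused = np.einsum('thw,tchw->chw', weights, neigh_feats)
    return fused, weights

# Hypothetical features: neighbors with increasing misalignment noise.
rng = np.random.default_rng(1)
ref = rng.normal(size=(8, 16, 16))
neigh = np.stack([ref + rng.normal(scale=s, size=ref.shape)
                  for s in (0.1, 0.5, 2.0)])
fused, w = temporal_attention_fuse(ref, neigh)
print(fused.shape, w.mean(axis=(1, 2)))  # well-aligned frames dominate
```

In PMDRnet the similarity is computed from convolutional embeddings rather than raw features, but the adaptive per-pixel, per-frame weighting follows the same pattern.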
ACKNOWLEDGMENT
The authors would like to acknowledge Japan Aerospace
Exploration Agency (JAXA, https://global.jaxa.jp/) for the
provision of AMSR2 brightness temperatures. Moderate Reso-
lution Imaging Spectroradiometer (MODIS) data are obtained
from the NASA Level 1 Atmosphere Archive and Dis-
tribution System (LAADS, http://ladsweb.nascom.nasa.gov/).
The water vapor products are provided by the University
of Bremen, Bremen, Germany (https://seaice.uni-bremen.de/).
Sentinel-1 images are obtained from https://www.sentinel-hub.com/. Tiantian Feng and Rongxing Li developed the concept and methodology and supervised research. Xiaomin Liu
contributed to methodology development and data analysis.
Xiaofan Shen provided support in data preparation. All authors
wrote, reviewed, and edited the final manuscript.
REFERENCES
[1] J. Haarpaintner and G. Spreen, “Use of enhanced-resolution
QuikSCAT/SeaWinds data for operational ice services and climate
research: Sea ice edge, type, concentration, and drift,” IEEE Trans.
Geosci. Remote Sens., vol. 45, no. 10, pp. 3131–3137, Oct. 2007, doi:
10.1109/TGRS.2007.895419.
[2] M. Huntemann, G. Heygster, L. Kaleschke, T. Krumpen, M. Mäkynen, and M. Drusch, “Empirical sea ice thickness retrieval during the freeze-up period from SMOS high incident angle observations,” Cryosphere, vol. 8, no. 2, pp. 439–451, 2014, doi: 10.5194/tc-8-439-2014.
[3] R. Ricker et al., “Satellite-observed drop of Arctic sea ice growth in
winter 2015–2016,” Geophys. Res. Lett., vol. 44, no. 7, pp. 3236–3245,
Apr. 2017, doi: 10.1002/2016GL072244.
[4] M. C. Serreze and J. Stroeve, “Arctic sea ice trends, variability and implications for seasonal ice forecasting,” Philos. Trans. Roy. Soc. A, Math., Phys. Eng. Sci., vol. 373, no. 2045, Jul. 2015, Art. no. 20140159, doi: 10.1098/rsta.2014.0159.
[5] V. Ludwig, G. Spreen, C. Haas, L. Istomina, F. Kauker, and
D. Murashkin, “The 2018 north Greenland Polynya observed by a newly
introduced merged optical and passive microwave sea-ice concentration
dataset,” Cryosphere, vol. 13, no. 7, pp. 2051–2073, Jul. 2019, doi:
10.5194/tc-13-2051-2019.
[6] J. Karvonen, “Baltic sea ice concentration estimation using SENTINEL-
1 SAR and AMSR2 microwave radiometer data,” IEEE Trans.
Geosci. Remote Sens., vol. 55, no. 5, pp. 2871–2883, May 2017, doi:
10.1109/TGRS.2017.2655567.
[7] H. Han and H.-C. Kim, “Evaluation of summer passive microwave sea
ice concentrations in the Chukchi Sea based on KOMPSAT-5 SAR and
numerical weather prediction data,” Remote Sens. Environ., vol. 209,
pp. 343–362, May 2018, doi: 10.1016/j.rse.2018.02.058.
[8] D. Baldwin, M. Tschudi, F. Pacifici, and Y. Liu, “Validation of suomi-
NPP VIIRS sea ice concentration with very high-resolution satellite
and airborne camera imagery,” ISPRS J. Photogramm. Remote Sens.,
vol. 130, pp. 122–138, Aug. 2017, doi: 10.1016/j.isprsjprs.2017.05.018.
[9] H. Wiebe, G. Heygster, and T. Markus, “Comparison of the ASI ice
concentration algorithm with landsat-7 ETM+ and SAR imagery,” IEEE
Trans. Geosci. Remote Sens., vol. 47, no. 9, pp. 3008–3015, Sep. 2009,
doi: 10.1109/TGRS.2009.2026367.
[10] J. C. Comiso, D. J. Cavalieri, C. L. Parkinson, and P. Gloersen,
“Passive microwave algorithms for sea ice concentration: A comparison
of two techniques,” Remote Sens. Environ., vol. 60, no. 3, pp. 357–384,
Jun. 1997, doi: 10.1016/S0034-4257(96)00220-9.
[11] T. Markus and D. J. Cavalieri, “An enhancement of the NASA team
sea ice algorithm,” IEEE Trans. Geosci. Remote Sens., vol. 38, no. 3,
pp. 1387–1398, May 2000, doi: 10.1109/36.843033.
[12] G. Spreen, L. Kaleschke, and G. Heygster, “Sea ice remote sensing
using AMSR-E 89-GHz channels,” J. Geophys. Res., Oceans, vol. 113,
no. C2, pp. 1–14, Jan. 2008, doi: 10.1029/2005JC003384.
LIU et al.: PMDRnet FOR MISR OF AMSR2 ARCTIC SEA ICE IMAGES 4304118
[13] Y. Xian, Z. I. Petrou, Y. Tian, and W. N. Meier, “Super-
resolved fine-scale sea ice motion tracking,” IEEE Trans. Geosci.
Remote Sens., vol. 55, no. 10, pp. 5427–5439, Oct. 2017, doi:
10.1109/TGRS.2017.2699081.
[14] Z. I. Petrou, Y. Xian, and Y. Tian, “Towards breaking the spatial
resolution barriers: An optical flow and super-resolution approach for sea
ice motion estimation,” ISPRS J. Photogramm., vol. 138, pp. 164–175,
Apr. 2018, doi: 10.1016/j.isprsjprs.2018.01.020.
[15] C. Lüpkes, T. Vihma, G. Birnbaum, and U. Wacker, “Influence of leads in sea ice on the temperature of the atmospheric boundary layer during polar night,” Geophys. Res. Lett., vol. 35, no. 3, pp. 3805-1–3805-5, Feb. 2008, doi: 10.1029/2007GL032461.
[16] A. Gambardella and M. Migliaccio, “On the superresolution
of microwave scanning radiometer measurements,” IEEE Geosci.
Remote Sens. Lett., vol. 5, no. 4, pp. 796–800, Oct. 2008, doi:
10.1109/LGRS.2008.2006285.
[17] T. Hu, F. Zhang, W. Li, W. Hu, and R. Tao, “Microwave radiometer data
superresolution using image degradation and residual network,” IEEE
Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 8954–8967, Nov. 2019,
doi: 10.1109/TGRS.2019.2923886.
[18] G. E. Backus and J. F. Gilbert, “Numerical applications of a for-
malism for geophysical inverse problems,” Geophys. J. Roy. Astron.
Soc., vol. 13, nos. 1–3, pp. 247–276, Jul. 1967, doi: 10.1111/j.1365-
246X.1967.tb02159.x.
[19] G. Backus and F. Gilbert, “The resolving power of gross earth data,” Geophys. J. Int., vol. 16, no. 2, pp. 169–205, Oct. 1968, doi: 10.1111/j.1365-246X.1968.tb00216.x.
[20] D. G. Long and D. L. Daum, “Spatial resolution enhancement of SSM/I
data,” IEEE Trans. Geosci. Remote Sens., vol. 36, no. 2, pp. 407–417,
Mar. 1998, doi: 10.1109/36.662726.
[21] D. G. Long and M. J. Brodzik, “Optimum image formation for
spaceborne microwave radiometer products,” IEEE Trans. Geosci.
Remote Sens., vol. 54, no. 5, pp. 2763–2779, May 2016, doi:
10.1109/TGRS.2015.2505677.
[22] P. Chakraborty, A. Misra, T. Misra, and S. S. Rana, “Bright-
ness temperature reconstruction using BGI,” IEEE Trans. Geosci.
Remote Sens., vol. 46, no. 6, pp. 1768–1773, Jun. 2008, doi:
10.1109/TGRS.2008.916082.
[23] X. X. Zhu et al., “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 8–36, Dec. 2017, doi: 10.1109/MGRS.2017.2762307.
[24] A. Bordone Molini, D. Valsesia, G. Fracastoro, and E. Magli, “Deep-
SUM: Deep neural network for super-resolution of unregistered multi-
temporal images,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5,
pp. 3644–3656, May 2020, doi: 10.1109/TGRS.2019.2959248.
[25] H. Liu et al., “Video super resolution based on deep learning: A comprehensive survey,” 2020, arXiv:2007.12928.
[26] A. Kappeler, S. Yoo, Q. Dai, and A. K. Katsaggelos, “Video
super-resolution with convolutional neural networks,” IEEE Trans.
Comput. Imaging, vol. 2, no. 2, pp. 109–122, Jun. 2016, doi:
10.1109/TCI.2016.2532323.
[27] J. Caballero et al., “Real-time video super-resolution with spatio-
temporal networks and motion compensation,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4778–4787.
[28] M. S. M. Sajjadi, R. Vemulapalli, and M. Brown, “Frame-recurrent
video super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
Recognit., Jun. 2018, pp. 6626–6634.
[29] M. Chu, Y. Xie, J. Mayer, L. Leal-Taixé, and N. Thuerey, “Learning
temporal coherence via self-supervision for GAN-based video genera-
tion,” ACM Trans. Graph., vol. 39, no. 4, pp. 71–75, Aug. 2020, doi:
10.1145/3386569.3392457.
[30] M. Haris, G. Shakhnarovich, and N. Ukita, “Recurrent back-projection
network for video super-resolution,” in Proc. IEEE/CVF Conf. Comput.
Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 3897–3906.
[31] L. Wang, Y. Guo, L. Liu, Z. Lin, X. Deng, and W. An, “Deep
video super-resolution using HR optical flow estimation,” IEEE
Trans. Image Process., vol. 29, pp. 4323–4336, 2020, doi: 10.1109/
TIP.2020.2967596.
[32] Y. Jo, S. W. Oh, J. Kang, and S. J. Kim, “Deep video super-resolution
network using dynamic upsampling filters without explicit motion com-
pensation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.,
Jun. 2018, pp. 3224–3232.
[33] Y. Tian, Y. Zhang, Y. Fu, and C. Xu, “TDAN: Temporally-
deformable alignment network for video super-resolution,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020,
pp. 3360–3369.
[34] X. Wang, K. C. K. Chan, K. Yu, C. Dong, and C. C. Loy, “EDVR: Video
restoration with enhanced deformable convolutional networks,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW),
Jun. 2019, pp. 1954–1963.
[35] Y. Huang, W. Wang, and L. Wang, “Bidirectional recurrent convolutional networks for multi-frame super-resolution,” in Proc. NIPS, 2015, pp. 235–243.
[36] X. Tao, H. Gao, R. Liao, J. Wang, and J. Jia, “Detail-revealing deep
video super-resolution,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV),
Oct. 2017, pp. 4472–4480.
[37] T. Isobe et al., “Video super-resolution with recurrent structure-detail
network,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Aug. 2020,
pp. 645–660.
[38] J. Dai et al., “Deformable convolutional networks,” in Proc. IEEE Int.
Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 764–773.
[39] X. Zhu, H. Hu, S. Lin, and J. Dai, “Deformable ConvNets v2: More
deformable, better results,” in Proc. IEEE/CVF Conf. Comput. Vis.
Pattern Recognit. (CVPR), Jun. 2019, pp. 9308–9316.
[40] D. Liu et al., “Robust video super-resolution with learned temporal
dynamics,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017,
pp. 2507–2515.
[41] T. Isobe et al., “Video super-resolution with temporal group attention,”
in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR),
Jun. 2020, pp. 8008–8017.
[42] M. Deudon et al., “HighRes-net: Recursive fusion for multi-frame super-resolution of satellite imagery,” 2020, arXiv:2002.06460.
[43] F. Salvetti, V. Mazzia, A. Khaliq, and M. Chiaberge, “Multi-image
super resolution of remotely sensed images using residual attention deep
neural networks,” Remote Sens., vol. 12, no. 14, p. 2207, Jul. 2020, doi:
10.3390/rs12142207.
[44] Y. Xiao, X. Su, Q. Yuan, D. Liu, H. Shen, and L. Zhang, “Satellite
video super-resolution via multiscale deformable convolution alignment
and temporal grouping projection,” IEEE Trans. Geosci. Remote Sens.,
vol. 60, pp. 1–19, 2022, doi: 10.1109/TGRS.2021.3107352.
[45] JAXA. (2013). GCOM-W1 ‘SHIZUKU’ Data Users Handbook. [Online].
Available: https://gcom-w1.jaxa.jp/
[46] A. Beitsch, L. Kaleschke, and S. Kern, “Investigating high-resolution
AMSR2 sea ice concentrations during the February 2013 fracture event
in the Beaufort Sea,” Remote Sens., vol. 6, no. 5, pp. 3841–3856, 2014,
doi: 10.3390/rs6053841.
[47] NSIDC. (2016). Documentation: Polar Stereographic Projection and Grid. [Online]. Available: http://nsidc.org/data/polar-stereo/ps_grids.html
[48] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution
using very deep convolutional networks,” in Proc. IEEE Conf. Comput.
Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1646–1654.
[49] K. C. Chan, X. Wang, K. Yu, C. Dong, and C. C. Loy, “Understanding
deformable alignment in video super-resolution,” in Proc. AAAI Conf.
Artif. Intell., Feb. 2021, pp. 973–981.
[50] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous
convolution for semantic image segmentation,” 2017, arXiv:1706.05587.
[51] M. M. Lau and K. H. Lim, “Investigation of activation functions in deep
belief network,” in Proc. 2nd Int. Conf. Control Robot. Eng. (ICCRE),
Apr. 2017, pp. 201–206.
[52] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2016, pp. 770–778.
[53] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep
residual networks for single image super-resolution,” in Proc. IEEE
Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jul. 2017,
pp. 136–144.
[54] V. Nair and G. E. Hinton, “Rectified linear units improve restricted
Boltzmann machines,” in Proc. Int. Conf. Mach. Learn. (ICML),
Jun. 2010, pp. 807–814.
[55] W. Shi et al., “Real-time single image and video super-resolution using
an efficient sub-pixel convolutional neural network,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1874–1883.
[56] P. Charbonnier, L. Blanc-Feraud, G. Aubert, and M. Barlaud, “Two
deterministic half-quadratic regularization algorithms for computed
imaging,” in Proc. 1st Int. Conf. Image Process., 1994, pp. 168–172.
[57] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep Laplacian
pyramid networks for fast and accurate super-resolution,” in Proc. IEEE
Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 624–632.
[58] J. Li et al., “Hyperspectral image super-resolution by band
attention through adversarial learning,” IEEE Trans. Geosci.
Remote Sens., vol. 58, no. 6, pp. 4304–4318, Jun. 2020, doi:
10.1109/TGRS.2019.2962713.
[59] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian
optimization of machine learning algorithms,” in Proc. NIPS, 2012,
pp. 2960–2968.
[60] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger, “Information-
theoretic regret bounds for Gaussian process optimization in the bandit
setting,” IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 3250–3265,
May 2012, doi: 10.1109/TIT.2011.2182033.
[61] D. Sun, X. Yang, M. Liu, and J. Kautz, “PWC-Net: CNNs for optical
flow using pyramid, warping, and cost volume,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 8934–8943.
[62] P. K. Sahoo and G. Arora, “A thresholding method based on two-dimensional Renyi’s entropy,” Pattern Recognit., vol. 37, no. 6, pp. 1149–1161, 2004, doi: 10.1016/j.patcog.2003.10.008.
Xiaomin Liu received the B.S. degree in surveying
engineering from Tongji University, Shanghai,
China, in 2018, where she is currently pursu-
ing the Ph.D. degree with the College of Sur-
veying and Geo-Informatics, with a focus on
Arctic sea ice monitoring using multisource satellite
sensors.
She is currently involved in the research with the
Center for Spatial Information Science and Sustain-
able Development Applications, Tongji University.
Her research interests include passive microwave
image processing, deep learning-based super-resolution technology, and Arctic
sea ice monitoring using multisource satellite sensors.
Tiantian Feng received the B.S. degree and the
Ph.D. degree in photogrammetry and remote sensing
from Wuhan University, Wuhan, China, in 2004 and
2010, respectively.
She was a Visiting Ph.D. Student with the
Department of Geography, The State University
of New York at Buffalo, Buffalo, NY, USA,
from 2007 to 2008. She is currently an Associate
Professor with the College of Survey and Geoin-
formatics, Tongji University, Shanghai, China. Her
research interests include multispectral remote sens-
ing image processing, pattern recognition, and remote sensing applications in
polar research.
Xiaofan Shen received the B.S. degree in survey-
ing engineering from Tongji University, Shanghai,
China, in 2019, where she is currently pursu-
ing academic master’s degree with the College
of Surveying and Geoinformatics, with a focus
on super-resolution-based deep learning for passive
microwave Arctic sea ice images.
She is involved in the research with the
Center for Spatial Information Science and Sus-
tainable Development Applications, Tongji Univer-
sity. Her research interests include super-resolution
techniques-based deep learning, image enhancement of remote sensing
images, and Arctic sea ice monitoring using multisource satellite sensors.
Rongxing Li (Senior Member, IEEE) received the
B.S. and M.S. (Hons.) degrees in surveying and
mapping from Tongji University, Shanghai, China,
in 1982 and 1984, respectively, and the Ph.D.
degree in photogrammetry and remote sensing
from the Technical University of Berlin, Berlin,
Germany, in 1990.
He was an Associate Professor with the
University of Calgary, Calgary, AB, Canada,
from 1992 to 1996. He was the Lowber B. Strange
Professor of Engineering and the Director of
the Mapping and GIS Laboratory, Department of Civil, Environmental
and Geodetic Engineering, The Ohio State University, Columbus, OH,
USA, from 1996 to 2014. Since 2014, he has been a Professor and the
Director of the Center for Spatial Information Science and Sustainable
Development Applications, Tongji University. His research interests
include photogrammetry, digital mapping, polar remote sensing, planetary
exploration, and coastal and marine geographic information systems.