
Restoration of Sea Surface Temperature Satellite Images Using a Partially Occluded Training Set

Authors:
Satoki Shibata, Graduate School of Informatics, Kyoto University
Masaaki Iiyama, ACCMS, Kyoto University
Atsushi Hashimoto, Graduate School of Education, Kyoto University
Michihiko Minoh, ACCMS, Kyoto University
Abstract—Sea surface temperature (SST) satellite images are often partially occluded by clouds. Image inpainting is one approach to restoring the occluded regions. Given the sparseness of SST images, they can be restored via learning-based inpainting. However, state-of-the-art learning-based inpainting methods using deep neural networks require a large amount of non-occluded images as a training set. Since most SST images contain occluded regions, it is hard to collect sufficient non-occluded images. In this paper, we propose a novel method that uses occluded images as training images, which greatly enlarges the amount of training images available from a given SST image set. This is realized by introducing a novel reconstruction loss and a novel adversarial loss. Experimental results confirm the effectiveness of our method.
I. INTRODUCTION
Sea surface temperature (SST) sensing is essential for weather forecasting and for ocean-related industries such as fisheries and marine transportation. Meteorological satellites use infrared radiation to measure SST over wide areas at short intervals, acquiring SST satellite images. However, as shown in Fig. 1(a), this technique cannot measure temperatures in regions occluded by clouds, which may prevent applications from using the SST data. In particular, fisheries need real-time, high-resolution SST images that are not occluded at all.
Two approaches have been proposed to address this issue. One approach uses data assimilation based on physical models of oceanographic data [1] and interpolates the data in the occluded regions. However, its significant computational time makes it unsuitable for real-time applications, and its computational complexity also precludes deriving high-resolution SST images.
The other approach uses microwave sensors [2], which are less sensitive to cloud occlusion than infrared sensors. However, even microwave sensors are unable to gather data in rainy conditions, and they only provide low-spatial-resolution data. Approaches that combine infrared and microwave sensors [3] also suffer from low spatial resolution.
By introducing learning-based inpainting, we may be able to restore high-resolution SST images in real time, because SST images have sparse characteristics [4][5], even though SST is driven by many physical factors.
In the restoration of natural images, state-of-the-art methods can restore occluded images with high accuracy without incurring excessive computational complexity [6][7]. In [8][9][10], the mean squared error (MSE) between the restored images and the ground truth images is minimized. MSE minimization can restore partially occluded images with high accuracy, but the restored images are often over-smoothed. Some applications, such as fishery catch estimation, require clear SST images, not over-smoothed ones. Therefore, in the restoration of SST images, the restored images should not only exhibit low MSE but also be clear SST images.
Some inpainting methods for natural images have been proposed to overcome these drawbacks using Generative Adversarial Networks (GANs). A GAN is essentially an image generation model [11][19][20] comprising two components: a generator and a discriminator. The generator attempts to generate data that can fool the discriminator, while the discriminator attempts to distinguish the real data in the training dataset from the fake data created by the generator. A min-max game is then played between the two. If training proceeds stably, the generator learns the distribution of the training dataset and becomes capable of generating data that are difficult to distinguish from real data. The loss of a GAN is called the adversarial loss. Since the adversarial loss ensures that the restored images remain within the distribution of the training images, GAN-based methods are able to restore images photo-realistically [6].
These methods cannot be applied to SST images without modification. Most learning-based inpainting methods using deep neural networks require a large amount of non-occluded training images. However, we cannot gather enough non-occluded SST images, for two reasons. One is that SST images contain large regions occluded by clouds. The other is that a separate model must be created for each area, because the behavior of ocean currents causes SST patterns to differ significantly from area to area. On the other hand, it is relatively easy to acquire partially occluded SST images. We therefore propose a method in which such images are used as training images. Our method can greatly enlarge the amount of available training images. To make this possible, we propose novel losses, modifying both the reconstruction loss and the adversarial loss. Our modified losses are calculated only from the non-occluded regions of the training images.
The adversarial loss is modified to prevent the discriminator from distinguishing real from restored images simply by the presence of occluded regions. A technical challenge when training the discriminator is that the training images have occluded regions whereas the restored images should not. Hence, the mere presence of an occluded region would allow the discriminator to distinguish a real training image from a restored image produced by the generator. To address this, we deliberately introduce occluded regions into the restored images before inputting them to the discriminator.

Fig. 1: Restoration of an entire satellite image by our method. (a) Input data; (b) restoration. White pixels indicate regions occluded by clouds, and black pixels indicate land.
By using these two novel losses, our system trains both a generator and a discriminator. The main contributions of our study are as follows:
- This is the first study to apply deep-neural-network-based inpainting to the restoration of SST images, enabling the restoration of SST occluded by clouds.
- We propose a novel reconstruction loss and a novel adversarial loss that can handle partially occluded training images; our method can therefore enlarge the amount of available training images.
II. RELATED WORK
Restoration of partially occluded SST images can be considered non-blind image inpainting, which requires the locations of the occluded regions to be known in advance. In our case, conventional cloud detection techniques [12][13] could be used to detect these occluded regions. Hence, in the following, we only discuss non-blind image inpainting.
There are two types of approaches to image inpainting: non-learning-based and learning-based.
The former restores images using only the clues available in the image itself, such as the pixel values of the nearest non-occluded region [14][15] or the texture patterns of non-occluded regions [16].
The latter restores images using external training images. The simplest methods of this type use a patch dictionary [17][18]. State-of-the-art learning-based inpainting methods all employ deep neural networks to restore images [6][7].
Since SST images are sparse [4][5], our method employs learning-based inpainting with deep neural networks.
A. Inpainting Using Reconstruction Loss[8][9][10]
Inpainting methods using reconstruction loss operate by minimizing the MSE between the restored images and the ground truth images. The loss function of such methods is given by Eq. (1):
L_{rec} = \frac{1}{N} \sum_{n=1}^{N} \| x_n - G(\hat{x}_n) \|^2    (1)

where N is the number of training images, x_n is a ground truth image from the training set, \hat{x}_n is the occluded counterpart of x_n, and G is the generator function that restores the occluded image.
If the restoration is done using only the minimization of Eq. (1), the resulting images will be over-smoothed.
B. Inpainting Using Adversarial Loss[6][7]
To solve the problem of over-smoothing, some methods also apply generative adversarial networks (GANs) [6][7]. GANs have been applied to a range of image generation tasks such as image translation [21][22] and image super-resolution [23], and have also been used effectively in image inpainting [6][7].
One of the state-of-the-art image inpainting methods [6] combines the reconstruction loss with an adversarial loss, which helps to prevent over-smoothed restoration. The adversarial loss is the loss of the GAN, given by Eq. (2):
L_{adv} = \frac{1}{N} \sum_{n=1}^{N} \left( \log(1 - D(G(\hat{x}_n))) + \log D(x_n) \right)    (2)

where D is the discriminator function. The generator and the discriminator are trained with the following optimization:

\min_G \max_D \left( \alpha L_{rec} + (1 - \alpha) L_{adv} \right)    (3)

where \alpha is a weight parameter (0 \le \alpha \le 1).
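For concreteness, the baseline of Eqs. (1)–(3) can be sketched in a few lines of PyTorch. This is a minimal reading of the formulas, not the authors' released code; the function names and the eps term for numerical stability are our assumptions.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(x, x_restored):
    # Eq. (1): MSE between ground truth images x and restored images G(x_hat).
    return F.mse_loss(x_restored, x)

def adversarial_loss(d_real, d_fake, eps=1e-8):
    # Eq. (2): d_real = D(x), d_fake = D(G(x_hat)), both probabilities in (0, 1).
    # D is trained to maximize this quantity; only the d_fake term depends on G,
    # which minimizes it in the min-max game of Eq. (3).
    return (torch.log(1.0 - d_fake + eps) + torch.log(d_real + eps)).mean()
```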
III. OUR METHOD
Fig. 2 provides an overview of our method. Like the baseline, it comprises two components: the generator and the discriminator.
The prevalence of occluded regions in SST images makes it difficult to gather sufficient non-occluded training images. We therefore enlarge the amount of available training images by using partially occluded images.
Fig. 2: Overview of our method. It comprises two networks: the generator and the discriminator. Partially occluded images can be used as training images.
A. Our Reconstruction Loss
Our reconstruction loss for the generator is calculated as shown in Fig. 3. As the training images are partially occluded, the ground truth image x_n in Eq. (1) includes an occluded region, and the reconstruction loss of Eq. (1) cannot be calculated there. In our method, the reconstruction loss is therefore calculated only in the non-occluded region of x_n, and Eq. (1) is rewritten as follows:
L^{ours}_{rec} = \frac{1}{N} \sum_{n=1}^{N} \| (x_n - G(\hat{x}_n)) \odot m_{x_n} \|^2    (4)

m^{(i)}_{x_n} = \begin{cases} 0 & (x^{(i)}_n \text{ is occluded}) \\ 1 & (x^{(i)}_n \text{ is not occluded}) \end{cases}    (5)

\hat{x}_n = x_n \odot m_{rand}    (6)

The mask m_{x_n} has the same size as x_n and is the binary occlusion mask of x_n: 0-valued pixels and 1-valued pixels in m_{x_n} correspond to the occluded region and the non-occluded region, respectively (Eq. (5)). In Eq. (5), x^{(i)}_n and m^{(i)}_{x_n} denote the i-th pixels of x_n and m_{x_n}, respectively. A randomly chosen binary occlusion mask m_{rand}, of the same size as x_n, is applied to the ground truth images for training (Eq. (6)).
Fig. 3: Our reconstruction loss. Only the reconstruction loss in the
non-occluded region of the ground truth image is calculated.
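A minimal PyTorch sketch of Eqs. (4)–(6) follows. It assumes tensors of shape (N, 1, H, W) and masks with 1 = observed, 0 = occluded; the function names are ours.

```python
import torch

def make_training_input(x, m_rand):
    # Eq. (6): further occlude a partially occluded ground truth image x
    # with a randomly chosen real cloud mask m_rand.
    return x * m_rand

def masked_reconstruction_loss(x, g_out, m_x):
    # Eq. (4): the squared error is evaluated only where the ground truth x
    # is actually observed (m_x = 1, Eq. (5)); occluded pixels contribute nothing.
    diff = (x - g_out) * m_x
    return diff.pow(2).flatten(start_dim=1).sum(dim=1).mean()
```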
B. Our Adversarial Loss
We also employ the adversarial loss (Eq. (2)), following Pathak et al. [6]. In the case of SST images, however, the training images are partially occluded, whereas the images produced by the generator, G(\hat{x}_n), do not include occluded regions. If the adversarial loss is applied as-is, the discriminator can distinguish between a ground truth image x_n and a restored image G(\hat{x}_n) simply by the presence of the occluded region.
To address this problem, we apply the occlusion mask m_{x_n} to the restored image G(\hat{x}_n). This means that the image input to the discriminator always includes an occluded region, irrespective of its origin.
Our adversarial loss is as follows:

L^{ours}_{adv} = \frac{1}{N} \sum_{n=1}^{N} \left( \log(1 - D(G(\hat{x}_n) \odot m_{x_n})) + \log D(x_n) \right)    (7)
1) Fake Occlusion: There is another problem concerning the generator: it sometimes generates occluded regions in a restored image. We call such generated occluded regions fake occlusion. If we simply input a restored image G(\hat{x}_n) \odot m_{x_n} to the discriminator, the discriminator does not take the location of the occluded regions into account. It therefore cannot distinguish between the real occlusion m_{x_n} in Eq. (7) and fake occlusion, which allows the generator to produce fake occlusion in a restored image.
To solve this problem, the input of the discriminator is a two-channel image: one channel holds the SST image and the other holds the binary occlusion mask. The input to the discriminator, G(\hat{x}_n) \odot m_{x_n} in Eq. (7), is therefore replaced by the two-channel image [G(\hat{x}_n) \odot m_{x_n}, m_{x_n}]. In the case of a ground truth image x_n, the input is likewise replaced by [x_n, m_{x_n}].
If a restored image G(\hat{x}_n) contains fake occlusion, the fake occlusion can appear anywhere, regardless of m_{x_n}. Therefore, when fake occlusion is generated, the occluded region of G(\hat{x}_n) \odot m_{x_n} does not correspond to the second channel m_{x_n}. In contrast, the occluded region of a ground truth image x_n exactly corresponds to m_{x_n}. Hence, this two-channel setting prevents the generator from generating fake occlusion in the restored image.
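The two-channel input can be built with a single concatenation; a minimal sketch (function name and shapes are our assumptions):

```python
import torch

def discriminator_input(sst, m_x):
    # Pair the (masked) SST channel with the occlusion-mask channel, so the
    # discriminator can verify that occlusion occurs exactly where m_x says.
    #   real example: discriminator_input(x,              m_x)
    #   fake example: discriminator_input(G(x_hat) * m_x, m_x)
    return torch.cat([sst, m_x], dim=1)   # (N, 2, H, W)
```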
C. Training Phase
Updating the parameters of our networks uses both our reconstruction loss, L^{ours}_{rec}, and our adversarial loss, L^{ours}_{adv}. Our optimization function is as follows:

\min_G \max_D \left( \alpha L^{ours}_{rec} + (1 - \alpha) L^{ours}_{adv} \right)    (8)

where \alpha (0 \le \alpha \le 1) is a weight parameter. In Eq. (8), the parameters of G and D are updated in an alternating manner.
Since we do not need to restore the non-occluded region of the ground truth images, that region is used without modification. The restoration process of G, i.e., G(\hat{x}_n), is therefore implemented as follows:

G(\hat{x}_n) = f(\hat{x}_n) \odot (1 - m_{\hat{x}_n}) + \hat{x}_n \odot m_{\hat{x}_n}    (9)

m^{(i)}_{\hat{x}_n} = \begin{cases} 0 & (\hat{x}^{(i)}_n \text{ is occluded}) \\ 1 & (\hat{x}^{(i)}_n \text{ is not occluded}) \end{cases}, \qquad m_{\hat{x}_n} = m_{rand} \odot m_{x_n}    (10)

where f takes a partially occluded image as input and outputs a whole restored image; the parameters of f are updated during training. m_{\hat{x}_n} is the occlusion mask of \hat{x}_n, whose occluded regions are determined by m_{rand} and m_{x_n} (Eq. (10)). An image restored by G thus consists of the region of \hat{x}_n restored by f and the non-occluded region of \hat{x}_n (Eq. (9)).
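The compositing of Eq. (9) is a one-line masked blend; a minimal sketch (names ours):

```python
def composite_restoration(f_out, x_hat, m_hat):
    # Eq. (9): keep the observed pixels of x_hat unchanged and take the
    # network output f(x_hat) only in the occluded region.
    # m_hat = m_rand * m_x (Eq. (10)); 1 = observed, 0 = occluded.
    return f_out * (1.0 - m_hat) + x_hat * m_hat
```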
For the architecture of the generator, skip connections and residual (ResNet [24]) blocks are known to improve performance [10][23]. We therefore employed the same architecture as that in [23]. A sketch of one alternating training step combining the pieces above is given below.
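The following PyTorch sketch performs one alternating update of Eq. (8), reusing the helpers sketched earlier. All names, shapes ((N, 1, H, W), masks with 1 = observed), and the eps term are our assumptions, not the authors' released code.

```python
import torch

def train_step(f, D, opt_g, opt_d, x, m_x, m_rand, alpha=0.5, eps=1e-8):
    # One alternating update of min_G max_D (alpha*L_rec^ours + (1-alpha)*L_adv^ours).
    m_hat = m_rand * m_x                     # Eq. (10)
    x_hat = x * m_rand                       # Eq. (6)

    # --- discriminator step: maximize L_adv^ours (Eq. (7)) ---
    g_out = composite_restoration(f(x_hat), x_hat, m_hat)          # Eq. (9)
    real = D(discriminator_input(x, m_x))
    fake = D(discriminator_input(g_out.detach() * m_x, m_x))
    loss_d = -(torch.log(real + eps) + torch.log(1 - fake + eps)).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- generator step: minimize alpha*L_rec^ours + (1-alpha)*L_adv^ours ---
    g_out = composite_restoration(f(x_hat), x_hat, m_hat)
    fake = D(discriminator_input(g_out * m_x, m_x))
    loss_g = alpha * masked_reconstruction_loss(x, g_out, m_x) \
             + (1 - alpha) * torch.log(1 - fake + eps).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```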
D. Restoration of SST Images
To restore a partially occluded SST image \tilde{y}, we simply input it to the trained generator and obtain the restored result y, as expressed by Eq. (11):

y = G(\tilde{y})    (11)
IV. EXPERIMENTS
Experiments were conducted to evaluate the effectiveness
of our method.
A. Dataset and Training Details
We prepared daily satellite SST images observed by the Himawari-8 satellite [25] from July 2015 to August 2016 (418 days). Each image had 5001×6001 pixels, with each pixel corresponding to a 4 km² region.
To construct a dataset, we cropped 64×64-pixel regions from these SST images; a region of at least 64×64 pixels is assumed to be necessary to capture SST features such as vortices. Since SST patterns are affected by ocean currents and by the geography of the sea floor, a single model should not be constructed from all the SST images. In the area between longitudes 150°E and 180°E and latitudes 1°N and 30°N, only the North Equatorial Current is observed and the water is sufficiently deep, so the cropped images were taken only from this area in this experiment.
We divided all the cropped images into three datasets (a sketch of the partitioning rule follows the list):
- Occlusion-free dataset: all the non-occluded SST images among the cropped images (269 images).
- Small-occlusion dataset: 590 cropped images whose occluded region was less than 1%.
- Large-occlusion dataset: the rest of the cropped images. Since excessively large occluded regions degraded performance, the occlusion rate of the images in this dataset was limited to less than 60% (163,505 images), and images with more than 70% occluded region were eliminated. The average occlusion rate in this dataset was 20.7%.
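The partitioning rule can be written down directly. A minimal sketch; the thresholds follow the text, while the function and dataset names are ours (mask is a 64×64 NumPy array with 1 = observed):

```python
def occlusion_rate(mask):
    # Fraction of cloud-occluded pixels in a 64x64 crop.
    return 1.0 - mask.mean()

def assign_dataset(mask):
    r = occlusion_rate(mask)
    if r == 0.0:
        return "occlusion-free"      # 269 images
    if r < 0.01:
        return "small-occlusion"     # 590 images, used to build test pairs
    if r < 0.60:
        return "large-occlusion"     # 163,505 images, usable for training
    return "discarded"               # too heavily occluded
```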
The use of these datasets can be summarized as follows:
- We used the occlusion-free dataset to train the conventional method [6], which requires non-occluded training images.
- We made pairs of ground truth images and partially occluded test images using the small-occlusion dataset. The images in the small-occlusion dataset were used as ground truth. To generate the partially occluded test images, we prepared 590 real cloud-occlusion binary masks extracted from the satellite SST images and applied them to the ground truth images. Each method restored the partially occluded test images, and the restored results were evaluated against the corresponding ground truth images.
- Our method allows training images to contain occluded regions, so it can train the networks on both the occlusion-free and the large-occlusion datasets (163,774 images in total).

The conventional method [6] could only use the occlusion-free dataset (269 images), whereas our method enlarged the amount of available training images by using both the occlusion-free and the large-occlusion datasets (163,774 images), which is expected to improve performance.
We compared seven inpainting methods.
Three of them were the state-of-the-art methods proposed by Pathak et al. [6], which require non-occluded training images. For these, we trained the networks with only the reconstruction loss (L_{rec}), only the adversarial loss (L_{adv}), or both (\alpha L_{rec} + (1-\alpha) L_{adv}), using the occlusion-free dataset.
Another three were our own methods, which allow the training images to contain occluded regions. We trained the networks with only our reconstruction loss (L^{ours}_{rec}), only our adversarial loss (L^{ours}_{adv}), or both (\alpha L^{ours}_{rec} + (1-\alpha) L^{ours}_{adv}), using both the occlusion-free and the large-occlusion datasets.
The last was a non-learning-based inpainting method, NS [14], included for further comparison. This method restores images using only the pixel values of the non-occluded region nearest to the occluded region.
In this experiment, we set the learning rate of the discriminator to 10^{-8} and that of the generator to 10^{-4}. The Adam optimizer [26] was used with \beta_1 = 0.5 during training. For \alpha L_{rec} + (1-\alpha) L_{adv} and \alpha L^{ours}_{rec} + (1-\alpha) L^{ours}_{adv}, \alpha was set to 0.5, as configured below.
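The reported configuration maps directly onto the optimizers. A sketch; G_params and D_params stand for the parameters of the generator and discriminator, and \beta_2 is left at its PyTorch default, which is our assumption since the paper only states \beta_1:

```python
import torch

opt_g = torch.optim.Adam(G_params, lr=1e-4, betas=(0.5, 0.999))   # generator
opt_d = torch.optim.Adam(D_params, lr=1e-8, betas=(0.5, 0.999))   # discriminator
alpha = 0.5   # weight between reconstruction and adversarial losses, Eq. (8)
```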
All networks were trained in the same manner as described above, on an NVIDIA GeForce GTX 1080 GPU.
For brevity, we denote \alpha L_{rec} + (1-\alpha) L_{adv} and \alpha L^{ours}_{rec} + (1-\alpha) L^{ours}_{adv} as L_{rec} + L_{adv} and L^{ours}_{rec} + L^{ours}_{adv}, respectively.
B. Evaluation Metrics
Two evaluation metrics were used.
First, restored SST images should be similar to the corresponding ground truth images. This was measured by the mean squared error (MSE) between the restored images and the ground truth images, calculated only in the occluded regions of the partially occluded test images.
Second, in industrial applications, restored SST images should be sharp enough to grasp the characteristics of SST. For this, a qualitative comparison was made between the restored images and the ground truth images.
C. Experimental Results
The quantitative results are shown in TABLE I. They confirm that our methods (L^{ours}_{rec} and L^{ours}_{rec} + L^{ours}_{adv}) outperformed the conventional methods at every occlusion rate. In general, restoration becomes more difficult as the occlusion rate increases; our methods were particularly effective under such conditions, owing to their large amount of training images.
Qualitative comparisons are shown in Fig. 4. Again, the superiority of our proposed methods was particularly pronounced when the occlusion rate was high. Additionally, L^{ours}_{adv} alleviated unrealistic over-smoothing, as L_{adv} does, making it easier to grasp the patterns of SST.
D. Restoration of an Entire SST Image
Finally, we restored an entire SST image, taken by the Himawari-8 satellite at 18:00 (GMT) on September 16, 2015. The image is shown in Fig. 1(a) and its restoration in Fig. 1(b).
We used our method in the same manner as in the first experiment. To train the networks, we divided the entire image into 6 areas and prepared partially occluded training images for each area. For each area, we independently constructed a restoration model using L^{ours}_{rec} + L^{ours}_{adv}. These models iteratively restored 64×64-pixel regions whose occlusion rate was less than 20%, until the entire image (5001×6001 pixels) had been restored.
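One hypothetical reading of this iterative scheme is sketched below. The sliding stride, the edge handling (which is omitted), and the per-area model lookup models[area_of(i, j)] are all our assumptions; restored windows update the mask so that neighboring windows eventually fall below the occlusion threshold.

```python
import numpy as np

def restore_entire_image(models, area_of, sst, mask, tile=64, max_occ=0.20):
    # Repeatedly restore 64x64 windows whose occlusion rate is below 20%,
    # using the model of the area each window falls in, until no window changes.
    out, m = sst.copy(), mask.copy()         # m: 1 = observed, 0 = occluded
    progress = True
    while m.min() == 0 and progress:
        progress = False
        for i in range(0, out.shape[0] - tile + 1, tile // 2):
            for j in range(0, out.shape[1] - tile + 1, tile // 2):
                win = (slice(i, i + tile), slice(j, j + tile))
                occ = 1.0 - m[win].mean()
                if 0.0 < occ < max_occ:
                    out[win] = models[area_of(i, j)](out[win], m[win])
                    m[win] = 1.0             # this window is now fully restored
                    progress = True
    return out
```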
This restoration took only 3 hours on a commodity machine (Xeon CPU E5-2620 v4 at 2.10 GHz with 64 GB of memory and an NVIDIA GeForce GTX 1080 GPU). According to users involved in fishery catch estimation, this restoration time is sufficiently fast for industrial real-time applications.
V. CONCLUSION
In this paper, we presented a method for the restoration of SST images.
As most SST images include occluded regions, it is hard to gather enough non-occluded training images for deep-neural-network-based inpainting methods. We therefore proposed a novel inpainting method that allows the training images to contain occluded regions. For this purpose, we modified the reconstruction loss and the adversarial loss of a conventional inpainting method [6]. Our losses allow training images to contain occluded regions, and this modification enlarges the amount of available training images. In the experiments, our method outperformed the conventional methods based on [6].
Since SST images are time-sequential data, temporal clues may also help restore SST images. In future work, we will attempt to improve accuracy by extending this method to time-sequential data.
REFERENCES
[1] N. Usui, S. Ishizaki, Y. Fujii, H. Tsujino, T. Yasuda, and M. Kamachi, “Meteorological Research Institute multivariate ocean variational estimation (MOVE) system: Some early results,” Advances in Space Research, vol. 37, no. 4, pp. 806–822, 2006.
[2] F. J. Wentz, C. Gentemann, D. Smith, and D. Chelton, “Satellite
measurements of sea surface temperature through clouds,” Science, vol.
288, no. 5467, pp. 847–850, 2000.
[3] R. W. Reynolds, T. M. Smith, C. Liu, D. B. Chelton, K. S. Casey, and
M. G. Schlax, “Daily high-resolution-blended analyses for sea surface
temperature,” Journal of Climate, vol. 20, no. 22, pp. 5473–5496, 2007.
[4] D. Wilson-Diaz, A. J. Mariano, R. H. Evans, and M. E. Luther, “A principal component analysis of sea-surface temperature in the Arabian Sea,” Deep Sea Research Part II: Topical Studies in Oceanography, vol. 48, no. 6, pp. 1097–1114, 2001.
[5] C. Penland and P. D. Sardeshmukh, “The optimal growth of tropical sea
surface temperature anomalies,” Journal of climate, vol. 8, no. 8, pp.
1999–2024, 1995.
[6] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros,
“Context encoders: Feature learning by inpainting,” in Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, 2016,
pp. 2536–2544.
[7] R. A. Yeh, C. Chen, T. Y. Lim, A. G. Schwing, M. Hasegawa-Johnson,
and M. N. Do, “Semantic image inpainting with deep generative
models,” in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, 2017, pp. 5485–5493.
[8] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with
deep neural networks,” in Advances in Neural Information Processing
Systems, 2012, pp. 341–349.
[9] F. Agostinelli, M. R. Anderson, and H. Lee, “Adaptive multi-column deep neural networks with application to robust image denoising,” in Advances in Neural Information Processing Systems, 2013, pp. 1493–1501.
[10] X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep
convolutional encoder-decoder networks with symmetric skip connec-
tions,” in Advances in Neural Information Processing Systems, 2016,
pp. 2802–2810.
[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in
Advances in neural information processing systems, 2014, pp. 2672–
2680.
[12] A. McNally and P. Watts, “A cloud detection algorithm for high-
spectral-resolution infrared sounders,” Quarterly Journal of the Royal
Meteorological Society, vol. 129, no. 595, pp. 3411–3423, 2003.
[13] H. Ishida and T. Y. Nakajima, “Development of an unbiased cloud detection algorithm for a spaceborne multispectral imager,” Journal of Geophysical Research: Atmospheres, vol. 114, no. D7, 2009.
[14] M. Bertalmio, A. L. Bertozzi, and G. Sapiro, “Navier-stokes, fluid
dynamics, and image and video inpainting,” in Computer Vision and
Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE
Computer Society Conference on, vol. 1. IEEE, 2001, pp. I–I.
[15] A. Telea, “An image inpainting technique based on the fast marching method,” Journal of Graphics Tools, vol. 9, no. 1, pp. 23–34, 2004.
TABLE I: MSE (°C) of restoration by each method. MSE was calculated only in the occluded regions of the input images. (In the original, red marks the best and blue the second-best precision for each occlusion rate.) Our method achieved the best and the second-best precision at every occlusion rate.

Occlusion Rate (%) | NS [14] | L_rec [6] | L_adv [6] | L_rec+L_adv [6] | L^ours_rec | L^ours_adv | L^ours_rec+L^ours_adv
1-20               | 0.1381  | 0.0566    | 0.0845    | 0.0594          | 0.0528     | 0.1033     | 0.0566
20-40              | 0.2131  | 0.1038    | 0.1583    | 0.1095          | 0.0960     | 0.1898     | 0.1022
40-60              | 0.2936  | 0.1525    | 0.2204    | 0.1554          | 0.1369     | 0.2355     | 0.1438
60-80              | 0.3046  | 0.1810    | 0.2467    | 0.1773          | 0.1541     | 0.2955     | 0.1605
80-100             | 0.2904  | 0.2304    | 0.2323    | 0.2065          | 0.1931     | 0.2868     | 0.1913
1-100              | 0.2224  | 0.1211    | 0.1658    | 0.1207          | 0.1071     | 0.1930     | 0.1117
Fig. 4: Comparison with conventional methods (columns: Ground Truth, Input, NS [14], L_rec [6], L_adv [6], L_rec+L_adv [6], L^ours_rec, L^ours_adv, L^ours_rec+L^ours_adv). White pixels correspond to occluded regions. At low occlusion rates, all methods produced accurate restorations. When the occluded region was large, L^ours_rec + L^ours_adv produced superior results. L^ours_adv alleviated the over-smoothing of L^ours_rec.
[16] R. Martínez-Noriega, A. Roumy, and G. Blanchard, “Exemplar-based image inpainting: Fast priority and coherent nearest neighbor search,” in Machine Learning for Signal Processing (MLSP), 2012 IEEE International Workshop on. IEEE, 2012, pp. 1–6.
[17] J. Hays and A. A. Efros, “Scene completion using millions of pho-
tographs,” in ACM Transactions on Graphics (TOG), vol. 26, no. 3.
ACM, 2007, p. 4.
[18] J. Mairal, G. Sapiro, and M. Elad, “Learning multiscale sparse rep-
resentations for image and video restoration,” Multiscale Modeling &
Simulation, vol. 7, no. 1, pp. 214–241, 2008.
[19] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation
learning with deep convolutional generative adversarial networks,” arXiv
preprint arXiv:1511.06434, 2015.
[20] E. L. Denton, S. Chintala, R. Fergus et al., “Deep generative image models using a Laplacian pyramid of adversarial networks,” in Advances in Neural Information Processing Systems, 2015, pp. 1486–1494.
[21] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image
translation with conditional adversarial networks,” arXiv preprint
arXiv:1611.07004, 2016.
[22] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” arXiv preprint arXiv:1703.10593, 2017.
[23] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” arXiv preprint arXiv:1609.04802, 2016.
[24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[25] Y. Kurihara, H. Murakami, and M. Kachi, “Sea surface temperature from the new Japanese geostationary meteorological Himawari-8 satellite,” Geophysical Research Letters, vol. 43, no. 3, pp. 1234–1240, 2016.
[26] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.