Region-Conditioned Orthogonal 3D U-Net for Weather4Cast Competition

Abstract

The Weather4Cast competition (hosted by NeurIPS 2022) required competitors to predict super-resolution rain movies in various regions of Europe when low-resolution satellite contexts covering wider regions are given. In this paper, we show that a general baseline 3D U-Net can be significantly improved with region-conditioned layers as well as orthogonality regularizations on 1×1×1 convolutional layers. Additionally, we facilitate generalization with a bag of training strategies: mixup data augmentation, self-distillation, and feature-wise linear modulation (FiLM). The presented modifications outperform the baseline algorithm (3D U-Net) by up to 19.54% with less than 1% additional parameters, and our solution took 4th place on the core test leaderboard.
Taehyeon Kim Shinhwan Kang Hyeonjeong Shin Deukryeol Yoon
Seongha Eom Kijung Shin Se-Young Yun
KAIST AI
Seoul, Korea
{potter32, shinhwan.kang, hyeonjeong1, deukryeol.yoon}@kaist.ac.kr
{doubleb, kijungs, yunseyoung}@kaist.ac.kr
1 Introduction
Precipitation forecasting is one of the most arduous problems in forecasting meteorological
conditions such as air quality, solar, temperature, and wind velocity. Accurate forecasting can prevent
enormous economic and social damage in a variety of applications: large-scale crop management,
autonomous driving systems, and air traffic control. While Numerical Weather Prediction (NWP)
is a general method for predicting climate changes based on physics-based simulations, its
short-term rain prediction (i.e., less than 6 hours ahead) is still inaccurate despite substantial
computational effort. Recently, deep learning techniques have attracted considerable attention
from the weather research community for such short-term precipitation forecasting [5, 9, 10, 13, 14].
Specifically, among these techniques, Espeholt et al. [5] develop an end-to-end deep learning method
that outperforms High Resolution Rapid Refresh (HRRR) [3], which is the state-of-the-art method
used in the United States.
Weather4Cast 2022 [6, 7] is a competition for designing the best deep learning-based precipitation
forecasting model, in which competitors forecast super-resolution rainfall events for the next
8 hours at 15-minute intervals from low-resolution satellite radiances over various regions in Europe.
In the stage2 task, in which our team participated, the model must predict rainfall for 7 European
regions across two years (2019 and 2020). The training dataset is composed of spectral satellite
imagery that covers larger areas at low resolution, with 11 input variables per pixel. While such
satellite imagery demands spatio-temporal modelling, how to make conventional methods robust to
such spatio-temporal shifts is still under-explored.
To tackle the challenge of spatio-temporal shifts, we propose a Region-Conditioned Network (RCN)
to inject regional information into the output of the encoder of a 3D residual U-Net, which
is a variant of 3D U-Net [4]. Given spectral satellite contexts of different regions, RCN
extracts a region-conditioned context, and this context linearly modulates the encoder output of the
3D U-Net. In addition, by imposing an orthogonality regularization on the 1×1×1 convolutional layers,
the network can capture more fine-grained representations for the super-resolution prediction and
thus yields a better score. We also stabilize training against the spatio-temporal shifts of the dataset.
Lastly, we adopt a bag of training strategies such as mixup [15], self-distillation, and feature-wise
linear modulation (FiLM) [2, 11]. More precisely, we add FiLM layers to a backbone model and
fine-tune only these layers for each region of each year while freezing the rest of the backbone.
We provide more details about RCN and the training strategies in Section 3.
36th Weather4cast NeurIPS 2022 Competition Workshop
arXiv:2212.02059v1 [cs.CV] 5 Dec 2022
Our contributions can be summarized as follows:
Effective: We utilize two concepts: (1) a region-conditioned network and (2) orthogonality
regularization on 1×1×1 convolutional layers. With additional training strategies, our solution
outperforms the baseline by up to 19.54% with less than 1% additional parameters.
Applicable: Our approaches can be adapted to other deep neural networks.
Reproducible: We provide our source code at [1].
2 Overview of Weather4Cast Challenge and Provided Data
2.1 Weather4Cast Challenge
The main objective of the Weather4Cast competition is to predict future super-resolution rainfall
events (i.e., rain or no-rain) from lower-resolution satellite radiances. Competitors are
required to provide a model that predicts rainfall events for the next eight hours, in 32 time slots,
given 4 time slots covering the preceding hour. As the given data is composed of multiple regions in
Europe across two years, the key is to learn a model that is robust under spatio-temporal shifts.
The challenge comprised two different tasks: (1) stage1: predicting 3 different regions for 1 year
(2019) and (2) stage2: predicting 7 different regions for 2 years (2019 and 2020). Additionally, the
rain rate threshold is 0.2 for the latter task and 0.0001 for the former. A solution to the stage2 task
can be more meaningful meteorologically, but the task is harder because the higher threshold
increases the sparsity of the rain events to be predicted. In this paper, our solution targets the
stage2 task.
2.2 Dataset
The dataset comprises satellite imagery with 11 observed physical variables, positional
information, and observed rainfall amounts. The details are as follows:
Regions: The dataset consists of 3 different regions for 1 year (2019) in stage1, and it is extended
to 7 regions for 2 years (2019 and 2020) in stage2.
Input variables: Each spectral satellite image includes 11 variables, which are slightly noisy
satellite radiances covering visible, water vapor, and infrared bands: IR_016, IR_039, IR_087,
IR_097, IR_108, IR_120, IR_134, VIS006, VIS008, WV_062, and WV_073. Detailed information
for each context is not provided.
Sequential information: Each input image covers 15 minutes, and each input pixel corresponds to a
12 km × 12 km area, while each output pixel corresponds to a 2 km × 2 km area.
Rainfall amount: Pixel-wise rainfall information is provided as a float value. The precipitation
ratio is given in Table 1.
Static information: Metadata contains the latitude, longitude, and height of each pixel.
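As a quick sanity check on the figures quoted above and in Section 2.1, the following sketch works out the forecast horizon and the resolution ratio between input and output pixels. The constant names are ours and purely illustrative; they are not identifiers from the competition code.

```python
# Figures taken from the text; the names are illustrative only.
SLOT_MINUTES = 15        # each image covers 15 minutes
INPUT_SLOTS = 4          # context: 4 time slots of the preceding hour
OUTPUT_SLOTS = 32        # time slots to be predicted
INPUT_PIXEL_KM = 12      # edge length of one satellite (input) pixel
OUTPUT_PIXEL_KM = 2      # edge length of one rainfall (output) pixel

context_hours = INPUT_SLOTS * SLOT_MINUTES / 60
horizon_hours = OUTPUT_SLOTS * SLOT_MINUTES / 60

# One input pixel edge spans this many output pixel edges ...
linear_scale = INPUT_PIXEL_KM // OUTPUT_PIXEL_KM
# ... so one input pixel covers this many output pixels.
output_pixels_per_input_pixel = linear_scale ** 2

print(context_hours, horizon_hours, linear_scale, output_pixels_per_input_pixel)
```

Under these figures the model sees one hour of context, predicts eight hours ahead, and must resolve each input pixel into a 6 × 6 block of output pixels.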
2.3 Evaluation Metrics
Table 1: Statistics over different regions: boxi0015, boxi0034, and boxi0076.
Region    boxi0015  boxi0034  boxi0076
No-rain   0.810     0.810     0.892
Rain      0.190     0.190     0.108
Figure 1: An overview of our solution in the task of predicting the 7 regions in 2019 and 2020.
[Figure labels: Step 1. Train the Backbone; Step 2. Fine-Tune; Step 3. Predict. Components shown: Residual U-Net, Orthogonal 1x1x1 Convolution, Region-Conditioned Context, Mixup, Self-Distillation, FiLM Transfer, Thresholding.]
We evaluate the predictive performance in terms of Critical Success Index (CSI) score [12], F1-score,
accuracy, and Intersection over Union (IoU). In particular, the CSI-score is the common evaluation
metric in precipitation forecasting. It is the total number of correct event forecasts divided by the
sum of the total number of storm forecasts and the number of misses, i.e.,
$$ \mathrm{CSI} = \frac{TP}{TP + FN + FP} \tag{1} $$
where TP, FN, and FP are true positive, false negative, and false positive, respectively.
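To make Equation (1) concrete, here is a minimal NumPy sketch that computes the four reported metrics from a pair of binary rain masks. The function name `binary_scores` and the toy masks are ours, not part of the competition code. For a binary rain mask, the IoU of the rain class has the same formula as the CSI, which is consistent with the identical CSI and IoU columns in Tables 2 and 3.

```python
import numpy as np

def binary_scores(pred, target):
    """Return CSI, F1, accuracy and IoU for two binary rain masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))      # rain predicted and observed
    fp = int(np.sum(pred & ~target))     # rain predicted, none observed
    fn = int(np.sum(~pred & target))     # rain observed but missed
    tn = int(np.sum(~pred & ~target))    # no rain predicted or observed
    denom = tp + fn + fp
    csi = tp / denom if denom else 0.0   # Equation (1)
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    accuracy = (tp + tn) / pred.size
    iou = csi                            # rain-class IoU = TP / (TP + FP + FN)
    return {"csi": csi, "f1": f1, "accuracy": accuracy, "iou": iou}

# Toy example: 2 hits, 1 false alarm, 1 miss, 4 correct negatives.
pred = [[1, 1, 0, 0], [0, 1, 0, 0]]
target = [[1, 0, 0, 0], [0, 1, 1, 0]]
scores = binary_scores(pred, target)
print(scores)
```

Note that correct negatives do not enter the CSI at all, which is why it is preferred over accuracy when rain events are sparse.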
3 Method
This section describes the solution of KAIST AI. The overall training consists of 3 steps: (1) train
the backbone (Residual U-Net) with orthogonal 1×1×1 convolutional layers as well as region
conditioning, (2) fine-tune the backbone with the FiLM Transfer approach [2] for each region of a
certain year, and (3) predict the output via thresholding (Figure 1).
3.1 Baseline: 3D U-Net
We choose a 3D U-Net [4] as the baseline model; it uses the layers of a convolutional
encoder-decoder architecture designed for volumetric segmentation. We train it on the regions
‘boxi0015’, ‘boxi0034’, and ‘boxi0076’ in 2019 with the DiceBCE loss [16]. As Table 2 shows, the
baseline performance improves as the batch size increases. Interestingly, such baseline models make
fairly accurate predictions for different regions even without the use of region information as an
input. However, to make super-resolution predictions more accurate, conditioning on regions is
needed during the propagation.
Table 2: A preliminary survey of the 3D U-Net architecture on the validation dataset. We set the
regions to ‘boxi0015’, ‘boxi0034’, and ‘boxi0076’ in 2019.
Batch size  CSI-score  F1-score  Accuracy  IoU     Loss    Precision  Recall
16          0.3130     0.4668    0.7302    0.3130  0.7670  0.3310     0.8321
32          0.3221     0.4767    0.7499    0.3221  0.7638  0.3457     0.8043
48          0.3303     0.4934    0.7348    0.3303  0.7629  0.3454     0.8790
3.2 Region Conditioning
To inject region information into feature maps during the propagation, we propose a new
Region-Conditioned Network (RCN) to generate a region-conditioned context (Figure 2). RCN adds an
auxiliary region conditioner consisting of a two-layer fully-connected network with a ReLU
activation function. Here, we transform the categorical region variable into a one-hot vector. Because
the given dataset comprises satellite contexts for 7 different regions in Europe, the length of the
one-hot vector is 7. We extract the region-conditioned context, which consists of a scale vector
$\gamma$ and a bias vector $\beta$, as the output of RCN given the categorical input, and formulate
the feature map as follows:
$$ \gamma, \beta \leftarrow \mathrm{RegionConditionedNetwork}(x_r), \qquad \tilde{x}_r \leftarrow \gamma \odot x_r + \beta \tag{2} $$
where $x_r$ is the last representation output of the encoder architecture. To compute $\tilde{x}_r$,
$\gamma$ is multiplied with $x_r$ element-wise ($\odot$), and $\beta$ is added in the same
element-wise manner.
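The region conditioning of Equation (2) can be sketched in a few lines of NumPy. This is an illustration under our own assumptions rather than the authors' implementation: we assume the conditioner takes the one-hot region vector as input, that $\gamma$ and $\beta$ are per-channel vectors broadcast over time and space, and the hidden width and channel count (`HIDDEN`, `CHANNELS`) are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_REGIONS = 7               # from the text: 7 regions, one-hot length 7
HIDDEN, CHANNELS = 16, 8      # assumed sizes, not given in the paper

# Two fully connected layers with a ReLU in between (randomly initialised here).
W1 = rng.normal(scale=0.1, size=(NUM_REGIONS, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, 2 * CHANNELS))
b2 = np.zeros(2 * CHANNELS)

def region_conditioned_network(region_ids):
    """Map integer region ids of shape (B,) to per-channel (gamma, beta)."""
    one_hot = np.eye(NUM_REGIONS)[region_ids]       # (B, 7)
    hidden = np.maximum(one_hot @ W1 + b1, 0.0)     # ReLU
    out = hidden @ W2 + b2                          # (B, 2C)
    gamma, beta = np.split(out, 2, axis=1)          # each (B, C)
    return gamma, beta

def modulate(x_r, gamma, beta):
    """Compute gamma * x_r + beta, broadcasting over (T, H, W)."""
    shape = gamma.shape + (1, 1, 1)                 # (B, C, 1, 1, 1)
    return gamma.reshape(shape) * x_r + beta.reshape(shape)

# Encoder output with layout (B, C, T, H, W); spatial sizes are arbitrary.
x_r = rng.normal(size=(2, CHANNELS, 4, 5, 5))
gamma, beta = region_conditioned_network(np.array([0, 3]))
x_tilde = modulate(x_r, gamma, beta)
print(x_tilde.shape)
```

Because the conditioner only sees a 7-dimensional input, it adds very few parameters relative to the backbone, in line with the small parameter overhead reported in the abstract.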
Figure 2: An overview of our modified U-Net architecture. Each blue box corresponds to the residual
convolutional unit and each green block denotes the residual transposed convolutional unit. During
the propagation, the region-conditioned context is added to the last output of the encoder, while the
shortcut from each encoder unit to the corresponding decoder unit is transformed with orthogonal
3D 1×1×1 convolutional operators as well as a FiLM layer. The arrow denotes the propagation of a
multi-channel feature map.
[Figure labels: 3D Satellite Context (input), Region, Region-Conditioned Context, Residual Convolutional Unit, Residual Trans-Convolutional Unit, Downsampling, Upsampling, Orthogonal 1x1x1 convolution, FiLM Transfer, Rain Predictions (output). Inset "Details for FiLM layer": $\tilde{x} = \gamma x + \beta$ with scale $\gamma$ and bias $\beta$.]
3.3 Orthogonal 1×1×1 Convolution and Residual Unit
An orthogonal convolutional kernel belongs to a class of advanced normalization techniques that
preserve the magnitude of the propagated signal and reduce redundant features in the filter
responses. Because the input and output differ in resolution, the network needs to capture more
fine-grained features from the latent representations. Inspired by the orthogonality concept, we
alleviate this issue by adding a 1×1×1 convolution to the path from each encoder block to the
corresponding decoder block (Figure 2) and making those 1×1×1 convolutions softly orthogonal with
an orthogonality regularization, termed Spectral Restricted Isometry Property$^+$ (SRIP$^+$)
following Kim and Yun [8]:
$$ \frac{\lambda}{|\mathcal{W}|} \sum_{W \in \mathcal{W}} \sigma\left(W^\top W - I_n\right) \tag{3} $$
where $W$ is the weight matrix of a 1×1×1 convolutional kernel, $\mathcal{W}$ is the set of weight
matrices of the 1×1×1 convolutional kernels, and $\sigma(W^\top W - I_n)$ is the output of the power
method:
$$ u \leftarrow (W^\top W - I_n)\, v, \qquad v \leftarrow (W^\top W - I_n)\, u, \qquad \sigma(W^\top W - I_n) \leftarrow \frac{\|v\|}{\|u\|} \tag{4} $$
where the vector $v \in \mathbb{R}^n$ is randomly initialized from a normal distribution. The key
difference from Kim and Yun [8] lies in the dimension of the penalized weight, i.e., whether it
belongs to a 5D or a 4D convolution kernel. Although we cannot quantitatively or qualitatively
confirm visible changes in the latent feature maps due to orthogonality, we observe an improvement
in CSI-score on the validation set.
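The penalty of Equations (3) and (4) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: we assume each 1×1×1 kernel of shape (out, in, 1, 1, 1) is flattened to an (out, in) matrix, and the small constant `eps` is our addition to avoid dividing by zero when a kernel is already exactly orthogonal. In training, the penalty would be added to the task loss and differentiated by an autograd framework, which is not shown here.

```python
import numpy as np

def srip_sigma(W, rng, eps=1e-12):
    """Two-step power-method estimate of the spectral norm of W^T W - I_n (Eq. 4)."""
    n = W.shape[1]
    A = W.T @ W - np.eye(n)
    v = rng.normal(size=n)      # v drawn from a normal distribution
    u = A @ v
    v = A @ u
    return np.linalg.norm(v) / (np.linalg.norm(u) + eps)

def orthogonality_penalty(kernels, lam, rng):
    """Equation (3): lambda / |W| times the sum of sigma over all 1x1x1 kernels."""
    total = 0.0
    for k in kernels:
        W = k.reshape(k.shape[0], -1)   # (out, in, 1, 1, 1) -> (out, in), assumed layout
        total += srip_sigma(W, rng)
    return lam * total / len(kernels)

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))        # orthogonal matrix
orthogonal_kernel = q.reshape(8, 8, 1, 1, 1)
random_kernel = rng.normal(size=(8, 8, 1, 1, 1))

penalty_orthogonal = orthogonality_penalty([orthogonal_kernel], lam=1.0, rng=rng)
penalty_random = orthogonality_penalty([random_kernel], lam=1.0, rng=rng)
print(penalty_orthogonal, penalty_random)
```

The two-step estimate never exceeds the true spectral norm, so it is a cheap lower bound; an orthogonal kernel yields a penalty close to zero, whereas an unconstrained random kernel does not.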
Residual Unit. We design a 3D residual U-Net, which is a variant of the baseline 3D U-Net [4]
(Figure 2). The main difference between the baseline and ours is the block type: we add a shortcut
to each encoder and decoder block, and a 1×1×1 convolutional layer is added if the number of
channels differs between input and output.
3.4 Data Augmentation: Mixup
Since rainfall data are rare in satellite imagery datasets compared to the abundant non-rainfall
data, the model is easily biased towards the majority class, i.e., non-rainfall. To mitigate this bias
and encourage finer classification of the minority class, we apply Mixup [15], a popular data
augmentation technique. Mixup regularizes neural networks by training on convex combinations of
training examples without large computational overhead, and its effectiveness has been demonstrated
on various image classification and semantic segmentation tasks [15]. Generally, mixup is applied to
4D data, i.e., $(x_i, y_i) \in \mathbb{R}^{B \times C \times H \times W}$ where $B$ is the batch size,
$C$ is the number of channels, $H$ is the height, and $W$ is the width, while it is under-explored on
5D data, in which a time dimension is added. We formulate the augmented training data in the same
manner as the general Mixup:
$$ \tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda) y_j, $$
where $(x_i, y_i) \in \mathbb{R}^{B \times C \times T \times H \times W}$ are the training data and
ground-truth targets, $(x_j, y_j)$ is a random shuffle of $(x_i, y_i)$, and
$\lambda \sim \mathrm{Beta}(\alpha, \alpha)$. We fixed $\alpha = 1$ after exploring several values.
Interestingly, as seen in Table 3, the model trained with mixup achieves a visible improvement in
F1-score, CSI-score, and IoU.
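The 5D mixup described above can be sketched in NumPy as follows. This is an illustration under our own assumptions: a single $\lambda$ is drawn per batch (the text does not say whether it is drawn per batch or per sample), and the spatial sizes and the single-channel target layout are hypothetical.

```python
import numpy as np

def mixup_batch(x, y, alpha, rng):
    """Mix a 5D batch (B, C, T, H, W) with a shuffled copy of itself."""
    lam = rng.beta(alpha, alpha)              # lambda ~ Beta(alpha, alpha)
    perm = rng.permutation(x.shape[0])        # shuffle along the batch axis
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam

rng = np.random.default_rng(0)
# 11 input bands and 4 input slots; 32 output slots; 6x6 patches are arbitrary.
x = rng.normal(size=(4, 11, 4, 6, 6))
y = (rng.random(size=(4, 1, 32, 6, 6)) > 0.8).astype(float)   # binary rain mask

x_mix, y_mix, lam = mixup_batch(x, y, alpha=1.0, rng=rng)
print(x_mix.shape, y_mix.shape)
```

With $\alpha = 1$, the Beta distribution reduces to the uniform distribution on [0, 1], and the mixed targets become soft labels between 0 and 1 rather than hard rain/no-rain values.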
Table 3: Comparison of baseline and model utilizing mixup on the validation dataset.
Methods          CSI-score  F1-score  Accuracy  IoU     Loss    Precision  Recall
Baseline (best)  0.3303     0.4934    0.7348    0.3303  0.7629  0.3454     0.8790
Mixup            0.3699     0.5397    0.7340    0.3699  0.8024  0.3877     0.8926
3.5 Self-Distillation
After training the region-conditioned backbone, we re-train it with self-distillation, without the
supervision of ground-truth labels. Since the ground-truth observations are sparse and noisy, training
on them is unstable and leads to over-confident predictions. To alleviate this concern, we apply a
self-distillation loss, a well-known technique for smoothing the loss landscape and leading to
flatter optima.
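The exact form of the self-distillation loss is not specified in the text. One common choice, shown here purely as an assumption, is a binary cross-entropy between the student's predicted rain probabilities and the soft probabilities of the frozen, previously trained model (the teacher); no ground-truth labels appear in the loss. The function and variable names are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def self_distillation_loss(student_logits, teacher_logits, eps=1e-7):
    """Binary cross-entropy of the student against the teacher's soft rain probabilities."""
    q = sigmoid(teacher_logits)                          # soft targets, treated as constants
    p = np.clip(sigmoid(student_logits), eps, 1.0 - eps)
    return float(np.mean(-(q * np.log(p) + (1.0 - q) * np.log(1.0 - p))))

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(2, 1, 32, 6, 6))       # (B, C, T, H, W), arbitrary sizes
student_logits = teacher_logits + rng.normal(scale=0.5, size=teacher_logits.shape)

loss_student = self_distillation_loss(student_logits, teacher_logits)
loss_matched = self_distillation_loss(teacher_logits, teacher_logits)
print(loss_student, loss_matched)
```

The loss is smallest when the student reproduces the teacher's probabilities, so the soft targets act as a smoother training signal than the sparse binary observations.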
3.6 Feature-wise Linear Modulation (FiLM) Layer for further Fine-Tuning
As the last step, the self-distilled model is fine-tuned on the data of each region for each year
(i.e., we obtain 7×2 = 14 fine-tuned models). Because the pre-trained model captures general features
for all regions but can under-perform for a specific region (or year), it needs further updates for
personalization. For fine-tuning, we use FiLM Transfer (FiT), inspired by [2], which fixes the
pre-trained backbone and fine-tunes only the FiLM adapter layers (Figure 2). We initialize new
learnable parameters for linear modulation, a scale $\gamma_f$ and a bias $\beta_f$, and modify the
latent representations on the shortcut paths from encoder to decoder during fine-tuning:
$$ \tilde{x}_r \leftarrow \gamma_f \odot x_r + \beta_f \tag{5} $$
where $\tilde{x}_r$ is computed in the same way as in Equation (2). Here, we do not use an
auxiliary network like RCN, but directly use the scale and bias as learnable parameters.
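A minimal sketch of the FiLM adapters of Equation (5) is given below. It is not the authors' implementation: the identity initialisation ($\gamma_f = 1$, $\beta_f = 0$), the per-channel parameter shape, and the shortcut channel widths are our assumptions. The region names are taken from Table 5.

```python
import numpy as np

class FiLMAdapter:
    """Learnable per-channel scale and bias for one encoder-to-decoder shortcut."""

    def __init__(self, channels):
        self.gamma = np.ones(channels)    # assumed identity scale at initialisation
        self.beta = np.zeros(channels)    # assumed zero bias at initialisation

    def __call__(self, x):
        # x has layout (B, C, T, H, W); the parameters broadcast over the other axes.
        g = self.gamma.reshape(1, -1, 1, 1, 1)
        b = self.beta.reshape(1, -1, 1, 1, 1)
        return g * x + b

    def num_parameters(self):
        return self.gamma.size + self.beta.size

regions = ["boxi0015", "boxi0034", "boxi0076", "roxi0004",
           "roxi0005", "roxi0006", "roxi0007"]
years = [2019, 2020]
shortcut_channels = [16, 32, 64]          # hypothetical channel widths

# One set of adapters per (region, year) pair; the backbone itself stays frozen.
adapters = {(r, y): [FiLMAdapter(c) for c in shortcut_channels]
            for r in regions for y in years}

trainable = sum(a.num_parameters() for a in adapters[("boxi0015", 2019)])
print(len(adapters), trainable)
```

Only the scale and bias vectors are updated during fine-tuning, so each of the 14 region-year models adds two parameters per shortcut channel on top of the shared backbone.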
3.7 Thresholding
We generate the final precipitation output by tuning the decision threshold for each region across a
year. Generally, the model predicts rain if the corresponding probability is over 0.5. However,
because all regions have different precipitation distributions, i.e., different precipitation scales,
the model's certainty is scaled differently in each region, and a single threshold is sub-optimal even
after the FiLM Transfer. Thresholding is a ubiquitous technique to tune predictions in a
post-processing manner [5]. More precisely, given $p \in [0, 1]$, the points with probability higher
than $p$ are classified as rain. We search for the best threshold with a fixed step of 0.1, i.e.,
$p = 0.1, 0.2, \cdots, 0.9$. As a result, the relaxed threshold improves rain prediction across
several regions, and the best performance is achieved with the combination of thresholds
$p = 0.1$, $p = 0.2$, and $p = 0.4$ for different regions. In particular, the ‘boxi0076’ region,
which has few precipitation observations, is significantly improved.
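The threshold search can be sketched as a simple sweep over the nine candidate values. This is an illustration with toy data; we assume the selection criterion is the CSI on held-out data for one region, which the text does not state explicitly, and the helper names are ours.

```python
import numpy as np

def csi(pred, target):
    """Critical Success Index for boolean masks."""
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    denom = tp + fp + fn
    return float(tp / denom) if denom else 0.0

def best_threshold(probs, target, candidates=None):
    """Return the candidate p whose mask (probs > p) maximises the CSI."""
    if candidates is None:
        candidates = [round(0.1 * k, 1) for k in range(1, 10)]   # 0.1, ..., 0.9
    scores = {p: csi(probs > p, target) for p in candidates}
    return max(scores, key=scores.get), scores

# Toy region where the model is under-confident on true rain pixels.
probs = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.95])
target = np.array([0, 0, 1, 1, 1, 1], dtype=bool)

p_star, scores = best_threshold(probs, target)
print(p_star, scores[p_star], scores[0.5])
```

In this toy case the default threshold of 0.5 misses three of the four rain pixels, while a relaxed threshold recovers them, mirroring the gain reported for regions with few precipitation observations.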
3.8 Training Details and Leaderboard Results
Table 4 lists the hyperparameter values used in this work. Through the combination of our
proposed methods, we achieve a better-generalized score (Table 5).
Table 4: Hyperparameter settings.
Hyperparameter  Optimizer  Learning rate  Maximum epoch  Dropout rate  Patience  Batch size
Value           AdamW      1e-4           90             0.4           40        56
Table 5: The leaderboard score (i.e., IoU) of our solution for stage2 compared to the baseline score
submitted by the organizer.
Task          Method    boxi0015        boxi0034        boxi0076        roxi0004
                        2019    2020    2019    2020    2019    2020    2019    2020
Core Test     Baseline  0.223   0.163   0.149   0.298   0.268   0.070   0.204   0.296
Core Test     Ours      0.270   0.237   0.200   0.362   0.294   0.104   0.234   0.321
Core Heldout  Baseline  0.299   0.193   0.243   0.238   0.155   0.369   0.272   0.251
Core Heldout  Ours      0.328   0.210   0.279   0.237   0.166   0.328   0.294   0.311
Task          Method    roxi0005        roxi0006        roxi0007        Overall
                        2019    2020    2019    2020    2019    2020
Core Test     Baseline  0.240   0.228   0.341   0.274   0.327   0.089   0.226
Core Test     Ours      0.281   0.272   0.384   0.336   0.361   0.133   0.271
Core Heldout  Baseline  0.274   0.299   0.325   0.403   0.245   0.005   0.255
Core Heldout  Ours      0.287   0.318   0.389   0.417   0.248   0.028   0.274
4 Conclusion
In this work, we propose a region-conditioned orthogonal residual U-Net model for precipitation
forecasting. The proposed model outperforms the 3D U-Net model by up to 19.54%. Our contributions
can be viewed from three perspectives. Firstly, we renovate the original 3D U-Net with effective
techniques, namely region-conditioned layers and orthogonality regularization on 1×1×1 convolutional
layers. Next, the generalization capability of the model is enhanced by common training techniques
such as mixup data augmentation, self-distillation, and feature-wise linear modulation. Lastly, our
solution can be easily reproduced, and thus our repository can offer an entry point for data
scientists to facilitate future developments in rainfall forecasting. These approaches are not
limited to 3D U-Net, but could be applied to other architectures designed for precipitation
forecasting, such as ConvLSTM and MetNet.
Acknowledgement
This work was supported by the Korea Meteorological Administration Research and Development
Program "Development of AI techniques for Weather Forecasting" under Grant (KMA2021-00121)
and by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant
funded by the Korea government (MSIT) (No. 2019-0-00075, Artificial Intelligence Graduate School
Program (KAIST)).
References
[1] Our implementation code. https://github.com/hyeonjeong1/22-Neurips-Competition-Baseline, 2022.
[2] Anonymous. Fit: Parameter efficient few-shot transfer learning for personalized and federated image classification. In Submitted to The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=9aokcgBVIj1. Under review.
[3] Stanley G Benjamin, Stephen S Weygandt, John M Brown, Ming Hu, Curtis R Alexander, Tatiana G Smirnova, Joseph B Olson, Eric P James, David C Dowell, Georg A Grell, et al. A north american hourly assimilation and model forecast cycle: The rapid refresh. Monthly Weather Review, 144(4):1669–1694, 2016.
[4] Özgün Çiçek, Ahmed Abdulkadir, Soeren S Lienkamp, Thomas Brox, and Olaf Ronneberger. 3d u-net: learning dense volumetric segmentation from sparse annotation. In International conference on medical image computing and computer-assisted intervention, pages 424–432. Springer, 2016.
[5] Lasse Espeholt, Shreya Agrawal, Casper Sønderby, Manoj Kumar, Jonathan Heek, Carla Bromberg, Cenk Gazen, Rob Carver, Marcin Andrychowicz, Jason Hickey, et al. Deep learning for twelve hour precipitation forecasts. Nature communications, 13(1):1–10, 2022.
[6] Aleksandra Gruca, Pedro Herruzo, Pilar Rípodas, Andrzej Kucik, Christian Briese, Michael K. Kopp, Sepp Hochreiter, Pedram Ghamisi, and David P. Kreil. CDCEO’21 - First Workshop on Complex Data Challenges in Earth Observation, pages 4878–4879. Association for Computing Machinery, New York, NY, USA, 2021. ISBN 9781450384469. URL https://doi.org/10.1145/3459637.3482044.
[7] Pedro Herruzo, Aleksandra Gruca, Llorenç Lliso, Xavier Calbet, Pilar Rípodas, Sepp Hochreiter, Michael Kopp, and David P. Kreil. High-resolution multi-channel weather forecasting first insights on transfer learning from the weather4cast competitions 2021. In 2021 IEEE International Conference on Big Data (Big Data), pages 5750–5757, 2021. doi: 10.1109/BigData52589.2021.9672063.
[8] Taehyeon Kim and Se-Young Yun. Revisiting orthogonality regularization: A study for convolutional neural networks in image classification. IEEE Access, 10:69741–69749, 2022. doi: 10.1109/ACCESS.2022.3185621.
[9] Jihoon Ko, Kyuhan Lee, Hyunjin Hwang, Seok-Geun Oh, Seok-Woo Son, and Kijung Shin. Effective training strategies for deep-learning-based precipitation nowcasting and estimation. Computers & Geosciences, 161:105072, 2022.
[10] Jihoon Ko, Kyuhan Lee, Hyunjin Hwang, and Kijung Shin. Deep-learning-based precipitation nowcasting with ground weather station data and radar data. arXiv preprint arXiv:2210.12853, 2022.
[11] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. Film: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[12] Joseph T Schaefer. The critical success index as an indicator of warning skill. Weather and forecasting, 5(4):570–575, 1990.
[13] Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. Convolutional lstm network: A machine learning approach for precipitation nowcasting. Advances in neural information processing systems, 28, 2015.
[14] Xingjian Shi, Zhihan Gao, Leonard Lausen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, and Wang-chun Woo. Deep learning for precipitation nowcasting: A benchmark and a new model. Advances in neural information processing systems, 30, 2017.
[15] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
[16] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. Unet++: A nested u-net architecture for medical image segmentation. In Deep learning in medical image analysis and multimodal learning for clinical decision support, pages 3–11. Springer, 2018.