Application of Deep Convolutional Neural Networks for
Detecting Extreme Weather in Climate Datasets
Yunjie Liu
National Energy Research
Scientific Computing Center
Lawrence Berkeley Lab
Berkeley, CA
yunjieliu@lbl.gov
Evan Racah
National Energy Research
Scientific Computing Center
Lawrence Berkeley Lab
Berkeley, CA
eracah@lbl.gov
Prabhat
National Energy Research
Scientific Computing Center
Lawrence Berkeley Lab
Berkeley, CA
prabhat@lbl.gov
Joaquin Correa
National Energy Research
Scientific Computing Center
Lawrence Berkeley Lab
Berkeley, CA
joaquincorrea@lbl.gov
Amir Khosrowshahi
Nervana Systems
San Diego, CA
amir@nervanasys.com
David Lavers
Scripps Institution of
Oceanography
San Diego, CA
dlavers@ucsd.edu
Kenneth Kunkel
National Oceanic and
Atmospheric Administration
Asheville, NC
ken.kunkel@noaa.gov
Michael Wehner
Lawrence Berkeley Lab
Berkeley, CA
mfwehner@lbl.gov
William Collins
Lawrence Berkeley Lab
Berkeley, CA
wdcollins@lbl.gov
ABSTRACT
Detecting extreme events in large datasets is a major challenge in climate science research. Current algorithms for extreme event detection are built upon human expertise in defining events based on subjective thresholds of relevant physical variables. Often, multiple competing methods produce vastly different results on the same dataset. Accurate characterization of extreme events in climate simulations and observational data archives is critical for understanding the trends and potential impacts of such events in a climate change context. This study presents the first application of Deep Learning techniques as an alternative methodology for climate extreme event detection. Deep neural networks are able to learn high-level representations of a broad class of patterns from labeled data. In this work, we developed a deep Convolutional Neural Network (CNN) classification system and demonstrated the usefulness of Deep Learning techniques for tackling climate pattern detection problems. Coupled with a Bayesian-based hyper-parameter optimization scheme, our deep CNN system achieves 89%-99% accuracy in detecting extreme events (Tropical Cyclones, Atmospheric Rivers and Weather Fronts).
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
KDD 2016 August 13-17, San Francisco, CA, USA
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-2138-9. . . $15.00
DOI: 10.475/123 4
Keywords
Pattern Recognition; Deep Learning; Convolutional Neural Network; Climate Analytics; Extreme Events
1. INTRODUCTION
Extreme climate events (such as hurricanes and heat waves) pose great potential risks to infrastructure and human health. Hurricane Joaquin, for example, hit the Carolinas in early October 2015 and dropped over two feet of precipitation within days, resulting in severe flooding and economic loss. An important scientific goal in climate science research is to characterize extreme events in current-day and future climate projections. However, understanding the development mechanisms and life cycles of these events, as well as their future trends, requires accurately identifying such patterns in space and time. Satellites acquire tens of terabytes of global data every year to provide us with insights into the evolution of the climate system. In addition, high-resolution climate models produce hundreds of terabytes of data from multi-decadal runs, enabling us to explore future climate scenarios under global warming. Detecting extreme climate events in terabytes of data presents an unprecedented challenge for climate science.
Existing detection methods for extreme climate events (e.g. hurricanes) all build upon human expertise in defining relevant events based on evaluating relevant spatial and temporal variables against hard, subjective thresholds. For instance, tropical cyclones are strong rotating weather systems that are characterized by low pressure and warm-temperature core structures with high wind. However, there is no universally accepted set of criteria for what defines a tropical cyclone [16]. "Low" pressure and "warm" temperature are interpreted differently among climate scientists, and therefore different thresholds are used to characterize them. Researchers [30, 31, 33, 32, 18, 17] have developed various algorithms to detect tropical cyclones in large climate datasets based on subjective thresholding of several relevant variables (e.g. sea level pressure, temperature, wind).
One general and promising software package for detecting extreme climate events, the Toolkit for Extreme Climate Analysis (TECA) [18, 17], is able to detect tropical cyclones, extra-tropical cyclones and atmospheric rivers. TECA utilizes the MapReduce paradigm to find patterns in terabytes of climate data within hours. However, many climate extreme events do not have a clear empirical definition that is universally accepted by climate scientists (e.g. extra-tropical cyclones and mesoscale convective systems), which precludes the development and application of algorithms for detection and tracking. This study attempts to search for an alternative methodology for extreme event detection by designing a neural-network-based system that is capable of learning a broad class of patterns from complex multi-variable climate data, thereby avoiding subjective thresholds.
Recent advances in deep learning have demonstrated exciting and promising results on pattern recognition tasks, such as the ImageNet Large Scale Visual Recognition Challenge [10, 22, 28] and speech recognition [8, 3, 7, 27]. Many of the state-of-the-art deep learning architectures for visual pattern recognition are based on the hierarchical feature-learning convolutional neural network (CNN). Modern CNN systems tend to be deep and large, with many hidden layers and millions of neurons, making them flexible in learning a broad class of patterns simultaneously from data. AlexNet (7 layers, with 5 convolutional layers and 2 fully connected layers), developed by [10], provided the first end-to-end trainable deep learning system for object classification, achieving a 15.3% top-5 classification error rate on the ILSVRC-2012 dataset; in contrast, the previous best-performing non-neural-network-based systems achieved only a 25.7% top-5 classification error on the same dataset. Shortly thereafter, Simonyan and Zisserman [22] built upon AlexNet and introduced an even deeper CNN (19 layers, with 16 convolutional layers and 3 fully connected layers) with smaller kernels (filters), achieving an impressive 6.8% top-5 classification error rate on the ILSVRC-2014 dataset. Szegedy et al. [28] introduced the "inception" concept (a network that includes sub-networks) and developed an even deeper CNN (22 layers) that achieved comparable classification results on the ImageNet benchmark. Building on deep CNNs, Sermanet et al. [20] introduced an integrated system of classification and detection, in which features learned by convolutional layers are shared between classification and localization tasks and both tasks are performed simultaneously in a single network. Girshick et al. [4] took a different approach by combining a region proposal framework [29] with a deep CNN and designed the state-of-the-art R-CNN object detection system.
In this paper, we formulate the problem of detecting extreme climate events as a classic visual pattern recognition problem. We then build end-to-end trainable deep CNN systems, following the architecture introduced by [10]. The models were trained to classify tropical cyclones, weather fronts and atmospheric rivers. Unlike the ImageNet challenge, where the training data are labeled natural images, our training data consist of several continuous spatial variables (e.g. pressure, temperature, precipitation) stacked together into image patches.
2. RELATED WORK
Climate data analysis requires an array of advanced methodologies. Neural-network-based machine learning approaches have received much attention and have been applied to tackle several climate problems in recent years. Chattopadhyay et al. [2] developed a nonlinear clustering method based on the Self-Organizing Map (SOM) to study the structure and evolution of the Madden-Julian oscillation (MJO). Their method does not require selecting leading modes or intraseasonal bandpass filtering in time and space as other methods do. The results show that the SOM-based method is not only able to capture the gross features of MJO structure and development, but also reveals insights that other methods cannot discover, such as the dipole and tripole structures of outgoing longwave radiation and diabatic heating in the MJO. Gorricha and Costa [6] used a three-dimensional Self-Organizing Map to categorize and visualize extreme precipitation patterns over an island in Spain. They found spatial precipitation patterns that the traditional precipitation index approach is not able to discover, and concluded that the three-dimensional Self-Organizing Map is a very useful tool for exploratory spatial pattern analysis. More recently, Shi et al. [21] implemented a newly developed convolutional long short-term memory (LSTM) deep neural network for precipitation nowcasting. Trained on two-dimensional radar map time series, their system is able to outperform the current state-of-the-art precipitation nowcasting system on various evaluation metrics. Iglesias et al. [9] developed a multitask deep fully connected neural network for predicting heat waves, trained on historical time series data. They demonstrated that the neural network approach is significantly better than linear and logistic regression, and can potentially improve the forecasting of extreme heat waves. These studies show that neural networks can be applied to a wide variety of climate problems. In this study, we explore deep Convolutional Neural Networks for solving climate pattern detection problems.
3. METHODS
3.1 Convolutional Neural Network
A deep CNN is typically comprised of several convolutional layers followed by a small number of fully connected layers. In between two successive convolutional layers, a subsampling operation (e.g. max pooling, mean pooling) is typically performed. Researchers have questioned the necessity of pooling layers, arguing that they can simply be replaced by convolutional layers with increased strides, thus simplifying the network structure [26]. In either case, the input of a CNN is an (m, n, p) image, where m and n are the width and height of the image in pixels and p is the number of color channels of each pixel. The output of a CNN is a vector of q probability units (class scores), corresponding to the number of categories to be classified (e.g. q = 2 for a binary classifier).
The convolutional layers perform convolution operations between kernels and the input images (or feature maps from the previous layer). Typically, a convolutional layer contains k filters (kernels) of size (i, j, p), where i and j are the width and height of the filter. The filters are usually smaller than the width m and height n of the input image, and p always equals the number of color channels of the input image (e.g. a color image has three channels: red, green, and blue). Each of the filters is independently convolved with the input images (or feature maps from the previous layer), followed by a non-linear transformation, and generates k feature maps, which serve as inputs for the next layer. In the process of convolution, a dot product is computed between the entries of the filter and the local region it is connected to in the input image (or feature map from the previous layer). The parameters of a convolutional layer are these learnable filters. The convolutional layer is the feature extractor, because the kernels slide across all the inputs and will produce larger outputs for certain sub-regions than for others. This allows features to be extracted from the inputs and preserved in the feature maps, which are passed on to the next layer, regardless of where the feature is located in the input. The pooling layer subsamples the feature maps generated from the convolutional layer over an (s, t) contiguous region, where s and t are the width and height of the subsampling window. As a result, the resolution of the feature maps becomes coarser with the depth of the CNN. All feature maps are high-level representations of the input data in the CNN. The fully connected layer has connections to all hidden units in the previous layer. If it is the last layer within the CNN architecture, the fully connected layer also does high-level reasoning based on the feature vectors from the previous layer and produces the final class scores for image objects.
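To make the layer operations above concrete, the following minimal NumPy sketch (our illustration, not the paper's neon implementation) performs a valid convolution of one multi-channel patch with a single kernel, applies the ReLU non-linearity, and max-pools the resulting feature map; the 32x32x8 patch and 5x5 kernel sizes mirror the tropical cyclone configuration in Tables 1 and 3.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Valid convolution of one (m, n, p) image with one (i, j, p) kernel,
    producing a single 2-D feature map (no padding, stride 1)."""
    m, n, p = image.shape
    i, j, _ = kernel.shape
    out = np.zeros((m - i + 1, n - j + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # dot product between the kernel and the local image region
            out[r, c] = np.sum(image[r:r + i, c:c + j, :] * kernel)
    return out

def relu(x):
    return np.maximum(0.0, x)  # equation (1) below

def max_pool(fmap, s, t):
    """Subsample a feature map over non-overlapping (s, t) windows."""
    rows, cols = fmap.shape[0] // s, fmap.shape[1] // t
    return fmap[:rows * s, :cols * t].reshape(rows, s, cols, t).max(axis=(1, 3))

# Toy forward pass: a 32x32 patch with 8 stacked climate variables,
# one 5x5 kernel, followed by ReLU and 2x2 max pooling.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32, 8))
kernel = rng.standard_normal((5, 5, 8))
fmap = max_pool(relu(conv2d_single(patch, kernel)), 2, 2)
print(fmap.shape)  # (14, 14)
```

A full convolutional layer repeats this for k kernels, yielding k feature maps per input.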
Most current deep neural networks use back-propagation as the learning rule [19]. The back-propagation algorithm searches for the minimum of the loss function in weight space through gradient descent. It attributes the final total loss to each individual neuron in the network, repeatedly adjusts the weights of neurons whose loss contribution is high, and propagates the error backward through the entire network from the output to its inputs.
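In its simplest form (our notation, not taken from the paper), each gradient descent step nudges every weight w against the gradient of the training loss L:

w <- w - eta * dL/dw

where eta is the learning rate; back-propagation supplies the partial derivatives dL/dw layer by layer via the chain rule.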
3.2 Hyper-parameter Optimization
Training deep neural networks is known to be hard [12, 5]. Effectively and efficiently training a deep neural network not only requires a large amount of training data, but also requires carefully tuning the model hyper-parameters (e.g. learning parameters, regularization parameters) [24]. The tuning process, however, can be tedious and non-intuitive. Hyper-parameter optimization can be reduced to finding a set of parameters for a network that produces the best possible validation performance. As such, this process can be thought of as a typical optimization problem of finding a set x of parameter values from a bounded set X that minimizes an objective function f(x), where x is a particular setting of the hyper-parameters and f(x) is the loss of a deep neural network, trained and validated on a particular dataset, as a function of the hyper-parameter inputs. Training a deep neural network is not only a costly (with respect to time) procedure, but also a rather opaque process with respect to how the network performance varies with its hyper-parameter inputs. Because training and validating a deep neural network is complicated and expensive, Bayesian Optimization (which assumes f(x) is not known in closed form, is non-convex and is expensive to evaluate) is a well-suited algorithm for hyper-parameter optimization for the task at hand. Bayesian Optimization attempts to optimize f(x) by constructing two things: a probabilistic model of f(x) and an acquisition function that picks which point x in X to evaluate next. The probabilistic model is updated with Bayes' rule using a Gaussian process prior. The acquisition function suggests hyper-parameter settings, or points to evaluate, by trying to balance evaluating parameter settings in regions where f(x) is low against points in regions where the uncertainty of the probabilistic model is high. As a result, the optimization procedure attempts to evaluate as few points as possible [1, 24].
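The following sketch (our illustration, using scikit-learn rather than the Spearmint tool described below) shows the basic Bayesian optimization loop: fit a Gaussian process surrogate to the evaluations so far, then use an expected-improvement acquisition function to pick the next hyper-parameter setting to try. The one-dimensional "learning rate" objective is a toy stand-in for the expensive network training run.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(candidates, gp, best_y):
    """Acquisition: trade off low predicted loss against high model uncertainty."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(f, bounds, n_init=3, n_iter=15, seed=0):
    """Minimize an expensive 1-D black-box f(x) on [bounds[0], bounds[1]]."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(bounds[0], bounds[1], size=(n_init, 1))
    y = np.array([f(x[0]) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)                                   # update the surrogate model
        cand = np.linspace(bounds[0], bounds[1], 500).reshape(-1, 1)
        x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
        X = np.vstack([X, x_next])                     # evaluate the suggested point
        y = np.append(y, f(x_next[0]))
    return X[np.argmin(y)], y.min()

# Toy stand-in for "validation loss as a function of the learning rate".
best_x, best_y = bayes_opt(lambda lr: (np.log10(lr) + 2.5) ** 2, bounds=(1e-4, 1e-1))
print(best_x, best_y)
```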
In order to implement Bayesian Optimization, we use a tool called Spearmint [23]. Spearmint works by launching a master process, which creates a database for collecting all model evaluation results. The master process then spawns many worker processes, which execute training and evaluation with respect to a set of hyper-parameters proposed by the acquisition function and then report their results to the database. From there, the master process uses the results in the database to propose further parameter settings and launch additional processes.
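As a hypothetical sketch of the kind of objective a Spearmint worker evaluates (the entry-point signature and parameter handling below are assumptions based on Spearmint's published examples, and the training helper is a synthetic placeholder rather than the paper's neon run):

```python
# spearmint_target.py -- hypothetical objective module for a Spearmint experiment.
# Spearmint repeatedly calls main(job_id, params) with hyper-parameter values
# proposed by its acquisition function and records the returned validation loss.
import math

def train_and_validate_cnn(learning_rate, momentum, weight_decay):
    # Placeholder for the real CNN training/validation run; a synthetic loss
    # keeps this sketch self-contained and runnable.
    return (math.log10(learning_rate) + 3.0) ** 2 + (momentum - 0.9) ** 2 + weight_decay

def main(job_id, params):
    loss = train_and_validate_cnn(
        learning_rate=float(params['learning_rate'][0]),
        momentum=float(params['momentum'][0]),
        weight_decay=float(params['weight_decay'][0]),
    )
    return loss
```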
3.3 CNN Configuration
Following AlexNet [10], we developed a deep CNN with a total of 4 learnable layers, including 2 convolutional layers and 2 fully connected layers. Each convolutional layer is followed by a max pooling layer. The model is constructed with the open source Python deep learning library neon. The configurations of our best-performing architectures are shown in Table 1.
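As a concrete illustration of Table 1, the sketch below re-creates the tropical cyclone network in Keras. This is our own approximation for readability, not the authors' neon implementation; the padding, initialization and optimizer settings are assumptions, and a softmax output stands in for the paper's logistic output layer.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(8, (5, 5), activation="relu",
                  input_shape=(32, 32, 8)),       # Conv1: 5x5-8 on a 32x32x8 patch
    layers.MaxPooling2D((2, 2)),                  # Pooling: 2x2
    layers.Conv2D(16, (5, 5), activation="relu"), # Conv2: 5x5-16
    layers.MaxPooling2D((2, 2)),                  # Pooling: 2x2
    layers.Flatten(),
    layers.Dense(50, activation="relu"),          # Fully connected: 50 units
    layers.Dense(2, activation="softmax"),        # class scores (event / non-event)
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

The weather front and atmospheric river networks in Table 1 differ only in input size, kernel size, pooling window and the width of the first fully connected layer.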
Our networks are shallower and smaller compared to the state-of-the-art architectures developed in [22, 28]. The major limitation on exploring deeper and larger CNNs is the limited amount of labeled training data that we can obtain. However, a small network has the advantage of avoiding over-fitting, especially when the amount of training data is small. We also chose comparatively large kernels (filters) in the convolutional layers based on the input data size, even though [22] suggests that a deep architecture with small kernels (filters) is essential for state-of-the-art performance. This is because climate patterns are comparatively simpler and larger in size than objects in the ImageNet dataset.
One key feature of deep learning architectures is the ability to learn complex non-linear functions. The convolutional layers and the first fully connected layer in our deep CNNs all use the Rectified Linear Unit (ReLU) activation function [15]. ReLU is chosen due to its faster learning/training behavior [10] compared to other activation functions such as tanh:

f(x) = max(0, x) (1)

The final fully connected layer has a logistic activation function as its non-linearity, which also serves as the classifier and outputs a probability distribution over class labels:

f(x) = 1 / (1 + e^(-x)) (2)
3.4 Computational Platform
We performed our data processing, model training and testing on Edison, a Cray XC30, and Cori, a Cray XC40, supercomputing systems at the National Energy Research Scientific Computing Center (NERSC). Each Edison compute node has 24 2.4 GHz Intel Xeon cores. Each Cori compute node has 32 2.3 GHz Intel Haswell cores. In this work, we mainly used the single-node CPU backend of neon. The hyper-parameter optimization was performed on a single Cori node with tasks fully parallelized across its 32 cores.
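A minimal sketch of how independent hyper-parameter evaluations can be farmed out across the cores of a single node (our illustration; the actual Spearmint-driven workflow differs, and the train_and_score helper below is a synthetic placeholder):

```python
from multiprocessing import Pool

def train_and_score(setting):
    """Placeholder for one full train/validate run; returns (setting, val_loss)."""
    lr, wd = setting
    return setting, (lr - 0.01) ** 2 + wd  # synthetic loss for illustration

if __name__ == "__main__":
    # Candidate hyper-parameter settings proposed by the optimizer.
    candidates = [(lr, wd) for lr in (0.001, 0.01, 0.1) for wd in (1e-5, 1e-4)]
    with Pool(processes=32) as pool:        # one worker per Haswell core
        results = pool.map(train_and_score, candidates)
    print(min(results, key=lambda r: r[1]))
```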
Table 1: Deep CNN architecture and layer parameters. The convolutional layer parameters are denoted as <filter size>-<number of feature maps> (e.g. 5x5-8). The pooling layer parameters are denoted as <pooling window> (e.g. 2x2). The fully connected layer parameters are denoted as <number of units> (e.g. 2).

Event               Conv1     Pooling   Conv2      Pooling   Fully   Fully
Tropical Cyclone    5x5-8     2x2       5x5-16     2x2       50      2
Weather Fronts      5x5-8     2x2       5x5-16     2x2       50      2
Atmospheric River   12x12-8   3x3       12x12-16   2x2       200     2
Table 2: Data Sources

Climate Dataset           Time Frame   Temporal Resolution   Spatial Resolution (lat x lon degree)
CAM5.1 historical run     1979-2005    3 hourly              0.23x0.31
ERA-Interim reanalysis    1979-2011    3 hourly              0.25x0.25
20th century reanalysis   1908-1948    Daily                 1x1
NCEP-NCAR reanalysis      1949-2009    Daily                 1x1
Table 3: Size of image patch, diagnostic variables and number of labeled examples used for each extreme event considered in the study

Events              Image Dimension   Variables                          Total Examples
Tropical Cyclone    32x32             PSL, VBOT, UBOT, T200, T500,       10,000 +ve / 10,000 -ve
                                      TMQ, V850, U850
Atmospheric River   148x224           TMQ, Land Sea Mask                 6,500 +ve / 6,800 -ve
Weather Front       27x60             2m Temp, Precip, SLP               5,600 +ve / 6,500 -ve
4. DATA
In this study, we use both climate simulations and reanalysis products. The reanalysis products are produced by assimilating observations into a climate model. A summary of the data sources and their temporal and spatial resolutions is listed in Table 2. Ground truth labeling of the various events is obtained via the multivariate threshold-based criteria implemented in TECA [18, 17] and manual labeling by experts [11, 13]. Training data comprise image patterns, in which several relevant spatial variables are stacked together over a prescribed region that bounds a type of event. The dimension of the bounding box is based on domain knowledge of the event's spatial extent in the real world. For instance, tropical cyclone radii are typically within the range of 100 to 500 kilometers, thus a bounding box of 500 kilometers by 500 kilometers is likely to capture most tropical cyclones. The chosen physical variables are also based on domain expertise. The prescribed bounding box is placed over the event; relevant variables are extracted within the bounding box and stacked together. To facilitate model training, the bounding box location is adjusted slightly such that all events are located approximately at the center, and image patches are cropped and centered correspondingly. Because the spatial dimensions of climate events vary quite a lot and the spatial resolution of the source data is non-uniform, the final training images differ in size among the three types of event. A summary of the attributes of the training images is listed in Table 3.
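A minimal sketch of the patch-construction step described above, under assumed variable names: given gridded fields and an event-center index, it crops a fixed window around the center and stacks the chosen variables along the channel axis (boundary handling, regridding and label bookkeeping are omitted).

```python
import numpy as np

def extract_patch(fields, center_row, center_col, size):
    """Crop a (size x size) window centered on the event from each 2-D field
    and stack the fields as channels, yielding a (size, size, n_vars) patch."""
    half = size // 2
    rows = slice(center_row - half, center_row - half + size)
    cols = slice(center_col - half, center_col - half + size)
    return np.stack([f[rows, cols] for f in fields], axis=-1)

# Hypothetical example: 8 global fields on a 768x1152 grid (roughly the
# 0.23x0.31 degree CAM5.1 grid) and one labeled tropical cyclone center.
rng = np.random.default_rng(0)
fields = [rng.standard_normal((768, 1152)) for _ in range(8)]  # PSL, UBOT, ...
patch = extract_patch(fields, center_row=400, center_col=600, size=32)
print(patch.shape)  # (32, 32, 8)
```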
5. RESULTS AND DISCUSSION
Table 4 summarizes the performance of our deep CNN architectures on classifying tropical cyclones, atmospheric rivers and weather fronts. We obtained fairly high accuracy (89%-99%) on extreme event classification. In addition, the systems do not suffer from over-fitting. We believe this is mostly because of the shallow and small size of the architectures (4 learnable layers) and the weight decay regularization. Deeper and larger architectures would be inappropriate for this study due to the limited amount of training data. The fairly good training and test classification results also suggest that the deep CNNs we developed are able to efficiently learn representations of climate patterns from labeled data and make predictions based on the learned features. Traditional threshold-based detection methods require a human expert to carefully examine an extreme event and its environment in order to come up with thresholds for defining the event. In contrast, as shown in this study, deep CNNs are able to learn climate patterns directly from the labeled data, thus avoiding subjective thresholds.
Table 4: Overall Classification Accuracy

Event Type          Train   Test    Train time
Tropical Cyclone    99%     99%     30 min
Atmospheric River   90.5%   90%     6-7 hours
Weather Front       88.7%   89.4%   30 min
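For reference, the accuracy and column-normalized confusion matrices reported in Tables 4-7 can be computed from model predictions as in the following sketch (our illustration, using scikit-learn; the toy labels are not from the paper).

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels/predictions for a binary event classifier (1 = event).
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 0, 1, 1])

print("accuracy:", accuracy_score(y_true, y_pred))

# Rows = predicted class, columns = true label, normalized per true label,
# matching the layout of Tables 5-7.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0]).T.astype(float)
cm /= cm.sum(axis=0, keepdims=True)
print(cm)
```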
5.1 Classification Results for Tropical Cyclones
Tropical cyclones are rapidly rotating weather systems characterized by a low pressure center, strong winds circulating around the center, and a warm temperature core in the upper troposphere. Figure 1 shows examples of tropical cyclones simulated in climate models that are correctly classified by the deep CNN (the warm core structure is not shown in this figure). Tropical cyclone features are rather well defined, as can be seen from the distinct low pressure center and the spiral flow of wind vectors around it. These clear and distinct characteristics make the tropical cyclone pattern relatively easy to learn and represent within a CNN. Our deep CNNs achieved nearly perfect (99%) classification accuracy.

Figure 2 shows examples of tropical cyclones that are mis-classified. After carefully examining these events, we believe they are weak systems (e.g. tropical depressions) whose low pressure centers and spiral wind structures have not fully developed. The pressure distribution shows a large low pressure area without a clear minimum. Therefore, our deep CNN does not label them as strong tropical cyclones.
Table 5: Confusion matrix for tropical cyclone classification

                  Label TC   Label Non TC
Predict TC        0.989      0.003
Predict Non TC    0.011      0.997
Figure 1: Sample images of tropical cyclones correctly clas-
sified (true positive) by our deep CNN model. Figure shows
sea level pressure (color map) and near surface wind distri-
bution (vector solid line).
Figure 2: Sample images of tropical cyclones mis-classified
(false negative) by our deep CNN model. Figure shows sea
level pressure (color map) and near surface wind distribution
(vector solid line).
5.2 Classification Results for Atmospheric Rivers
In contrast to tropical cyclones, atmospheric rivers are distinctively different events. They are narrow corridors of concentrated moisture in the atmosphere that usually originate over tropical oceans and move pole-ward. Figure 3 shows examples of correctly classified land-falling atmospheric rivers that occur over the western Pacific Ocean and north Atlantic Ocean. The characteristic narrow corridor of water vapor is well defined and clearly observable in these images.

Figure 4 shows examples of mis-classified atmospheric rivers. Upon further investigation, we believe there are two main factors leading to mis-classification. The first is the presence of weak atmospheric river systems. For instance, the left column of Figure 4 shows comparatively weak atmospheric rivers. The water vapor distribution clearly shows a band of concentrated moisture across the mid-latitude ocean, but the signal is much weaker compared to Figure 3; thus the deep CNN does not predict them correctly. Secondly, the presence of other climate events may also affect the deep CNN's representation of atmospheric rivers. In reality, the location and shape of an atmospheric river are affected by jet streams and extra-tropical cyclones. For example, the right column of Figure 4 shows rotating systems (likely extra-tropical cyclones) adjacent to the atmospheric river. This phenomenon presents a challenge for the deep CNN in representing atmospheric rivers.
Table 6: Confusion matrix for atmospheric river classification

                  Label AR   Label Non AR
Predict AR        0.93       0.107
Predict Non AR    0.07       0.893
Figure 3: Sample images of atmospheric rivers correctly clas-
sified (true positive) by our deep CNN model. Figure shows
total column water vapor (color map) and land sea boundary
(solid line).
Figure 4: Sample images of atmospheric rivers mis-classified
(false negative) by our deep CNN model. Figure shows to-
tal column water vapor (color map) and land sea boundary
(solid line).
5.3 Classification Results for Weather Fronts
Among the three types of climate events considered here, weather fronts have the most complex spatial pattern. Weather fronts typically form at the interface of warm and cold air masses, and are usually associated with heavy precipitation due to moisture condensation as warm air is lifted. In satellite images, a weather front is observable as a strip of clouds, but it is hardly visible in two-dimensional fields such as temperature and pressure. In the middle latitudes (e.g. most of the U.S.), a portion of weather fronts are associated with extra-tropical cyclones. Figure 5 shows examples of weather fronts correctly classified by our deep CNN system. Visually, the long narrow regions of high precipitation line up approximately parallel to the temperature contours. This is a clear characteristic and comparatively easy for deep CNNs to learn. Because the patterns of weather fronts are rather complex and hardly show up in two-dimensional fields, we decided to investigate them further in later work.
Table 7: Confusion matrix for weather front classification

                  Label WF   Label Non WF
Predict WF        0.876      0.18
Predict Non WF    0.124      0.82
6. FUTURE WORK
In the present study, we trained deep CNNs for classifying tropical cyclones, atmospheric rivers and weather fronts separately. Ideally, we would like to train a single neural network for detecting all three types of events. Unlike object recognition in natural images, climate pattern detection poses unique challenges. Firstly, climate events happen at vastly different spatial scales. For example, a tropical cyclone typically extends over less than 500 kilometers in radius, while an atmospheric river can be several thousand kilometers long. Secondly, different climate events are characterized by different sets of physical variables.
Figure 5: Sample images of weather fronts correctly classified by our deep CNN model. Figure shows precipitation with daily precipitation less than 5 millimeters filtered out (color map), near surface air temperature (solid contour line) and sea level pressure (dashed contour line).
For example, atmospheric rivers correlate strongly with the vertical integration of water vapor, while tropical cyclones have a more complex multi-variable pattern involving sea level pressure, near surface wind and upper troposphere temperature. Future work will need to develop more general CNN architectures that are capable of discriminating between different variables based on the event type and capable of handling events at various spatial scales. Note that we have primarily addressed detection of extreme weather patterns, but not their localization. We will consider architectures for spatially localizing weather patterns in the future.
Several researchers have pointed out that deeper and larger CNNs perform better on classification and detection tasks [22, 28] compared to shallow networks. However, deep networks require a huge amount of data to be trained effectively and to prevent over-fitting. Datasets such as ImageNet provide millions of labeled images for training and testing deep and large CNNs. In contrast, we can only obtain a small amount of labeled training data, hence we are constrained in the class of deep CNNs that we can explore without suffering from over-fitting. This limitation also points us to the need for developing unsupervised approaches for climate pattern detection. We believe that this will be critical for the majority of scientific disciplines that typically lack labeled data.
7. CONCLUSION
In this study, we explored deep learning as a methodology for detecting extreme weather patterns in climate data. We developed deep CNN architectures for classifying tropical cyclones, atmospheric rivers and weather fronts. The systems achieve fairly high classification accuracy, ranging from 89% to 99%. To the best of our knowledge, this is the first time that deep CNNs have been applied to tackle climate pattern recognition problems. This successful application could be a precursor for tackling a broad class of pattern detection problems in climate science. Deep neural networks learn high-level representations from data directly, therefore potentially avoiding traditional subjective threshold-based criteria on climate variables for event detection. Results from this study will be used for quantifying trends in climate extreme events in current-day and future climate scenarios, as well as for investigating the changes in the dynamics and thermodynamics of extreme events in a global warming context. This information is critical for climate change adaptation, hazard risk prediction and climate change policy making.
8. ACKNOWLEDGMENTS
This research was conducted using neon, an open source library for deep learning from Nervana Systems.
This research used resources of the National Energy Re-
search Scientific Computing Center, a DOE Office of Sci-
ence User Facility supported by the Office of Science of the
U.S. Department of Energy under Contract No. DE-AC02-
05CH11231. This work was supported by the Director, Of-
fice of Science, Office of Advanced Scientific Computing Re-
search, Applied Mathematics program of the U.S. Depart-
ment of Energy under Contract No. DE-AC02-05CH11231.
References
[1] E. Brochu, V. M. Cora, and N. De Freitas. A tutorial on
bayesian optimization of expensive cost functions, with
application to active user modeling and hierarchical re-
inforcement learning. arXiv preprint arXiv:1012.2599,
2010.
[2] R. Chattopadhyay, A. Vintzileos, and C. Zhang. A de-
scription of the madden–julian oscillation based on a
self-organizing map. Journal of Climate, 26(5):1716–
1732, 2013.
[3] G. E. Dahl, D. Yu, L. Deng, and A. Acero. Context-
dependent pre-trained deep neural networks for large-
vocabulary speech recognition. Audio, Speech, and Lan-
guage Processing, IEEE Transactions on, 20(1):30–42,
2012.
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich
feature hierarchies for accurate object detection and se-
mantic segmentation. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 580–587, 2014.
[5] X. Glorot and Y. Bengio. Understanding the difficulty
of training deep feedforward neural networks. In Inter-
national conference on artificial intelligence and statis-
tics, pages 249–256, 2010.
[6] J. Gorricha, V. Lobo, and A. C. Costa. A framework
for exploratory analysis of extreme weather events using
geostatistical procedures and 3d self-organizing maps.
International Journal on Advances in Intelligent Sys-
tems, 6(1), 2013.
[7] A. Graves, A.-r. Mohamed, and G. Hinton. Speech
recognition with deep recurrent neural networks. In
Acoustics, Speech and Signal Processing (ICASSP),
2013 IEEE International Conference on, pages 6645–
6649. IEEE, 2013.
[8] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed,
N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N.
Sainath, et al. Deep neural networks for acoustic mod-
eling in speech recognition: The shared views of four
research groups. Signal Processing Magazine, IEEE,
29(6):82–97, 2012.
[9] G. Iglesias, D. C. Kale, and Y. Liu. An examination of
deep learning for extreme climate pattern analysis. In
The 5th International Workshop on Climate Informat-
ics, 2015.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Ima-
genet classification with deep convolutional neural net-
works. In Advances in Neural Information Processing
Systems (NIPS), pages 1097–1105, 2012.
[11] K. E. Kunkel, D. R. Easterling, D. A. Kristovich,
B. Gleason, L. Stoecker, and R. Smith. Meteorologi-
cal causes of the secular variations in observed extreme
precipitation events for the conterminous united states.
Journal of Hydrometeorology, 13(3):1131–1141, 2012.
[12] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin.
Exploring strategies for training deep neural networks.
The Journal of Machine Learning Research, 10:1–40,
2009.
[13] D. A. Lavers, G. Villarini, R. P. Allan, E. F. Wood,
and A. J. Wade. The detection of atmospheric rivers in
atmospheric reanalyses and their links to british winter
floods and the large-scale climatic circulation. Journal
of Geophysical Research: Atmospheres, 117(D20), 2012.
[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner.
Gradient-based learning applied to document recogni-
tion. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[15] V. Nair and G. E. Hinton. Rectified linear units im-
prove restricted boltzmann machines. In Proceedings of
the 27th International Conference on Machine Learning
(ICML), pages 807–814, 2010.
[16] D. S. Nolan and M. G. McGauley. Tropical cyclogenesis
in wind shear: Climatological relationships and phys-
ical processes. In Cyclones: Formation, Triggers, and
Control, pages 1–36. Nova Science Publishers, 2012.
[17] Prabhat, S. Byna, V. Vishwanath, E. Dart, M. Wehner, W. D. Collins, et al. TECA: Petascale pattern recognition for climate science. In Computer Analysis of Images and Patterns, pages 426–436. Springer, 2015.
[18] Prabhat, O. Rübel, S. Byna, K. Wu, F. Li, M. Wehner, W. Bethel, et al. TECA: A parallel toolkit for extreme climate analysis. In Third Workshop on Data Mining in Earth System Science (DMESS) at the International Conference on Computational Science (ICCS), 2012.
[19] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.
[20] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fer-
gus, and Y. LeCun. Overfeat: Integrated recognition,
localization and detection using convolutional networks.
In International Conference on Learning Representa-
tions (ICLR), 2014.
[21] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong,
and W.-c. Woo. Convolutional lstm network: A ma-
chine learning approach for precipitation nowcasting.
In Advances in Neural Information Processing Systems:
Twenty-Ninth Annual Conference on Neural Informa-
tion Processing Systems (NIPS), 2015.
[22] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2015.
[23] J. Snoek. Spearmint. https://github.com/HIPS/
Spearmint, 2015.
[24] J. Snoek, H. Larochelle, and R. P. Adams. Practical
bayesian optimization of machine learning algorithms.
In Advances in neural information processing systems,
pages 2951–2959, 2012.
[25] J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish,
N. Sundaram, M. Patwary, M. Prabhat, and R. Adams.
Scalable bayesian optimization using deep neural net-
works. In Proceedings of The 32nd International Con-
ference on Machine Learning, pages 2171–2180, 2015.
[26] J. T. Springenberg, A. Dosovitskiy, T. Brox, and
M. Riedmiller. Striving for simplicity: The all convo-
lutional net. In International Conference on Learning
Representation (ICLR), 2015.
[27] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to
sequence learning with neural networks. In Advances
in neural information processing systems, pages 3104–
3112, 2014.
[28] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed,
D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabi-
novich. Going deeper with convolutions. In Proceedings
of the IEEE Conference on Computer Vision and Pat-
tern Recognition (CVPR), pages 1–9, 2015.
[29] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W.
Smeulders. Selective search for object recognition. In-
ternational Journal of Computer Vision, 104(2):154–
171, 2013.
[30] F. Vitart, J. Anderson, and W. Stern. Simulation of
interannual variability of tropical storm frequency in
an ensemble of gcm integrations. Journal of Climate,
10(4):745–760, 1997.
[31] F. Vitart, J. Anderson, and W. Stern. Impact of large-
scale circulation on tropical storm frequency, intensity,
and location, simulated by an ensemble of gcm integra-
tions. Journal of Climate, 12(11):3237–3254, 1999.
[32] K. Walsh, M. Fiorino, C. Landsea, and K. McInnes.
Objectively determined resolution-dependent thresh-
old criteria for the detection of tropical cyclones in
climate models and reanalyses. Journal of Climate,
20(10):2307–2314, 2007.
[33] K. Walsh and I. G. Watterson. Tropical cyclone-like
vortices in a limited area model: comparison with ob-
served climatology. Journal of Climate, 10(9):2240–
2259, 1997.
[34] M. Wehner, Prabhat, K. A. Reed, D. Stone, W. D.
Collins, and J. Bacmeister. Resolution dependence of
future tropical cyclone projections of cam5. 1 in the us
clivar hurricane working group idealized configurations.
Journal of Climate, 28(10):3905–3925, 2015.