Application of Deep Convolutional Neural Networks for
Detecting Extreme Weather in Climate Datasets
Yunjie Liu
National Energy Research
Scientific Computing Center
Lawrence Berkeley Lab
Berkeley, CA
yunjieliu@lbl.gov
Evan Racah
National Energy Research
Scientific Computing Center
Lawrence Berkeley Lab
Berkeley, CA
eracah@lbl.gov
Prabhat
National Energy Research
Scientific Computing Center
Lawrence Berkeley Lab
Berkeley, CA
prabhat@lbl.gov
Joaquin Correa
National Energy Research
Scientific Computing Center
Lawrence Berkeley Lab
Berkeley, CA
joaquincorrea@lbl.gov
Amir Khosrowshahi
Nervana Systems
San Diego, CA
amir@nervanasys.com
David Lavers
Scripps Institution of
Oceanography
San Diego, CA
dlavers@ucsd.edu
Kenneth Kunkel
National Oceanic and
Atmospheric Administration
Asheville, NC
ken.kunkel@noaa.gov
Michael Wehner
Lawrence Berkeley Lab
Berkeley, CA
mfwehner@lbl.gov
William Collins
Lawrence Berkeley Lab
Berkeley, CA
wdcollins@lbl.gov
ABSTRACT
Detecting extreme events in large datasets is a major
challenge in climate science research. Current algorithms for
extreme event detection are built upon human expertise in
defining events based on subjective thresholds of relevant
physical variables. Often, multiple competing methods
produce vastly different results on the same dataset. Accurate
characterization of extreme events in climate simulations
and observational data archives is critical for understanding
the trends and potential impacts of such events in a
climate change context. This study presents the first
application of Deep Learning techniques as an alternative
methodology for detecting climate extreme events. Deep neural
networks are able to learn high-level representations of a
broad class of patterns from labeled data. In this work, we
developed a deep Convolutional Neural Network (CNN)
classification system and demonstrated the usefulness of Deep
Learning techniques for tackling climate pattern detection
problems. Coupled with a Bayesian hyper-parameter
optimization scheme, our deep CNN system achieves 89%-99%
accuracy in detecting extreme events (Tropical Cyclones,
Atmospheric Rivers and Weather Fronts).
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
KDD 2016 August 13-17, San Francisco, CA, USA
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-2138-9. . . $15.00
DOI: 10.475/123 4
Keywords
Pattern Recognition; Deep Learning; Convolutional Neural
Network; Climate Analytics; Extreme Events
1. INTRODUCTION
Extreme climate events (such as hurricanes and heat waves)
pose great potential risk to infrastructure and human health.
Hurricane Joaquin, for example, hit the Carolinas in early
October 2015 and dropped over 2 feet of precipitation in days,
resulting in severe flooding and economic loss. An important
scientific goal in climate science research is to characterize
extreme events in current day and future climate projections.
However, understanding the development mechanisms
and life cycles of these events, as well as their future trends,
requires accurately identifying such patterns in space and time.
Satellites acquire 10s of TBs of global data every year to provide
us with insights into the evolution of the climate system. In
addition, high resolution climate models produce 100s of
TBs of data from multi-decadal runs to enable us to explore
future climate scenarios under global warming. Detecting
extreme climate events in terabytes of data presents an
unprecedented challenge for climate science.
Existing extreme climate event (e.g. hurricane) detection
methods all build upon human expertise in defining
relevant events by evaluating relevant spatial and
temporal variables against hard and subjective thresholds. For
instance, tropical cyclones are strong rotating weather
systems that are characterized by low pressure, warm
temperature core structures with high wind. However, there is no
universally accepted set of criteria for what defines a
tropical cyclone [16]. The "Low" Pressure and "Warm"
Temperature are interpreted differently among climate scientists,
therefore different thresholds are used to characterize them.
Researchers [30, 31, 33, 32, 18, 17] have developed various
algorithms to detect tropical cyclones in large climate
datasets based on subjective thresholding of several relevant
variables (e.g. sea level pressure, temperature, wind).
One general and promising extreme climate event
detection package, the Toolkit for Extreme Climate Analysis
(TECA) [18, 17], is able to detect tropical cyclones,
extra-tropical cyclones and atmospheric rivers. TECA utilizes the
MapReduce paradigm to find patterns in terabytes of
climate data within hours. However, many climate extreme
events do not have a clear empirical definition that is
accepted universally by climate scientists (e.g. extra-tropical
cyclones and mesoscale convective systems), which precludes
the development and application of algorithms for detection
and tracking. This study attempts to search for an
alternative methodology for extreme event detection by designing
a neural network based system that is capable of learning a
broad class of patterns from complex multi-variable climate
data and avoiding subjective thresholds.
arXiv:1605.01156v1 [cs.CV] 4 May 2016
Recent advances in deep learning have demonstrated
exciting and promising results on pattern recognition tasks,
such as the ImageNet Large Scale Visual Recognition Challenge
[10, 22, 28] and speech recognition [8, 3, 7, 27]. Many of
the state-of-the-art deep learning architectures for visual
pattern recognition are based on the hierarchical feature
learning convolutional neural network (CNN). Modern CNN
systems tend to be deep and large, with many hidden layers
and millions of neurons, making them flexible in learning a
broad class of patterns simultaneously from data. AlexNet
(8 learnable layers: 5 convolutional layers and 3 fully
connected layers), developed by [10], provided the first
end-to-end trainable deep learning system for object
classification, and achieved a 15.3% top-5 classification error
rate on the ILSVRC-2012 data set. In contrast, the previous best
performing non-neural-network systems achieved only a 25.7%
top-5 classification error on the same data set. Shortly after that,
Simonyan and Zisserman [22] further developed AlexNet and
introduced an even deeper CNN (19 layers: 16 convolutional
layers and 3 fully connected layers) with smaller kernels
(filters) and achieved an impressive 6.8% top-5 classification
error rate on the ILSVRC-2014 data set. Szegedy et al. [28]
introduced the "inception" neural network concept (a network
that includes sub-networks) and developed an even deeper CNN
(22 layers) that achieved comparable classification results on
the ImageNet benchmark. Building on deep CNNs, Sermanet et al.
[20] introduced an integrated system of classification and
detection, in which features learned by convolutional layers are
shared among classification and localization tasks and both
tasks are performed simultaneously in a single network.
Girshick et al. [4] took a completely different approach by
combining a region proposal framework [29] with a deep CNN and
designed the state-of-the-art R-CNN object detection system.
In this paper, we formulate the problem of detecting
extreme climate events as a classic visual pattern recognition
problem. We then build end-to-end trainable deep CNN
systems, following the architecture introduced by [10]. The
models were trained to classify tropical cyclones, weather fronts
and atmospheric rivers. Unlike the ImageNet challenge, where
the training data are labeled natural images, our training
data consist of several continuous spatial variables (e.g.
pressure, temperature, precipitation) that are stacked together
into image patches.
2. RELATED WORK
Climate data analysis requires an array of advanced
methodologies. Neural network based machine learning, as
a generative analysis technique, has received much attention
and has been applied to several climate problems in
recent years. Chattopadhyay et al. [2] developed a nonlinear
clustering method based on the Self-Organizing Map (SOM)
to study the structural evolution of the Madden–Julian
oscillation (MJO). Their method does not require selecting leading
modes or intraseasonal bandpass filtering in time and space
as other methods do. Their results show that the SOM based
method not only captures the gross features of MJO
structure and development, but also reveals insights that other
methods are not able to discover, such as the dipole and
tripole structure of outgoing longwave radiation and
diabatic heating in the MJO. Gorricha and Costa [6] used a three
dimensional Self-Organizing Map to categorize and
visualize extreme precipitation patterns over an island in
Spain. They found spatial precipitation patterns that the
traditional precipitation index approach is not able to discover,
and concluded that the three dimensional Self-Organizing
Map is a very useful tool for exploratory spatial pattern
analysis. More recently, Shi et al. [21] implemented a newly
developed convolutional long short term memory (LSTM) deep
neural network for precipitation nowcasting. Trained on two
dimensional radar map time series, their system
outperforms the current state-of-the-art precipitation nowcasting
system on various evaluation metrics. Iglesias et al. [9]
developed a multitask deep fully connected neural network
for predicting heat waves, trained on historical time series
data. They demonstrate that the neural network approach is
significantly better than linear and logistic regression, and
can potentially improve the performance of forecasting
extreme heat waves. These studies show that the neural network is
a generative method that can be applied to various climate
problems. In this study, we explore deep Convolutional Neural
Networks for solving climate pattern detection problems.
3. METHODS
3.1 Convolutional Neural Network
A deep CNN is typically comprised of several convolutional
layers followed by a small number of fully connected
layers. Between two successive convolutional layers, a
subsampling operation (e.g. max pooling, mean pooling) is
typically performed. Researchers have debated the necessity
of pooling layers, arguing that they can simply be
replaced by convolutional layers with increased stride, thus
simplifying the network structure [26]. In either case, the input
of a CNN is a set of (m, n, p) images, where m and n are the
width and height of an image in pixels and p is the number of
color channels of each pixel. The output of a CNN is a vector
of q probability units (class scores), where q is the number
of categories to be classified (e.g. for a binary classifier
q = 2).
The convolutional layers perform the convolution operation
between kernels and the input images (or feature maps from
the previous layer). Typically, a convolutional layer contains k
filters (kernels) of size (i, j, p), where i and j are the width
and height of the filter. The filters are usually smaller than
the width m and height n of the input image, and p always equals
the number of channels of the input (e.g. a color
image has three channels: red, green, and blue). Each of
the filters is independently convolved with the input images
(or feature maps from the previous layer), followed by a
non-linear transformation, which generates k feature maps that
serve as inputs for the next layer. In the process of convolution,
a dot product is computed between the entries of the filter and
the local region that it is connected to in the input image
(or feature map from the previous layer). The parameters of
a convolutional layer are these learnable filters. The
convolutional layer is the feature extractor, because the kernels
slide across all the inputs and produce larger outputs
for certain sub-regions than for others. This allows features
to be extracted from the inputs and preserved in the feature
maps, which are passed on to the next layer, regardless of where
the feature is located in the input. The pooling layer
subsamples the feature maps generated by the convolutional layer
over an (s, t) contiguous region, where s and t are the width and
height of the subsampling window. As a result, the resolution
of the feature maps becomes coarser with the depth
of the CNN. All feature maps are high-level representations of
the input data. The fully connected layer has
connections to all hidden units in the previous layer. If it is the last
layer within the CNN architecture, the fully connected layer also
does the high level reasoning based on the feature vectors
from the previous layer and produces the final class scores for
image objects.
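To make the layer description concrete, the following is a minimal NumPy sketch of one convolutional layer followed by a non-linearity and max pooling. It is an illustration only, not the implementation used in this study: it assumes stride 1 and no padding, stores arrays with axes ordered (rows, columns, channels), and computes the sliding dot product without flipping the kernel, as is common in CNN libraries. The function names conv2d_valid, relu and max_pool are hypothetical.

```python
import numpy as np

def conv2d_valid(image, filters):
    """Slide k filters over an image with stride 1 and no padding (assumed).

    image:   array of shape (m, n, p)
    filters: array of shape (k, i, j, p)
    returns: array of shape (m - i + 1, n - j + 1, k), one map per filter
    """
    m, n, p = image.shape
    k, i, j, p_filter = filters.shape
    assert p == p_filter, "filter depth must equal the number of input channels"
    out = np.zeros((m - i + 1, n - j + 1, k))
    for r in range(m - i + 1):
        for c in range(n - j + 1):
            region = image[r:r + i, c:c + j, :]
            # Dot product between every filter and the local region it covers.
            out[r, c, :] = np.tensordot(filters, region, axes=([1, 2, 3], [0, 1, 2]))
    return out

def relu(x):
    """Element-wise non-linear transformation applied after the convolution."""
    return np.maximum(0.0, x)

def max_pool(fmap, s, t):
    """Subsample each feature map over non-overlapping (s, t) windows."""
    m, n, k = fmap.shape
    rows, cols = m // s, n // t
    trimmed = fmap[:rows * s, :cols * t, :]  # drop any ragged border
    return trimmed.reshape(rows, s, cols, t, k).max(axis=(1, 3))

# Synthetic 32x32 input with 8 channels and 8 random 5x5 filters.
rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 8))
filters = rng.normal(scale=0.1, size=(8, 5, 5, 8))
feature_maps = relu(conv2d_valid(image, filters))
pooled = max_pool(feature_maps, 2, 2)
print(feature_maps.shape, pooled.shape)
```

The explicit loops favor readability over speed; a production library would use a vectorized or hardware-accelerated convolution.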
Most current deep neural networks use back propagation
as the learning rule [19]. The back propagation algorithm
searches for a minimum of the loss function in weight space
through gradient descent. It attributes the final total
loss to each individual neuron in the network, propagating
the error backward through the entire network from the
output to the inputs, and repeatedly adjusts the weights of
the neurons that contribute most to the loss.
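As an illustration of this learning rule, the sketch below trains a tiny two-layer network with full-batch gradient descent. It is a toy example under stated assumptions, not the training code of this study: the network size, the synthetic data, the cross-entropy loss, the learning rate and the function names are all chosen for illustration.

```python
import numpy as np

def forward(params, X):
    """Forward pass: ReLU hidden layer followed by a logistic output unit."""
    W1, b1, W2, b2 = params
    z1 = X @ W1 + b1
    h = np.maximum(0.0, z1)
    z2 = h @ W2 + b2
    p = 1.0 / (1.0 + np.exp(-z2))
    return z1, h, p

def loss_fn(params, X, y):
    """Mean cross-entropy loss (an assumed choice of loss function)."""
    _, _, p = forward(params, X)
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def backward(params, X, y):
    """Propagate the output error back to every weight via the chain rule."""
    W1, b1, W2, b2 = params
    z1, h, p = forward(params, X)
    n = X.shape[0]
    dz2 = (p - y) / n            # error at the output unit
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dh = dz2 @ W2.T              # error passed back to the hidden layer
    dz1 = dh * (z1 > 0)          # ReLU passes gradient only where it was active
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return [dW1, db1, dW2, db2]

def train(params, X, y, lr=0.1, steps=200):
    """Gradient descent: repeatedly step against the gradient of the loss."""
    for _ in range(steps):
        grads = backward(params, X, y)
        params = [w - lr * g for w, g in zip(params, grads)]
    return params

# Synthetic, linearly separable toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = (X[:, :1] + X[:, 1:2] > 0).astype(float)
params = [rng.normal(scale=0.5, size=(3, 4)), np.zeros(4),
          rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)]
loss_before = loss_fn(params, X, y)
trained = train(params, X, y)
loss_after = loss_fn(trained, X, y)
print(loss_before, loss_after)
```

In a deeper network the same backward step is repeated once per layer, which is how the error reaches the input side of the network.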
3.2 Hyper-parameter Optimization
Training deep neural networks is known to be hard [12, 5].
Training a deep neural network effectively and efficiently not
only requires a large amount of training data, but also requires
careful tuning of model hyper-parameters (e.g. learning
parameters, regularization parameters) [24]. The parameter
tuning process, however, can be tedious and non-intuitive.
Hyper-parameter optimization reduces to finding a set
of parameters for a network that produces the best possible
validation performance. As such, this process can be
thought of as a typical optimization problem of finding a
set, x, of parameter values from a bounded set X that
minimizes an objective function f(x), where x is a particular
setting of the hyper-parameters and f(x) is the loss of a deep
neural network with a particular set of training and testing
data as a function of the hyper-parameter inputs. Training
a deep neural network is not only costly (with respect to
time), but also rather opaque with respect
to how the network performance varies with its
hyper-parameter inputs. Because training and validating
a deep neural network is complicated and expensive,
Bayesian Optimization (which assumes that f(x) is unknown,
non-convex and expensive to evaluate) is a well-suited
algorithm for hyper-parameter optimization in our task.
Bayesian Optimization attempts to optimize f(x)
by constructing two things: a probabilistic model of f(x)
and an acquisition function that picks which point x in X
to evaluate next. The probabilistic model is updated using
Bayes' rule with a Gaussian prior. The acquisition function
suggests hyper-parameter settings (points) to evaluate by
balancing two goals: evaluating settings in regions
where f(x) is low, and evaluating points in regions where the
uncertainty of the probabilistic model is high. As a result, the
optimization procedure attempts to evaluate as few points
as possible [1, 24].
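The loop described above can be sketched in a few lines. The example below is a simplified illustration and not Spearmint's implementation or API: it assumes a single hyper-parameter scaled to [0, 1], a Gaussian process model with a fixed RBF kernel, the expected improvement acquisition function, and a cheap stand-in objective (toy_validation_loss) in place of actually training a network. All names and constants are hypothetical.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential covariance between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of the probabilistic model."""
    K = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(K, K_s)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(K_s * solved, axis=0)  # prior variance is 1
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best):
    """Acquisition function: large where the mean is low or uncertainty is high."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def bayes_opt(f, grid, n_init=3, n_iter=12, seed=0):
    """Minimize f over a grid of candidates using few evaluations."""
    rng = np.random.default_rng(seed)
    x_obs = rng.choice(grid, size=n_init, replace=False)
    y_obs = np.array([f(x) for x in x_obs])
    for _ in range(n_iter):
        centered = y_obs - y_obs.mean()
        mean, std = gp_posterior(x_obs, centered, grid)
        ei = expected_improvement(mean, std, centered.min())
        x_next = grid[int(np.argmax(ei))]     # point the acquisition prefers
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))   # the expensive evaluation
    best = int(np.argmin(y_obs))
    return x_obs[best], y_obs[best]

def toy_validation_loss(x):
    """Stand-in for training and validating a network at hyper-parameter x."""
    return (x - 0.3) ** 2

grid = np.linspace(0.0, 1.0, 101)
x_best, y_best = bayes_opt(toy_validation_loss, grid)
print(x_best, y_best)
```

A real setup would replace toy_validation_loss with a full training and validation run, and would typically search over several hyper-parameters at once.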
In order to implement Bayesian Optimization, we use a
tool called Spearmint. Spearmint works by launching a
Spearmint master process, which creates a database for col-
lecting all model evaluation results. The master process then
spawns many processes, which execute training and evalua-
tion with respect to a set of hyper-parameters proposed by
the acquisition function and then report their results to the
database. From there, the master process uses the results
in the database to propose further parameter settings and
launch additional processes.
3.3 CNN Configuration
Following AlexNet [10], we developed a deep CNN with
a total of 4 learnable layers: 2 convolutional layers
and 2 fully connected layers. Each convolutional layer
is followed by a max pooling layer. The model is
constructed with the open source Python deep learning
library Neon. The configurations of our best performing
architectures are shown in Table 1.
The networks are shallower and smaller than the
state-of-the-art architectures developed by [22, 28]. The major
limitation for exploring deeper and larger CNNs is the
limited amount of labeled training data that we can obtain.
However, a small network has the advantage of avoiding
over-fitting, especially when the amount of training data is
small. We also chose comparatively large kernels (filters) in
the convolutional layers based on the input data size, even though
[22] suggests that a deep architecture with small kernels (filters)
is essential for state-of-the-art performance. This is because
climate patterns are comparatively simpler and larger in size
than objects in the ImageNet dataset.
One key feature of deep learning architectures is that they
are able to learn complex non-linear functions. The
convolutional layers and the first fully connected layer in our
deep CNNs all use the Rectified Linear Unit (ReLU) activation
function [15], given in Equation (1). ReLU is chosen because it
trains faster [10] than other activation functions such as tanh.
$$f(x) = \max(0, x) \qquad (1)$$
The final fully connected layer uses the logistic activation
function, given in Equation (2), as its non-linearity; it also
serves as the classifier and outputs a probability distribution
over class labels.
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (2)$$
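For reference, the two activation functions in Equations (1) and (2) can be written directly in NumPy. This is a minimal sketch; the function names relu and logistic are chosen here for illustration and are not taken from the library used in this study.

```python
import numpy as np

def relu(x):
    """Equation (1): passes positive inputs unchanged and zeroes the rest."""
    return np.maximum(0.0, x)

def logistic(x):
    """Equation (2): squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-2.0, 0.0, 3.0])
print(relu(scores))
print(logistic(scores))
```

The gradient of ReLU is exactly 1 for positive inputs, so it does not shrink as the input grows, which is one common explanation for its faster training compared with saturating functions such as tanh.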
3.4 Computational Platform
We performed our data processing, model training and
testing on Edison, a Cray XC30, and Cori, a Cray XC40,
supercomputing systems at the National Energy Research
Scientific Computing Center (NERSC). Each Edison compute
node has 24 Intel Xeon cores running at 2.4 GHz. Each
Cori compute node has 32 Intel Haswell cores running at
2.3 GHz. In our work, we mainly used the single node CPU
backend of Neon. The hyper-parameter optimization was performed
on a single Cori node, with tasks running fully in parallel on
32 cores.
Table 1: Deep CNN architecture and layer parameters. The convolutional layer
parameters are denoted as <filter size>-<number of feature maps> (e.g. 5x5-8). The pooling
layer parameters are denoted as <pooling window> (e.g. 2x2). The fully connected
layer parameters are denoted as <number of units> (e.g. 2).
Event              Conv1    Pooling  Conv2     Pooling  Fully  Fully
Tropical Cyclone   5x5-8    2x2      5x5-16    2x2      50     2
Weather Fronts     5x5-8    2x2      5x5-16    2x2      50     2
Atmospheric River  12x12-8  3x3      12x12-16  2x2      200    2
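The layer parameters in Table 1, combined with the image dimensions in Table 3, determine the size of the feature maps that reach the first fully connected layer. The sketch below traces those sizes under assumptions that the text does not state: stride 1 and no padding in the convolutional layers, and non-overlapping pooling windows with any ragged border dropped. The helper trace_shapes and the layer lists are hypothetical.

```python
def trace_shapes(height, width, layers):
    """Follow the feature-map size through conv and pooling layers.

    layers is a list of (kind, size, n_maps) tuples, where kind is "conv"
    or "pool" and size is the (square) filter or window size.
    Assumes stride-1 unpadded convolutions and non-overlapping pooling.
    """
    h, w, maps = height, width, None
    for kind, size, n_maps in layers:
        if kind == "conv":
            h, w, maps = h - size + 1, w - size + 1, n_maps
        elif kind == "pool":
            h, w = h // size, w // size
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    return h, w, maps

# Layer parameters copied from Table 1.
SMALL_NET = [("conv", 5, 8), ("pool", 2, None), ("conv", 5, 16), ("pool", 2, None)]
RIVER_NET = [("conv", 12, 8), ("pool", 3, None), ("conv", 12, 16), ("pool", 2, None)]

# Image dimensions copied from Table 3.
for name, (h, w), net in [("Tropical Cyclone", (32, 32), SMALL_NET),
                          ("Weather Front", (27, 60), SMALL_NET),
                          ("Atmospheric River", (148, 224), RIVER_NET)]:
    out_h, out_w, out_maps = trace_shapes(h, w, net)
    print(name, (out_h, out_w, out_maps), "flattened:", out_h * out_w * out_maps)
```

If the actual networks used padding or a different stride, the intermediate sizes would differ from what this sketch reports.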
Table 2: Data Sources
Climate Dataset         Time Frame  Temporal Resolution  Spatial Resolution (lat x lon, degrees)
CAM5.1 historical run   1979-2005   3 hourly             0.23x0.31
ERA-Interim reanalysis  1979-2011   3 hourly             0.25x0.25
20th century reanalysis 1908-1948   Daily                1x1
NCEP-NCAR reanalysis    1949-2009   Daily                1x1
Table 3: Size of image patch, diagnostic variables and number of labeled examples used
for each extreme event considered in the study
Events             Image Dimension  Variables                                     Total Examples
Tropical Cyclone   32x32            PSL, VBOT, UBOT, T200, T500, TMQ, V850, U850  10,000 +ve, 10,000 -ve
Atmospheric River  148x224          TMQ, Land Sea Mask                            6,500 +ve, 6,800 -ve
Weather Front      27x60            2m Temp, Precip, SLP                          5,600 +ve, 6,500 -ve
4. DATA
In this study, we use both climate simulations and
reanalysis products. The reanalysis products are produced by
assimilating observations into a climate model. A summary
of the data sources and their temporal and spatial resolutions is
listed in Table 2. Ground truth labeling of the various events
is obtained via multivariate threshold based criteria
implemented in TECA [18, 17], and manual labeling by experts
[11, 13]. The training data consist of image patterns, where
several relevant spatial variables are stacked together over
a prescribed region that bounds a type of event. The
dimension of the bounding box is based on domain knowledge of
the spatial extent of events in the real world. For instance,
tropical cyclone radii are typically within the range of 100
kilometers to 500 kilometers, thus a bounding box of 500
kilometers by 500 kilometers is likely to capture most tropical
cyclones. The chosen physical variables are also based on domain
expertise. The prescribed bounding box is placed over the
event. Relevant variables are extracted within the bounding
box and stacked together. To facilitate model training,
the bounding box location is adjusted slightly such that all
events are located approximately at the center. Image
patches are cropped and centered correspondingly. Because
the spatial dimensions of climate events vary considerably
and the spatial resolution of the source data is non-uniform, the
final training images differ in size among the
three types of events. A summary of the attributes of the training
images is listed in Table 3.
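The patch construction described above can be illustrated with a short NumPy sketch. It is not the preprocessing code of this study: the function extract_patch, the synthetic fields, the grid size and the way the event center is located (the minimum of a synthetic pressure field) are all assumptions made for the example, and longitude wrap-around is ignored.

```python
import numpy as np

def extract_patch(fields, center_row, center_col, height, width):
    """Crop the same bounding box from every variable and stack the crops.

    fields: dict mapping variable name -> 2-D array on a common (lat, lon) grid
    returns: array of shape (height, width, number of variables)
    """
    names = list(fields)
    n_rows, n_cols = fields[names[0]].shape
    # Center the box on the event, shifting it inward at the grid edges.
    top = min(max(center_row - height // 2, 0), n_rows - height)
    left = min(max(center_col - width // 2, 0), n_cols - width)
    crops = [fields[name][top:top + height, left:left + width] for name in names]
    return np.stack(crops, axis=-1)

# Synthetic fields standing in for three of the variables listed in Table 3.
rng = np.random.default_rng(0)
grid_shape = (96, 144)
fields = {name: rng.normal(size=grid_shape) for name in ("PSL", "UBOT", "VBOT")}
fields["PSL"][40, 70] = -10.0  # artificial pressure minimum marking an "event"

row, col = np.unravel_index(np.argmin(fields["PSL"]), grid_shape)
patch = extract_patch(fields, int(row), int(col), 32, 32)
print(patch.shape)
```

Each stacked variable plays the role that a color channel plays in a natural image, so the resulting array can be fed to a CNN in the same way.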
5. RESULTS AND DISCUSSION
Table 4 summarizes the performance of our deep CNN
architecture on classifying tropical cyclones, atmospheric rivers
and weather fronts. We obtained fairly high accuracy
(89%-99%) on extreme event classification. In addition, the
systems do not suffer from over-fitting. We believe this is
mostly because of the shallow and small size of the
architecture (4 learnable layers) and the weight decay regularization.
A deeper and larger architecture would be inappropriate for
this study due to the limited amount of training data. The fairly
good train and test classification results also suggest that the
deep CNNs we developed are able to efficiently learn
representations of climate patterns from labeled data and make
predictions based on the features learned. Traditional threshold
based detection methods require human experts to carefully
examine the extreme event and its environment, and then come
up with thresholds for defining the events. In contrast, as
shown in this study, deep CNNs are able to learn climate
patterns just from the labeled data, thus avoiding subjective
thresholds.
Table 4: Overall Classification Accuracy
Event Type         Train   Test    Train time
Tropical Cyclone   99%     99%     ≈30 min
Atmospheric River  90.5%   90%     6-7 hour
Weather Front      88.7%   89.4%   ≈30 min
5.1 Classification Results for Tropical Cyclones
Tropical cyclones are rapidly rotating weather systems that
are characterized by a low pressure center with strong wind
circulating around the center and a warm temperature core in
the upper troposphere. Figure 1 shows examples of tropical
cyclones simulated in climate models that are correctly classified
by the deep CNN (the warm core structure is not shown in this
figure). Tropical cyclone features are rather well defined, as can
be seen from the distinct low pressure center and the spiral flow
of wind vectors around the center. These clear and distinct
characteristics make the tropical cyclone pattern relatively easy
to learn and represent within a CNN. Our deep CNNs achieved
nearly perfect (99%) classification accuracy.
Figure 2 shows examples of tropical cyclones that are
mis-classified. After carefully examining these events, we believe
they are weak systems (e.g. tropical depressions), whose low
pressure center and spiral wind structure have not fully
developed. The pressure distribution shows a large low
pressure area without a clear minimum. Therefore, our deep
CNN does not label them as strong tropical cyclones.
Table 5: Confusion matrix for tropical cyclone classification
Label TC Label Non TC
Predict TC 0.989 0.003
Predict Non TC 0.011 0.997
Figure 1: Sample images of tropical cyclones correctly clas-
sified (true positive) by our deep CNN model. Figure shows
sea level pressure (color map) and near surface wind distri-
bution (vector solid line).
Figure 2: Sample images of tropical cyclones mis-classified
(false negative) by our deep CNN model. Figure shows sea
level pressure (color map) and near surface wind distribution
(vector solid line).
5.2 Classification Results for Atmospheric Rivers
In contrast to tropical cyclones, atmospheric rivers are
distinctively different events. They are narrow corridors of
concentrated moisture in the atmosphere. They usually originate
in tropical oceans and move pole-ward. Figure 3 shows
examples of correctly classified land falling atmospheric rivers
that occur on the western Pacific Ocean and north Atlantic
Ocean. The characteristic narrow water vapor corridor
is well defined and clearly observable in these images.
Figure 4 shows examples of mis-classified atmospheric rivers.
Upon further investigation, we believe there are two main
factors leading to mis-classification. The first is the presence
of weak atmospheric river systems. For instance, the left
column of Figure 4 shows comparatively weak atmospheric
rivers. The water vapor distribution clearly shows a band of
concentrated moisture across the mid-latitude ocean, but the
signal is much weaker than in Figure 3. Thus, the deep CNN
does not predict them correctly. The second is the presence of
other climate events, which may also affect the deep CNN's
representation of atmospheric rivers. In reality, the location and
shape of an atmospheric river are affected by jet streams and
extra-tropical cyclones. For example, the right column of Figure 4
shows rotating systems (likely extra-tropical cyclones) adjacent
to the atmospheric river. This phenomenon presents a challenge
for the deep CNN in representing atmospheric rivers.
Table 6: Confusion matrix for atmospheric river classifica-
tion
Label AR Label Non AR
Predict AR 0.93 0.107
Predict Non AR 0.07 0.893
Figure 3: Sample images of atmospheric rivers correctly clas-
sified (true positive) by our deep CNN model. Figure shows
total column water vapor (color map) and land sea boundary
(solid line).
Figure 4: Sample images of atmospheric rivers mis-classified
(false negative) by our deep CNN model. Figure shows to-
tal column water vapor (color map) and land sea boundary
(solid line).
5.3 Classification Results for Weather Fronts
Among the three types of climate events we are looking
at, weather fronts have the most complex spatial pattern.
Weather fronts typically form at the interface of warm air
and cold air, and are usually associated with heavy precipitation
due to condensation of moisture as the warm air is lifted. In
satellite images, a weather front is observable as a strip of clouds,
but it is hardly visible in two dimensional fields such as
temperature and pressure. In the middle latitudes (e.g. most of
the U.S.), a portion of weather fronts are associated with
extra-tropical cyclones. Figure 5 shows examples of weather
fronts correctly classified by our deep CNN system. Visually, the
narrow long regions of high precipitation line up approximately
parallel to the temperature contours. This is a clear
characteristic and is comparatively easy for deep CNNs to learn.
Because the patterns of weather fronts are rather complex and
hardly show up in two dimensional fields, we decided to
investigate them further in later work.
Table 7: Confusion matrix for weather front classification
Label WF Label Non WF
Predict WF 0.876 0.18
Predict Non WF 0.124 0.82
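The confusion matrices in Tables 5, 6 and 7 are normalized per label column, so each column sums to one. The sketch below shows one way to read them; it is an illustration only. The helper names are hypothetical, and because the number of positive and negative examples in the test split is not reported here, the sketch computes the balanced accuracy (the mean of the two per-class rates) rather than the overall accuracy.

```python
import numpy as np

# Rows: predicted positive, predicted negative.
# Columns: labeled positive, labeled negative. Values copied from Tables 5-7.
CONFUSION = {
    "Tropical Cyclone": [[0.989, 0.003], [0.011, 0.997]],
    "Atmospheric River": [[0.93, 0.107], [0.07, 0.893]],
    "Weather Front": [[0.876, 0.18], [0.124, 0.82]],
}

def per_class_rates(matrix):
    """Read the four rates from a column-normalized 2x2 confusion matrix."""
    m = np.asarray(matrix, dtype=float)
    return {
        "true_positive_rate": m[0, 0],   # labeled events that were detected
        "false_positive_rate": m[0, 1],  # non-events flagged as events
        "false_negative_rate": m[1, 0],  # labeled events that were missed
        "true_negative_rate": m[1, 1],   # non-events correctly rejected
    }

def balanced_accuracy(matrix):
    """Average of the detection rate and the rejection rate."""
    rates = per_class_rates(matrix)
    return 0.5 * (rates["true_positive_rate"] + rates["true_negative_rate"])

for event, matrix in CONFUSION.items():
    print(event, round(balanced_accuracy(matrix), 4))
```

The balanced accuracy equals the overall accuracy only when the two classes are equally represented in the test set, so it need not match the figures in Table 4 exactly.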
Figure 5: Sample images of weather fronts correctly classified
by our deep CNN model. Figure shows precipitation with
daily precipitation less than 5 millimeters filtered out (color
map), near surface air temperature (solid contour line) and
sea level pressure (dashed contour line).
6. FUTURE WORK
In the present study, we trained deep CNNs separately for
classifying tropical cyclones, atmospheric rivers and weather
fronts. Ideally, we would like to train a single
neural network for detecting all three types of events.
Unlike object recognition in natural images, climate pattern
detection has unique challenges. Firstly, climate events
happen at vastly different spatial scales. For example, a
tropical cyclone typically extends over less than 500
kilometers in radius, while an atmospheric river can be several
thousand kilometers long. Secondly, different climate events
are characterized by different sets of physical variables. For
example, atmospheric rivers correlate strongly with the
vertical integration of water vapor, while tropical cyclones have a
more complex multi-variable pattern involving sea level
pressure, near surface wind and upper troposphere temperature.
Future work will need to develop generative CNN
architectures that are capable of discriminating between different
variables based on the event type and capable of handling
events at various spatial scales. Note that we have primarily
addressed detection of extreme weather patterns, but
not their localization. We will consider architectures for
spatially localizing weather patterns in the future.
Several researchers have pointed out that deeper and larger
CNNs perform better for classification and detection tasks [22,
28] compared to shallow networks. However, deep networks
require huge amounts of data to be trained effectively and to
prevent model over-fitting. Datasets such as ImageNet
provide millions of labeled images for training and testing deep
and large CNNs. In contrast, we can only obtain a small
amount of labeled training data, hence we are constrained
in the class of deep CNNs that we can explore without
suffering from over-fitting. This limitation also points to
the need for developing unsupervised approaches for climate
pattern detection. We believe that this will be critical for the
majority of scientific disciplines, which typically lack labeled
data.
7. CONCLUSION
In this study, we explored deep learning as a
methodology for detecting extreme weather patterns in climate data.
We developed a deep CNN architecture for classifying tropical
cyclones, atmospheric rivers and weather fronts. The
system achieves fairly high classification accuracy, ranging from
89% to 99%. To the best of our knowledge, this is the first
time that a deep CNN has been applied to tackle climate
pattern recognition problems. This successful application could
be a precursor for tackling a broad class of pattern
detection problems in climate science. Deep neural networks learn
high-level representations directly from data, therefore
potentially avoiding the traditional subjective thresholding
criteria of climate variables for event detection. Results
from this study will be used for quantifying trends in climate
extreme events in current day and future climate scenarios, as
well as investigating the changes in the dynamics and
thermodynamics of extreme events in the context of global warming.
This information is critical for climate change adaptation, hazard
risk prediction and climate change policy making.
8. ACKNOWLEDGMENTS
This research was conducted using "Neon", an open source
library for deep learning from Nervana Systems.
This research used resources of the National Energy Re-
search Scientific Computing Center, a DOE Office of Sci-
ence User Facility supported by the Office of Science of the
U.S. Department of Energy under Contract No. DE-AC02-
05CH11231. This work was supported by the Director, Of-
fice of Science, Office of Advanced Scientific Computing Re-
search, Applied Mathematics program of the U.S. Depart-
ment of Energy under Contract No. DE-AC02-05CH11231.
References
[1] E. Brochu, V. M. Cora, and N. De Freitas. A tutorial on
bayesian optimization of expensive cost functions, with
application to active user modeling and hierarchical re-
inforcement learning. arXiv preprint arXiv:1012.2599,
2010.
[2] R. Chattopadhyay, A. Vintzileos, and C. Zhang. A de-
scription of the madden–julian oscillation based on a
self-organizing map. Journal of Climate, 26(5):1716–
1732, 2013.
[3] G. E. Dahl, D. Yu, L. Deng, and A. Acero. Context-
dependent pre-trained deep neural networks for large-
vocabulary speech recognition. Audio, Speech, and Lan-
guage Processing, IEEE Transactions on, 20(1):30–42,
2012.
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich
feature hierarchies for accurate object detection and se-
mantic segmentation. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 580–587, 2014.
[5] X. Glorot and Y. Bengio. Understanding the difficulty
of training deep feedforward neural networks. In Inter-
national conference on artificial intelligence and statis-
tics, pages 249–256, 2010.
[6] J. Gorricha, V. Lobo, and A. C. Costa. A framework
for exploratory analysis of extreme weather events using
geostatistical procedures and 3d self-organizing maps.
International Journal on Advances in Intelligent Sys-
tems, 6(1), 2013.
[7] A. Graves, A.-r. Mohamed, and G. Hinton. Speech
recognition with deep recurrent neural networks. In
Acoustics, Speech and Signal Processing (ICASSP),
2013 IEEE International Conference on, pages 6645–
6649. IEEE, 2013.
[8] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed,
N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N.
Sainath, et al. Deep neural networks for acoustic mod-
eling in speech recognition: The shared views of four
research groups. Signal Processing Magazine, IEEE,
29(6):82–97, 2012.
[9] G. Iglesias, D. C. Kale, and Y. Liu. An examination of
deep learning for extreme climate pattern analysis. In
The 5th International Workshop on Climate Informat-
ics, 2015.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Ima-
genet classification with deep convolutional neural net-
works. In Advances in Neural Information Processing
Systems (NIPS), pages 1097–1105, 2012.
[11] K. E. Kunkel, D. R. Easterling, D. A. Kristovich,
B. Gleason, L. Stoecker, and R. Smith. Meteorologi-
cal causes of the secular variations in observed extreme
precipitation events for the conterminous united states.
Journal of Hydrometeorology, 13(3):1131–1141, 2012.
[12] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin.
Exploring strategies for training deep neural networks.
The Journal of Machine Learning Research, 10:1–40,
2009.
[13] D. A. Lavers, G. Villarini, R. P. Allan, E. F. Wood,
and A. J. Wade. The detection of atmospheric rivers in
atmospheric reanalyses and their links to british winter
floods and the large-scale climatic circulation. Journal
of Geophysical Research: Atmospheres, 117(D20), 2012.
[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner.
Gradient-based learning applied to document recogni-
tion. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[15] V. Nair and G. E. Hinton. Rectified linear units im-
prove restricted boltzmann machines. In Proceedings of
the 27th International Conference on Machine Learning
(ICML), pages 807–814, 2010.
[16] D. S. Nolan and M. G. McGauley. Tropical cyclogenesis
in wind shear: Climatological relationships and phys-
ical processes. In Cyclones: Formation, Triggers, and
Control, pages 1–36. Nova Science Publishers, 2012.
[17] Prabhat, S. Byna, V. Vishwanath, E. Dart, M. Wehner,
W. D. Collins, et al. Teca: Petascale pattern recogni-
tion for climate science. In Computer Analysis of Im-
ages and Patterns, pages 426–436. Springer, 2015.
[18] Prabhat, O. R¨
ubel, S. Byna, K. Wu, F. Li, M. Wehner,
W. Bethel, et al. Teca: A parallel toolkit for extreme
climate analysis. In Third Worskhop on Data Mining
in Earth System Science (DMESS) at the International
Conference on Computational Science (ICCS), 2012.
[19] D. Ruhmelhart, G. Hinton, and R. Wiliams. Learn-
ing representations by back-propagation errors. Nature,
323:533–536, 1986.
[20] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fer-
gus, and Y. LeCun. Overfeat: Integrated recognition,
localization and detection using convolutional networks.
In International Conference on Learning Representa-
tions (ICLR), 2014.
[21] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong,
and W.-c. Woo. Convolutional lstm network: A ma-
chine learning approach for precipitation nowcasting.
In Advances in Neural Information Processing Systems:
Twenty-Ninth Annual Conference on Neural Informa-
tion Processing Systems (NIPS), 2015.
[22] K. Simonyan and A. Zisserman. Very deep convolu-
tional networks for large-scale image recognition. In
Internaltional Conference on Learning Representation
(ICLR), 2015.
[23] J. Snoek. Spearmint. https://github.com/HIPS/
Spearmint, 2015.
[24] J. Snoek, H. Larochelle, and R. P. Adams. Practical
bayesian optimization of machine learning algorithms.
In Advances in neural information processing systems,
pages 2951–2959, 2012.
[25] J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish,
N. Sundaram, M. Patwary, M. Prabhat, and R. Adams.
Scalable bayesian optimization using deep neural net-
works. In Proceedings of The 32nd International Con-
ference on Machine Learning, pages 2171–2180, 2015.
[26] J. T. Springenberg, A. Dosovitskiy, T. Brox, and
M. Riedmiller. Striving for simplicity: The all convo-
lutional net. In International Conference on Learning
Representation (ICLR), 2015.
[27] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to
sequence learning with neural networks. In Advances
in neural information processing systems, pages 3104–
3112, 2014.
[28] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed,
D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabi-
novich. Going deeper with convolutions. In Proceedings
of the IEEE Conference on Computer Vision and Pat-
tern Recognition (CVPR), pages 1–9, 2015.
[29] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W.
Smeulders. Selective search for object recognition. In-
ternational Journal of Computer Vision, 104(2):154–
171, 2013.
[30] F. Vitart, J. Anderson, and W. Stern. Simulation of
interannual variability of tropical storm frequency in
an ensemble of gcm integrations. Journal of Climate,
10(4):745–760, 1997.
[31] F. Vitart, J. Anderson, and W. Stern. Impact of large-
scale circulation on tropical storm frequency, intensity,
and location, simulated by an ensemble of gcm integra-
tions. Journal of Climate, 12(11):3237–3254, 1999.
[32] K. Walsh, M. Fiorino, C. Landsea, and K. McInnes.
Objectively determined resolution-dependent thresh-
old criteria for the detection of tropical cyclones in
climate models and reanalyses. Journal of Climate,
20(10):2307–2314, 2007.
[33] K. Walsh and I. G. Watterson. Tropical cyclone-like
vortices in a limited area model: comparison with ob-
served climatology. Journal of Climate, 10(9):2240–
2259, 1997.
[34] M. Wehner, Prabhat, K. A. Reed, D. Stone, W. D.
Collins, and J. Bacmeister. Resolution dependence of
future tropical cyclone projections of cam5. 1 in the us
clivar hurricane working group idealized configurations.
Journal of Climate, 28(10):3905–3925, 2015.