Cloud Deep Networks for Hyperspectral Image
Analysis
Juan M. Haut, Student Member, IEEE, Jose A. Gallardo, Mercedes E. Paoletti, Student Member, IEEE,
Gabriele Cavallaro, Member, IEEE, Javier Plaza, Senior Member, IEEE, Antonio Plaza, Fellow, IEEE,
and Morris Riedel, Member, IEEE
Abstract—Advances in remote sensing hardware have led to a
significantly increased capability for high quality data acquisition,
which allows the collection of remotely sensed images with very
high spatial, spectral and radiometric resolution. This trend
calls for the development of new techniques to enhance the way
such unprecedented volumes of data are stored, processed and
analyzed. An important approach for dealing with massive volumes of information is data compression, i.e., how data are compressed before their storage or transmission; hyperspectral images (HSIs), for instance, are characterized by hundreds of spectral bands. In this context, high performance computing (HPC) and high throughput computing (HTC) offer interesting alternatives.
Particularly, distributed solutions based on cloud computing
can manage and store huge amounts of data in fault-tolerant
environments, by interconnecting distributed computing nodes
so that no specialized hardware is needed. This strategy greatly
reduces processing costs, making the analysis of high volumes of remotely sensed data a natural and even inexpensive solution. In this paper, we present a new cloud-based technique
for spectral analysis and compression of HSIs. Specifically, we
develop a cloud implementation of a popular deep neural network
for non-linear data compression, known as auto-encoder (AE).
Apache Spark serves as the backbone of our cloud computing
environment by connecting the available processing nodes using
a master-slave architecture. Our newly developed approach has
been tested using two widely available HSI datasets. Experimental
results indicate that cloud computing architectures offer an
adequate solution for managing big remotely sensed data sets.
Index Terms—High performance computing (HPC), high
throughput computing (HTC), cloud computing, hyperspectral
images (HSIs), Auto-encoder (AE), dimensionality reduction
(DR), speed-up.
This paper was supported by Ministerio de Educación (Resolución de 26 de diciembre de 2014 y de 19 de noviembre de 2015, de la Secretaría de Estado de Educación, Formación Profesional y Universidades, por la que se convocan ayudas para la formación de profesorado universitario, de los subprogramas de Formación y de Movilidad incluidos en el Programa Estatal de Promoción del Talento y su Empleabilidad, en el marco del Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016). This work has also been supported by Junta de Extremadura (decreto 14/2018, ayudas para la realización de actividades de investigación y desarrollo tecnológico, de divulgación y de transferencia de conocimiento por los Grupos de Investigación de Extremadura, Ref. GR18060) and by MINECO project TIN2015-63646-C5-5-R. (Corresponding author: Juan M. Haut)
J. M. Haut, J. A. Gallardo, M. E. Paoletti, J. Plaza and A. Plaza are with the Hyperspectral Computing Laboratory, Department of Technology of Computers and Communications, Escuela Politécnica, University of Extremadura, 10003 Cáceres, Spain (e-mail: juanmariohaut@unex.es; mpaoletti@unex.es; jplaza@unex.es; aplaza@unex.es).
G. Cavallaro is with the Jülich Supercomputing Center, Wilhelm-Johnen-Straße, 52428 Jülich, Germany (e-mail: g.cavallaro@fz-juelich.de).
M. Riedel is with the Jülich Supercomputing Center, Wilhelm-Johnen-Straße, 52428 Jülich, Germany, and with the University of Iceland, 107 Reykjavik, Iceland (e-mail: m.riedel@fz-juelich.de).
I. INTRODUCTION
EARTH Observation (EO) has evolved dramatically in the
last decades due to the technological advances incor-
porated into remote sensing instruments in the optical and
microwave domains [1]. With their hundreds of contiguous and
narrow channels within the visible, near-infrared and short-
wave infrared spectral ranges, hyperspectral images (HSIs)
have been used for the retrieval of bio-, geo-chemical and
physical parameters that characterize the surface of the earth.
These data are now used in a wide range of applications, aimed
at monitoring and implementing new policies in the domain of
agriculture, geology, assessment of environmental resources,
urban planning, military/defense, disaster management, etc.
[2], [3], [4].
Most of the developments carried out over the last decades
in the field of imaging spectroscopy have been achieved
via spectrometers on board airborne platforms. For instance,
the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)
[5], designed and operated by NASA's Jet Propulsion
Laboratory (JPL), was the first full spectral range imaging
spectrometer. It has been dedicated to remote sensing of the
earth in a large number of experiments and field campaigns
since the late 1980s. Other examples of airborne missions
include the European Space Agency (ESA)’s Airborne Prism
Experiment (APEX) (2011-2016) [6], or the Compact Air-
borne Spectrographic Imager (CASI) [7] (1989-today), among
many others.
The vast amount of data collected by airborne platforms
has paved the way for EO satellite hyperspectral missions.
The Hyperion instrument on-board NASA’s Earth Observing
One (EO-1) spacecraft (2000-2017) [8] and the Compact
High Resolution Imaging Spectrometer (CHRIS) on ESA’s
Proba-1 microsatellite [9] (2001-today) have been two of the
main sources of space-based HSI data in the last decades.
Currently, there are several HSI missions under development,
including the Environmental Mapping and Analysis Program
(EnMAP) [10], the Prototype Research Instruments and Space
Mission technology Advancement (PRISMA) [11], among
others. Their main objective is to fill the current gap in
space-based imaging spectroscopy data and achieve better
radiometric performance than the precursor missions.
The adoption of an open and free data policy by the National
Aeronautics and Space Administration (NASA) [12] and, more
recently, by ESA’s Copernicus initiative (the largest single
EO programme) [13] is now producing an unprecedented
amount of data to the research community. For instance, in
2017 the Sentinel Data Access System provided an estimated
10.04 TB/day with an average download volume of 93.5
TB/day1. Even though the Copernicus space component (i.e., the Sentinels) does not include a hyperspectral instrument yet (Sentinel-10 is an HSI mission expected to become operational around 2025-2030), it has been shown that the vast amount
of open data currently available calls for re-definition of
the challenges within the entire HSI life cycle (i.e., data
acquisition, processing and application phases). It is not by
coincidence that remote sensing data are now described under
the big data terminology, with characteristics such as volume
(increasing scale of acquired/archived data), velocity (rapidly
growing data generation rate, real-time processing needs),
variety (data acquired from multiple sources), veracity (data
uncertainty/accuracy), and value (extracted information) [14],
[15].
In this context, traditional processing methods such as
desktop approaches (i.e., MATLAB, R, SAS, ENVI, etc.) offer
limited capabilities when dealing with such large amounts of
data, especially regarding the velocity component (i.e., the
demand for real-time applications). Although modern desktop computers and laptops are becoming increasingly powerful, with multi-core and many-core capabilities including
graphics processing units (GPUs), the limitations in terms of
memory and core availability currently limit the processing of
large HSI data archives. Therefore, the use of highly scalable
parallel processing approaches is a mandatory solution to
improve the access to and the analysis of such large amounts of
complex data, in order to provide decision-makers with clear,
timely, and useful information [16], [17].
Many changes have been introduced to parallel and dis-
tributed architectures over the past 30 years. In particular,
research has been focused on how to leverage many-core
architectures (e.g., GPUs) to deal with the growing demand
of domain-specific applications for handling computationally
intense problems. Other parallel architectures such as clusters
[18], grids [19], or clouds [20], [21] have also been widely
exploited for remotely sensed data processing, since they pro-
vide tremendous storage/computation capacity and outstanding
scalability. Parallel and distributed computing approaches can
be categorized into high performance computing (HPC) or
high throughput computing (HTC) solutions. Contrary to an
HPC system [22] (generally, a supercomputer that includes
a massive number of processors connected through a fast
dedicated network), an HTC system is more focused on the
execution of independent and sequential jobs that can be
individually scheduled on many different computing resources,
regardless of how fast an individual job can be completed.
A classic example of an HPC system is a cluster, while a
typical example of an HTC system is a grid. Cloud com-
puting is the natural evolution of grid computing, adopting
its backbone and infrastructure [21] but delivering computing
resources as a service over the network connection [23]. In
other words, the cloud moves desktop and laptop computing
1https://sentinel.esa.int/web/sentinel/news/-/article/sentinel-data-access-
annual-report-2017
(via the Internet) to a service-oriented platform using large
remote server clusters, and massive storage to data centres. In
this scenario, computing relies on sharing a pool of physical
and/or virtual resources, rather than on deploying local or
personal hardware and software. The process of virtualization
has enabled the cost-effectiveness and simplicity of cloud
computing solutions [24] (i.e., it exempts users from the
need to purchase and maintain complex computing hardware)
such as IaaS (infrastructure as a service), PaaS (platform as
a service), or SaaS (software as a service). Several cloud
computing resources are currently available commercially, on a pay-as-you-go model from providers such as Amazon Web
Services (AWS) [25], Microsoft Azure [26], and Google’s
Compute Engine [27].
Cloud computing infrastructures can rely on several comput-
ing frameworks that support the processing of large data sets
in a distributed environment. For example, the MapReduce
model [28] is the basis of a large number of open-source im-
plementations. The most popular ones are Apache Hadoop [29]
and its variant, Apache Spark [30] (an in-memory computing
framework). Despite the recent advances in cloud computing
technology, not enough efforts have been devoted to exploiting
cloud computing infrastructures for the processing of HSI data.
However, cloud computing offers a natural solution for the
processing of large HSI databases, as well as an evolution of
previously developed techniques for other kinds of computing
platforms, mainly due to the capacity of cloud computing to
provide internet-scale, service-oriented computing [31], [32],
[33].
In this work, we focus on the problem of how to develop
scalable data analysis and compression techniques [34], [35],
[4], [36] with the goal of facilitating the management of
remotely sensed HSI data. Dimensionality reduction (DR) of
HSIs is a fundamental pre-processing step that is applied
before many data transfer, storage, and processing operations. On the one hand, when HSI data are efficiently compressed, they can be handled more easily on-board satellite platforms with limited storage and downlink bandwidth. On the other hand, since HSI data live primarily in a subspace [37], a
few informative features can be extracted from the hundreds
of highly correlated spectral bands that comprise HSI data
[38] without significantly affecting the data quality (lossy
compression of HSIs can still retain informative data for
subsequent processing steps).
Specifically, this paper develops a new cloud implemen-
tation of HSI data compression. As in [39], we adopt the
Hadoop distributed file system (HDFS) and Apache Spark
as well as a map-reduce methodology [24] to carry out our
implementation. However, we address the DR problem
using a non-linear deep auto-encoder (AE) neural network
instead of the standard linear principal component analysis
(PCA) algorithm. The performance of our newly proposed
cloud-based AE is validated using two widely available and
known HSI data sets. Our experimental results show that
the proposed implementation can effectively exploit cloud
computing technology to efficiently perform non-linear com-
pression of large HSI data sets, while accelerating significantly
the processing time in a distributed environment.
The remainder of the paper is organized as follows. Sec-
tion II provides an overview of the theoretical and opera-
tional details of the considered AE neural network for HSI
data compression, and the considered optimization method.
Section III presents our cloud-distributed AE network for
HSI data compression, describing the details of the network
configuration and the distributed implementation. Section IV
evaluates the performance of the proposed approach using
two widely available HSI data sets, taking into account the
quality of the compression and signal reconstruction, and also
the computational efficiency of the implementation in a real
cloud environment. Finally, section V concludes the work,
summarizing the obtained results and suggesting some future
research directions.
II. BACKGROUND
HSI data are characterized by their intrinsically complex
spectral characteristics, where samples of the same class
exhibit high variability due to data acquisition factors or
atmospheric and lighting interferers. DR and feature extraction
(FE) methods are fundamental tools for the extraction of dis-
criminative features that reduce the intra-class variability and
inter-class similarity [40] present in HSI data sets. Furthermore,
by reducing the high spectral dimensionality of HSIs, these
methods are able to alleviate the curse of dimensionality [41],
which makes HSI data difficult to interpret by supervised
classifiers due to the Hughes phenomenon [42].
Several methods have been developed to perform DR and
FE from HSIs, such as the independent component analysis
(ICA) [43], [44] or the maximum noise fraction (MNF) [45],
[46], with PCA [47], [48], [49] being one of the most widely used
methods for FE purposes. This unsupervised, linear algorithm
reduces the original high-dimensional and correlated feature
space to a lower-dimensional space of uncorrelated factors
(also called principal components or PCs) by applying an
orthogonal transformation through a projection matrix, which
makes it a simple yet efficient algorithm. However, PCA is re-
stricted to a linear map-projection and is not able to learn non-
linear transformations. In this context, auto-associative neural
networks such as AEs [50] offer a more flexible architecture
for FE and DR purposes, managing the non-linearities of the
data through an architecture made up of stacked layers and
non-linear activation functions (called stacked AE or SAE)
that can provide more detailed data representations from the
original input image (one per layer), which can be reused by
other HSI processing methods.
A. Auto-encoder (AE) Neural Network
Let us consider an HSI data cube $\mathbf{X} \in \mathbb{R}^{n_1 \times n_2 \times n_{bands}}$, where $n_1 \times n_2$ are the spatial dimensions and $n_{bands}$ is the number of spectral bands. $\mathbf{X}$ is traditionally observed by pixel-based algorithms as a collection of $n_1 \times n_2$ spectral samples, where each $\mathbf{x}_i \in \mathbb{R}^{n_{bands}} = [x_{i,1}, x_{i,2}, \cdots, x_{i,n_{bands}}]$ contains the spectral signature of the observed surface material. In this sense, the goal of DR methods is to obtain, for each $\mathbf{x}_i$, a vector $\mathbf{c}_i \in \mathbb{R}^{n_{new}}$ that captures the most representative information of $\mathbf{x}_i$ in a lower feature space, with $n_{new} \ll n_{bands}$.

Fig. 1. Graphic representation of a traditional auto-encoder for spectral compression and restoration of hyperspectral images.

To achieve this goal, the SAE applies an unsupervised, symmetrical deep neural network to encode the data in a lower-dimensional latent space, performing a traditional embedding, and then decoding it to the original space through a reconstruction stage. In fact, the SAE can be interpreted as a mirrored net, where three main parts can be identified (as shown in Fig. 1): i) the encoder or mapping layers, ii) the middle or bottleneck layer, and iii) the decoder or demapping layers. Based on the traditional multilayer perceptron (MLP), the $l$-th layer defined in the SAE performs an affine transformation between the input data $\mathbf{x}_i^{(l)}$ and its set of weights $\mathbf{W}^{(l)}$ and biases $\mathbf{b}^{(l)}$, as Eq. (1) indicates:

$$\mathbf{x}_i^{(l+1)} = \mathcal{H}\left(\mathbf{x}_i^{(l)} \cdot \mathbf{W}^{(l)} + \mathbf{b}^{(l)}\right), \quad (1)$$

where $\mathbf{x}_i^{(l+1)} \in \mathbb{R}^{n^{(l)}}$ is an abstract representation (or feature representation) of the original input data $\mathbf{x}_i$ in the feature space obtained by the $n^{(l)}$ neurons that compose the $l$-th layer, where the output of the $k$-th neuron is obtained as the dot product between the $n^{(l-1)}$ outputs of the previous layer and its weights, passed through an activation function that is usually implemented by the Rectified Linear Unit (ReLU) [51], i.e. $\mathcal{H}(x) = \max(0, x)$. Finally, the $k$-th feature in $\mathbf{x}_i^{(l+1)}$ can be obtained as:

$$x_{i,k}^{(l+1)} = \mathcal{H}\left(\sum_{j=1}^{n^{(l-1)}} x_{i,j}^{(l)} \cdot w_{k,j}^{(l)} + b^{(l)}\right). \quad (2)$$

With this in mind, the SAE applies two main processing steps to each input sample $\mathbf{x}_i$. The first one, known as the coding stage, performs the embedding of the data, mapping it from the $\mathbb{R}^{n_{bands}}$ space to the $\mathbb{R}^{n_{new}}$ latent space. That is, the $n_{encoder}$ layers of the encoder map their input data to a projected representation following Eqs. (1) and (2), until reaching the bottleneck layer. As a result, the bottleneck layer contains the projection of each $\mathbf{x}_i \in \mathbb{R}^{n_{bands}}$ in its latent space, defined by its $n_{new}$ neurons, $\mathbf{c}_i \in \mathbb{R}^{n_{new}}$. In this way,
the SAE can generate compressed ($n_{new} < n_{bands}$), extended ($n_{new} > n_{bands}$) or even equally-sized ($n_{new} = n_{bands}$) representations, depending on the final dimension of the code vector $\mathbf{c}_i$.

The second stage performs the opposite operation, i.e. the decoding, where the network tries to recover the original information, obtaining an approximate reconstruction of the original input vector [52]. In this case, the $n_{decoder}$ layers of the decoder demap the code vector $\mathbf{c}_i$ until reaching the output layer, where a reconstructed sample $\mathbf{x}'_i$ is obtained. Eq. (3) gives an overview of the encoding-decoding process followed by the SAE:

$$\begin{aligned} \mathbf{c}_i &\leftarrow \text{For } l \text{ in } n_{encoder}: \ \mathbf{x}_i^{(l+1)} = \mathcal{H}\left(\mathbf{x}_i^{(l)} \cdot \mathbf{W}^{(l)} + \mathbf{b}^{(l)}\right) \\ \mathbf{x}'_i &\leftarrow \text{For } ll \text{ in } n_{decoder}: \ \mathbf{c}_i^{(ll+1)} = \mathcal{H}\left(\mathbf{c}_i^{(ll)} \cdot \mathbf{W}^{(ll)} + \mathbf{b}^{(ll)}\right) \end{aligned} \quad (3)$$

In order to obtain a lower-dimensional (but more discriminative) representation of the input data, the network parameters are iteratively adjusted in an unsupervised fashion, where the optimizer minimizes the reconstruction error between the input data at the encoding stage, $\mathbf{x}_i$, and its reconstruction at the end of the decoding stage, $\mathbf{x}'_i$. This error function, given by Eq. (4), is usually implemented in the form of a mean squared error (MSE):

$$E(\mathbf{X}) = \min \|\mathbf{X} - \mathbf{X}'\|^2 = \min \sum_{i=1}^{n_1 \cdot n_2} \|\mathbf{x}_i - \mathbf{x}'_i\|^2. \quad (4)$$
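To make the encoding-decoding flow of Eqs. (1)-(4) concrete, the following minimal Scala sketch (written with the Breeze library and using randomly initialized placeholder weights rather than trained ones) pushes a single spectral pixel through a 220-140-60-140-220 stack and evaluates its squared reconstruction error. It is an illustration of the equations above, not the paper's implementation.

import breeze.linalg.{DenseMatrix, DenseVector}

object AeForwardSketch {
  // ReLU activation, H(x) = max(0, x), applied element-wise
  def relu(v: DenseVector[Double]): DenseVector[Double] = v.map(x => math.max(0.0, x))

  // Eq. (1): one layer, x^(l+1) = H(x^(l) W^(l) + b^(l)), written here as W^T x + b
  def layer(x: DenseVector[Double], w: DenseMatrix[Double], b: DenseVector[Double]): DenseVector[Double] =
    relu(w.t * x + b)

  def main(args: Array[String]): Unit = {
    val (nBands, nHidden, nNew) = (220, 140, 60)
    val rnd = new scala.util.Random(0)
    def weights(rows: Int, cols: Int) = DenseMatrix.fill(rows, cols)(rnd.nextGaussian() * 0.01)
    def bias(n: Int) = DenseVector.zeros[Double](n)

    val xi = DenseVector.fill(nBands)(rnd.nextDouble())          // one spectral pixel
    // encoder (coding stage): n_bands -> 140 -> 60, yielding the code vector c_i
    val h1 = layer(xi, weights(nBands, nHidden), bias(nHidden))
    val ci = layer(h1, weights(nHidden, nNew), bias(nNew))
    // decoder (decoding stage): 60 -> 140 -> n_bands, yielding the reconstruction x'_i
    val h2 = layer(ci, weights(nNew, nHidden), bias(nHidden))
    val xr = layer(h2, weights(nHidden, nBands), bias(nBands))
    // Eq. (4): squared reconstruction error for this single sample
    val diff = xi - xr
    println(s"code length = ${ci.length}, reconstruction error = ${diff dot diff}")
  }
}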
B. Broyden-Fletcher-Goldfarb-Shanno (BFGS) Algorithm
After describing the operational procedure of SAEs, it is
now important to observe the network optimization process.
As in any artificial neural network trained with back-propagation, the optimizer tries to find the set of parameters (synaptic weights and biases) that, for a given network architecture, minimize the error function $E(\mathbf{X})$ defined by Eq. (4). This function evaluates how well the neural network fits the dataset $\mathbf{X}$, and depends on the adaptive, learnable parameters of the network, which can be denoted as $\mathcal{W}$, so the error becomes $E(\mathbf{X}, \mathcal{W})$. As $E(\mathbf{X}, \mathcal{W})$ is non-linear, its optimization must be carried out iteratively,
reducing its value until an adequate stopping criterion is
reached. In this sense, standard optimizers back-propagate the
error signal through the network architecture calculating, for
each learnable parameter, the gradient of the error, i.e. the
direction and displacement that the parameter must undergo
in order to minimize the final error (also interpreted as the
importance of that parameter when obtaining the final error).
Mathematically, the update of $\mathcal{W}$ in the $t$-th epoch can be calculated by Eq. (5):

$$\mathcal{W}_{t+1} = \mathcal{W}_t + \Delta\mathcal{W}, \quad \text{with } \Delta\mathcal{W} = \mu_t \cdot \mathbf{p}_t, \quad (5)$$

where $\mu_t$ and $\mathbf{p}_t$ are the learning rate (a positive scalar) and the descent search direction, respectively [53]. The main goal of any optimizer is to obtain the correct $\mathbf{p}_t$ in order to descend properly in the error function until the minimum is reached.
As opposed to standard optimizers, traditional Newton-based methods determine the descent direction $\mathbf{p}_t$ using the second-derivative information contained in the Hessian matrix, rather than just the gradient information, thus stabilizing the process:

$$\begin{aligned} \mathbf{H}_t \cdot \mathbf{p}_t &= -\nabla E(\mathbf{X}, \mathcal{W}_t) \\ \mathbf{p}_t &= -\mathbf{H}_t^{-1} \cdot \nabla E(\mathbf{X}, \mathcal{W}_t) \\ \mathcal{W}_{t+1} &= \mathcal{W}_t - \mu_t \cdot \mathbf{H}_t^{-1} \cdot \nabla E(\mathbf{X}, \mathcal{W}_t), \end{aligned} \quad (6)$$

where $\nabla E(\mathbf{X}, \mathcal{W}_t)$ is the gradient of the error function evaluated with the network's parameters at the $t$-th epoch, $\mathcal{W}_t$, and $\mathbf{H}_t$ and $\mathbf{H}_t^{-1}$ are, respectively, the Hessian matrix and its inverse at the $t$-th epoch. However, these methods compute the Hessian matrix and its inverse at each epoch, which is quite expensive and requires a large amount of memory. Instead, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [54] estimates how the Hessian matrix changes in each epoch, obtaining an approximation (instead of the full matrix) that is improved every epoch. In fact, as any algorithm of the family of multivariate quasi-Newton minimization methods, the BFGS algorithm modifies the last expression of Eq. (6) as follows:

$$\mathcal{W}_{t+1} = \mathcal{W}_t - \mu_t \cdot \mathbf{G}_t \cdot \nabla E(\mathbf{X}, \mathcal{W}_t), \quad (7)$$

where $\mathbf{G}_t$ is the inverse Hessian approximation matrix (usually, when $t = 0$, the initial approximation matrix is the identity matrix, $\mathbf{G}_0 = \mathbf{I}$). This $\mathbf{G}_t$ is updated at each epoch by means of an update matrix:

$$\mathbf{G}_{t+1} = \mathbf{G}_t + \mathbf{U}_t. \quad (8)$$
However, such an update needs to comply with the quasi-Newton condition, which is described below. Assuming that $E(\mathbf{X}, \mathcal{W})$ is continuous for $\mathcal{W}_t$ and $\mathcal{W}_{t+1}$ (with gradients $\mathbf{g}_t = \nabla E(\mathbf{X}, \mathcal{W}_t)$ and $\mathbf{g}_{t+1} = \nabla E(\mathbf{X}, \mathcal{W}_{t+1})$, respectively) and that the Hessian $\mathbf{H}$ is constant, then Eq. (9) is satisfied:

$$\begin{aligned} &\mathbf{q}_t \equiv \mathbf{g}_{t+1} - \mathbf{g}_t \quad \text{and} \quad \mathbf{p}_t \equiv \mathcal{W}_{t+1} - \mathcal{W}_t \\ &\text{Secant condition on the Hessian: } \mathbf{q}_t = \mathbf{H} \cdot \mathbf{p}_t \\ &\text{Secant condition on the inverse: } \mathbf{H}^{-1} \cdot \mathbf{q}_t = \mathbf{p}_t \end{aligned} \quad (9)$$

Since $\mathbf{G} = \mathbf{H}^{-1}$, the last expression in Eq. (9) can be rewritten as $\mathbf{G} \cdot \mathbf{q}_t = \mathbf{p}_t$, so the approximation matrix $\mathbf{G}$ can be obtained (at each epoch $t$) as a combination of the linearly independent directions and their respective gradients. Following the Davidon, Fletcher and Powell (DFP) rank-2 formula [55], $\mathbf{G}$ can be updated using Eq. (10):

$$\mathbf{G}_{t+1} = \mathbf{G}_t + \frac{\mathbf{p}_t \cdot \mathbf{p}_t^\top}{\mathbf{p}_t^\top \cdot \mathbf{q}_t} - \frac{\mathbf{G}_t \cdot \mathbf{q}_t \cdot \mathbf{q}_t^\top \cdot \mathbf{G}_t}{\mathbf{q}_t^\top \cdot \mathbf{G}_t \cdot \mathbf{q}_t}. \quad (10)$$

Finally, the BFGS method updates its approximation matrix by computing the complementary formula of the DFP method, interchanging $\mathbf{G}$ with $\mathbf{H}$ and $\mathbf{p}_t$ with $\mathbf{q}_t$, so Eq. (10) is finally modified as follows:

$$\mathbf{H}_{t+1} = \mathbf{H}_t + \frac{\mathbf{q}_t \cdot \mathbf{q}_t^\top}{\mathbf{q}_t^\top \cdot \mathbf{p}_t} - \frac{\mathbf{H}_t \cdot \mathbf{p}_t \cdot \mathbf{p}_t^\top \cdot \mathbf{H}_t}{\mathbf{p}_t^\top \cdot \mathbf{H}_t \cdot \mathbf{p}_t}. \quad (11)$$
As the BFGS method intends to compute the inverse of $\mathbf{H}$, and $\mathbf{G} = \mathbf{H}^{-1}$, it inverts Eq. (11) to analytically obtain the final update of the approximation matrix:

$$\mathbf{G}_{t+1} = \mathbf{G}_t + \left(1 + \frac{\mathbf{q}_t^\top \cdot \mathbf{G}_t \cdot \mathbf{q}_t}{\mathbf{q}_t^\top \cdot \mathbf{p}_t}\right) \frac{\mathbf{p}_t \cdot \mathbf{p}_t^\top}{\mathbf{p}_t^\top \cdot \mathbf{q}_t} - \frac{\mathbf{p}_t \cdot \mathbf{q}_t^\top \cdot \mathbf{G}_t + \mathbf{G}_t \cdot \mathbf{q}_t \cdot \mathbf{p}_t^\top}{\mathbf{q}_t^\top \cdot \mathbf{p}_t} \quad (12)$$
Algorithm 1 Broyden-Fletcher-Goldfarb-Shanno Algorithm
1: procedure BFGS($\mathcal{W}_t$: current parameters of the neural network, $E(\mathbf{X}, \mathcal{W})$: error function, $\mathbf{G}_t$: current approximation to the inverse Hessian)
2:   $\mathbf{g}_t = \nabla E(\mathbf{X}, \mathcal{W}_t)$
3:   $\mathbf{p}_t = -\mathbf{G}_t \cdot \mathbf{g}_t$
4:   $\mathcal{W}_{t+1} = \mathcal{W}_t + \mu_t \cdot \mathbf{p}_t$   ($\mu_t$ obtained by line search)
5:   $\mathbf{g}_{t+1} = \nabla E(\mathbf{X}, \mathcal{W}_{t+1})$
6:   $\mathbf{q}_t = \mathbf{g}_{t+1} - \mathbf{g}_t$
7:   $\mathbf{p}_t = \mathcal{W}_{t+1} - \mathcal{W}_t$
8:   $\mathbf{A} = \left(1 + \frac{\mathbf{q}_t^\top \cdot \mathbf{G}_t \cdot \mathbf{q}_t}{\mathbf{q}_t^\top \cdot \mathbf{p}_t}\right) \frac{\mathbf{p}_t \cdot \mathbf{p}_t^\top}{\mathbf{p}_t^\top \cdot \mathbf{q}_t}$
9:   $\mathbf{B} = \frac{\mathbf{p}_t \cdot \mathbf{q}_t^\top \cdot \mathbf{G}_t + \mathbf{G}_t \cdot \mathbf{q}_t \cdot \mathbf{p}_t^\top}{\mathbf{q}_t^\top \cdot \mathbf{p}_t}$
10:  $\mathbf{G}_{t+1} = \mathbf{G}_t + \mathbf{A} - \mathbf{B}$
11:  return $\mathcal{W}_{t+1}$, $\mathbf{G}_{t+1}$
12: end procedure
Algorithm 1 provides a general overview of how the BFGS
method works in one epoch. A weakness of BFGS is that it
requires the computation of the gradient on the full dataset,
consuming a large amount of memory to properly run the
optimization. Taking into account the dimensionality of HSIs,
we can conclude that this method is not able to scale with the
number of samples [56]. In order to overcome this limitation,
and with the aim of speeding up the computation of both
the forward (affine transformations) and backward (optimizer)
steps of the AE for DR of HSIs, in the following section we
develop a distributed solution for cloud computing environ-
ments.
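Before moving to the distributed setting, the following Scala sketch illustrates one BFGS update of the inverse Hessian approximation (Eq. (12) / Algorithm 1) with Breeze. The gradient function and the fixed step size mu are illustrative placeholders: the algorithm above obtains the step by line search, and the implementation described in Section III delegates the optimization to the memory-limited L-BFGS variant.

import breeze.linalg.{DenseMatrix, DenseVector}

object BfgsStepSketch {
  // One BFGS epoch: returns the updated parameters W_{t+1} and inverse-Hessian approximation G_{t+1}.
  def step(w: DenseVector[Double],
           grad: DenseVector[Double] => DenseVector[Double],   // gradient of E(X, W), placeholder
           gInv: DenseMatrix[Double],                           // G_t
           mu: Double): (DenseVector[Double], DenseMatrix[Double]) = {
    val gt   = grad(w)
    val dir  = -(gInv * gt)                       // p_t = -G_t g_t (descent direction)
    val wNew = w + dir * mu                       // W_{t+1} = W_t + mu_t p_t
    val q    = grad(wNew) - gt                    // q_t = g_{t+1} - g_t
    val p    = wNew - w                           // p_t = W_{t+1} - W_t
    val qp   = q dot p
    val gq   = gInv * q                           // G_t q_t (G_t is symmetric)
    val a    = (p * p.t) * ((1.0 + (q dot gq) / qp) / qp)   // first correction term of Eq. (12)
    val b    = ((p * gq.t) + (gq * p.t)) * (1.0 / qp)       // second correction term of Eq. (12)
    (wNew, gInv + a - b)
  }

  def main(args: Array[String]): Unit = {
    // toy quadratic E(W) = ||W - target||^2, whose gradient is 2 (W - target)
    val target = DenseVector(1.0, -2.0, 0.5)
    val grad = (w: DenseVector[Double]) => (w - target) * 2.0
    var w = DenseVector.zeros[Double](3)
    var g = DenseMatrix.eye[Double](3)            // G_0 = I
    for (_ <- 0 until 10) { val r = step(w, grad, g, 0.1); w = r._1; g = r._2 }
    println(s"estimated minimum: $w")             // approaches the target vector
  }
}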
III. PROPOSED IMPLEMENTATION
A. Distributed Framework
We have developed a completely new distributed AE for HSI
data analysis2. In this context, two problems have been specif-
ically addressed in this work: i) the computing engine, and ii)
the distributed programming model over the cloud architecture.
Regarding the first problem, our distributed implementation
of the network model is run on top of a standalone Spark
cluster, due to its capacity to provide fast processing of
large data volumes on distributed platforms, in addition to
fault-tolerance. Furthermore, the Spark cluster is characterized
by a master-slave architecture, which makes it very flexible.
Specifically, a Spark cluster is formed by a master node, which manages how the resources are used and distributed in the cluster by hosting a Java virtual machine (JVM) driver and the scheduler (which distributes the tasks between the execution nodes), and N worker nodes (there can be more than one worker per node) that execute the program tasks by creating a Java distributed agent, called executor (where tasks are computed), and that store the data partitions (see Fig. 2).
2Code available at: https://github.com/jgallardst/cloud-nn-hsi
Fig. 2. Graphic representation of a generic Spark cluster, which is composed of one client node and N worker nodes, where in each node several executor Java virtual machines run in parallel over several data partitions.
In relation to the second point, the adopted programming
model to perform the implementation of the distributed AE
is based on organizing the original HSI data in tuples or
key/value pairs, in order to apply the MapReduce model [39],
which divides the data processing task into two distributed
operations: i) mapping, which processes a set of data-tuples,
generating intermediate key-value pairs, and ii) reduction,
which gathers all the intermediate pairs obtained by the
mapping to generate the final result. In order to achieve
this behavior, data information in Spark is abstracted and
encapsulated into a fault-tolerant data structure called Resilient
Distributed Dataset (RDD). These RDDs are organized as
distributed collections of data, which are scattered by Spark
across the worker nodes when they are needed on the succes-
sive computations, being persisted on the memory of the nodes
or on the disk. This architecture allows for the parallelization
of the executions, achieved by performing MapReduce tasks
over the RDDs on the nodes. Moreover, two basic operations
can be performed on a RDD: i) the so-called transformations,
which are based on applying an operation to every row on
a partition, resulting in another RDD, and ii) actions, which
retrieve a value or a set of values that can be either RDD data or
the result of an operation where some RDD data are involved.
Operations are queued until an action is called; the needed
transformations are placed into a dependency graph, where
each node is a job stage, following a lazy execution paradigm.
This means that operations are not performed until they are
really needed, which avoids executing the same operation more than once.
In order to enable a simple and easy mechanism for man-
aging large data sets, the Spark environment provides another
level of abstraction that uses the concept of Dataframe. These Dataframes allow data to be organized into named columns, which makes them easier to manipulate (as in relational tables, columns can be accessed by name instead of by index).
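As a small illustration of this abstraction (the column and application names below are hypothetical, not taken from the paper's code), a Dataframe with named columns can be created from an RDD of tuples as follows:

import org.apache.spark.sql.SparkSession

object DataframeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("hsi-dataframe").getOrCreate()
    import spark.implicits._

    // each row: a pixel identifier and its spectral vector (toy values)
    val pixels = spark.sparkContext.parallelize(Seq(
      (0L, Array(0.12, 0.34, 0.56)),
      (1L, Array(0.21, 0.43, 0.65))))

    val df = pixels.toDF("pixelId", "spectrum")   // columns are accessed by name, not by index
    df.select("spectrum").show(false)
    spark.stop()
  }
}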
With this in mind, the Spark standalone cluster functionality
can be summarized as follows:
1) The master node (also called the driver node) creates and
manages the Spark driver (see Fig. 2), a Java process
that contains the SparkContext of the application.
2) The driver context performs the data partitioning and
parallelization between the worker nodes, assigning to
each one a number of partitions, which depends on two
main aspects: the block size and the way the data are
stacked. Also, the driver creates the executors on the
worker nodes, which store data partitions on the worker
node and perform tasks on them.
3) When an action is called, a job is launched and the
master coordinates how the associated tasks are dis-
tributed into the different executors. In order to reduce data exchange times, the Spark driver attempts to perform “smart” task allocations, favouring the assignment of each task to an executor located in the worker that already holds the data partition on which that task operates.
4) When all the tasks on a given stage are finished, the
Scheduler allocates another stage of the job (if it was
a transformation), or retrieves the final output (if it was
an action).
Algorithm 2 shows a general overview of how our algorithm
is pipelined in the considered Spark cluster.
Algorithm 2 Iterative Process
1: procedure SPARKFLOW
2:   PartitionedData ← Spark.parallelizeData()
3:   t ← 0
4:   while t < n_iterations do
5:     broadcastOutputData()
6:     for each partition ∈ PartitionedData do
7:       partition.applyTask()
8:     end for
9:     retrieveOutputData()
10:    t ← t + 1
11:  end while
12: end procedure
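As a concrete, deliberately simplified illustration of Algorithm 2, the following self-contained Scala/Spark sketch broadcasts the current weights, computes a partial gradient and loss on every partition, reduces them on the driver, and applies an update. The toy quadratic objective and the plain gradient step stand in for the AE forward/backward pass and the L-BFGS update described in the next subsection.

import breeze.linalg.DenseVector
import org.apache.spark.sql.SparkSession

object IterativeFlowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ae-iterations").getOrCreate()
    val sc = spark.sparkContext

    val target = DenseVector(1.0, 2.0, 3.0, 4.0)                 // toy optimum standing in for the AE weights
    val data = sc.parallelize(1 to 1000, numSlices = 8).cache()  // stands in for the partitioned pixel rows
    var weights = DenseVector.zeros[Double](4)

    for (t <- 0 until 20) {
      val bc = sc.broadcast(weights)                             // broadcastOutputData()
      val (gradSum, lossSum, count) = data
        .mapPartitions { rows =>                                 // applyTask() on each partition
          val n = rows.size
          val diff = bc.value - target
          Iterator((diff * n.toDouble, (diff dot diff) * n, n.toLong))
        }
        .reduce { case ((g1, l1, c1), (g2, l2, c2)) => (g1 + g2, l1 + l2, c1 + c2) } // retrieveOutputData()
      weights = weights - (gradSum * (0.5 / count))              // driver-side update (L-BFGS in the paper)
      if (t % 5 == 0) println(s"iteration $t, mean loss = ${lossSum / count}")
      bc.destroy()
    }
    println(s"final weights: $weights")
    spark.stop()
  }
}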
B. Cloud Implementation
This section describes in detail the full distributed training
process, from the parallelization of HSI data across nodes to
the intrinsic logic of each training step, explaining the benefits
of our distributed training algorithm. Fig. 3 gives a general
overview of the full data pipeline developed in this work.
In the beginning, the original 3-dimensional HSI data cube $\mathbf{X} \in \mathbb{R}^{n_1 \times n_2 \times n_{bands}}$, where $n_1 \times n_2$ are the spatial dimensions (height and width) and $n_{bands}$ is the spectral dimension given by the number of spectral channels, is reshaped into an HSI matrix $\mathbf{X} \in \mathbb{R}^{n_{pixels} \times n_{bands}}$, where $n_{pixels} = n_1 \times n_2$, i.e. each row collects a full spectral pixel and each column the corresponding value in a spectral band. This matrix $\mathbf{X}$ is read by the Spark driver, which collects the original HSI data and partitions it into $P$ smaller subsets that are delivered to the worker nodes in parallel. These workers store the obtained partitions on their local disks. In this sense, each data partition forms an RDD.

Fig. 3. Data pipeline of our distributed auto-encoder. The input HSI cube is first reshaped into a matrix and then split into several partitions allocated to the Spark worker nodes, each composed of several rows containing BS spectral pixels. These data partitions are then scaled and duplicated in order to obtain the input network data and the corresponding output network data. The AE is then executed and, for each iteration t, the gradients are collected by the Spark driver, which calculates the final gradient and performs the optimization with the L-BFGS algorithm. The updated weights are finally broadcast to each neural model contained in the cluster.
It must be noted that complex neural network topologies lead to greedy RAM usage on the driver node. Since Spark transformations apply an operation to every row of the RDD, the fewer the number of rows, the fewer the number of operations that must be carried out. In order to improve the computation of the distributed model, a block-size (BS) hyperparameter is provided, indicating how many pixels should be stacked into a single row in order to compute them together. With this observation in mind, the $p$-th data partition (with $p = 1, \cdots, P$) can be seen as a 2-dimensional matrix $^{(p)}\mathbf{D} \in \mathbb{R}^{n_{rows} \times (BS \cdot n_{bands})}$ composed of $n_{rows}$ rows, where each one stores $BS$ concatenated spectral
pixels, i.e. $^{(p)}\mathbf{d}_j \in \mathbb{R}^{(BS \cdot n_{bands})} = [\mathbf{x}_i, \mathbf{x}_{i+1}, \cdots, \mathbf{x}_{i+BS}]$. In the end, each data partition $^{(p)}\mathbf{D}$ stores $BS \cdot n_{rows}$ pixels.
The resulting partitions are then distributed across the worker
nodes. Such distribution allows the executors, located in each
worker node, to apply the subsequent tasks to those partitions
that each worker receives.
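A minimal sketch of this stacking step is shown below (toy dimensions, not the paper's code): each output row concatenates BS consecutive spectral pixels, so Spark has far fewer rows to iterate over.

import org.apache.spark.sql.SparkSession

object BlockStackSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("bs-stacking").getOrCreate()
    val sc = spark.sparkContext

    val nBands = 4
    val bs = 3                                                       // block size: pixels stacked per row
    val pixels: Seq[Array[Double]] =                                 // toy stand-in for the reshaped HSI matrix
      (0 until 12).map(i => Array.fill(nBands)(i.toDouble))

    val stacked = sc.parallelize(pixels, numSlices = 2)
      .mapPartitions(_.grouped(bs).map(block => block.flatten.toArray))  // one row = BS * n_bands values

    stacked.collect().foreach(row => println(row.length))            // every row holds 12 = BS * n_bands values
    spark.stop()
  }
}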
After distributing the data into RDDs, a distributed data
analysis process begins prior to the application of neural
network-based processing. In the first step, the data contained
in each partition $^{(p)}\mathbf{D}$ are scaled in a distributed way, taking advantage of the cloud architecture and the available parallelization of resources. In this sense, each partition's row $^{(p)}\mathbf{d}_j$ (and, internally, each pixel contained within) is transformed based on the global maximum and minimum features ($x_{max}$ and $x_{min}$) of the whole image $\mathbf{X}$, and the local column-wise maximum and minimum features ($^{(p)}\mathbf{d}_{max}$ and $^{(p)}\mathbf{d}_{min}$) of the $p$-th partition where the data are allocated:

$$\begin{aligned} ^{(p)}\hat{\mathbf{d}}_j &= \frac{^{(p)}\mathbf{d}_j - {}^{(p)}\mathbf{d}_{min}}{^{(p)}\mathbf{d}_{max} - {}^{(p)}\mathbf{d}_{min}} \\ ^{(p)}\mathbf{d}_j &= {}^{(p)}\hat{\mathbf{d}}_j \cdot (x_{max} - x_{min}) + x_{min} \end{aligned} \quad (13)$$
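The distributed scaling can be sketched as follows (illustrative code, not the paper's implementation): the global extrema are obtained here as scalars over the whole image, the partition-local extrema are computed per column inside mapPartitions, and non-empty partitions are assumed.

import org.apache.spark.sql.SparkSession

object DistributedScalingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("hsi-scaling").getOrCreate()
    val sc = spark.sparkContext

    val rows = sc.parallelize(Seq(
      Array(1.0, 5.0, 3.0), Array(2.0, 8.0, 4.0),
      Array(0.5, 6.0, 9.0), Array(1.5, 7.0, 2.0)), numSlices = 2)

    val xMax = rows.map(_.max).reduce((a, b) => math.max(a, b))      // global maximum feature x_max
    val xMin = rows.map(_.min).reduce((a, b) => math.min(a, b))      // global minimum feature x_min

    val scaled = rows.mapPartitions { it =>
      val part = it.toArray                                          // rows held by this partition
      val nCols = part.head.length
      val dMax = Array.tabulate(nCols)(j => part.map(_(j)).max)      // partition-local column maxima
      val dMin = Array.tabulate(nCols)(j => part.map(_(j)).min)      // partition-local column minima
      part.iterator.map { row =>
        Array.tabulate(nCols) { j =>
          val norm = (row(j) - dMin(j)) / (dMax(j) - dMin(j))        // Eq. (13), first step
          norm * (xMax - xMin) + xMin                                // Eq. (13), second step
        }
      }
    }
    scaled.collect().foreach(r => println(r.mkString(", ")))
    spark.stop()
  }
}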
Once the HSI data has been split into partitions and scaled,
the next step consists of the application of the AE model.
The proposed AE is composed of 5 layers, as summarized in Table I. These layers are: $l^{(1)}$, the input layer, which receives the spectral signature contained in each pixel $\mathbf{x}_i$ of $\mathbf{X}$ (i.e., the rows of the distributed partitions) and is composed of as many neurons as spectral bands; $l^{(2)}$, $l^{(3)}$ and $l^{(4)}$, the hidden AE layers; and $l^{(5)}$, the output layer, which obtains the reconstructed signature $\mathbf{x}'_i$ and is also composed of as many neurons as spectral bands.
TABLE I
TOPOLOGY OF THE PROPOSED AUTO-ENCODER NEURAL NETWORK FOR HYPERSPECTRAL IMAGE ANALYSIS
Layer ID           l(1)      l(2)   l(3)   l(4)   l(5)
Neurons per l(i)   n_bands   140    60     140    n_bands
With the topology described in Table I in mind, the encoder part is composed of $l^{(1)}$, $l^{(2)}$ and $l^{(3)}$, which perform the mapping from the original spectral space to the latent space of the bottleneck layer $l^{(3)}$. In turn, the decoder part is composed of $l^{(3)}$, $l^{(4)}$ and $l^{(5)}$, which perform the de-mapping from the latent space of $l^{(3)}$ back to the original spectral space.
At this point, it is interesting to briefly comment on the operation of the AE network. In order to correctly propagate the data through the network, a matrix of unstacked pixels $^{(p)}\mathbf{X} \in \mathbb{R}^{(BS \cdot n_{rows}) \times n_{bands}}$ is extracted from each partition $^{(p)}\mathbf{D} \in \mathbb{R}^{n_{rows} \times (BS \cdot n_{bands})}$, i.e. the $BS$ spectral pixels contained in each $^{(p)}\mathbf{d}_j = [\mathbf{x}_i, \mathbf{x}_{i+1}, \cdots, \mathbf{x}_{i+BS}]$ (with $j = 1, \cdots, n_{rows}$ and $i = 1, \cdots, n_{pixels}$) are extracted one by one to create the rows of $^{(p)}\mathbf{X}$, denoted as $^{(p)}\mathbf{x}_k$ [with $k = 1, \cdots, (BS \cdot n_{rows})$], which define the level at which the AE operates.
Every training iteration $t$ is performed using the traditional neural network forward-backward procedure, in addition to a tree-aggregate operation that computes and sums the executors' gradients and losses to return a single loss value and a matrix of gradients. Each executor computes its loss by forwarding the input network data $^{(p)}\mathbf{X}$ through the AE layers and comparing the output vector of layer $l^{(5)}$ with the vector of input features, following Eq. (4) and obtaining (at each $t$) the corresponding MSE of the partition: $^{(p)}\text{MSE}_t = E(^{(p)}\mathbf{X}, \mathcal{W}_t)$. Gradients are then computed by back-propagating the error signal through the AE, obtaining for each partition the matrix $^{(p)}\mathbf{G}_t$ at iteration $t$. Each gradient matrix is reduced in the driver, which runs the optimizer in order to obtain the final matrix $\Delta\mathcal{W}_t$. This matrix indicates how much each neuron weight should be modified before finishing the $t$-th training iteration, based on how that neuron impacts the output. Fig. 4 gives a graphical overview of the adopted training procedure.
If we denote by $P$ the total number of partitions and by $^{(p)}\mathbf{X} \in \mathbb{R}^{(BS \cdot n_{rows}) \times n_{bands}}$ the $p$-th unstacked partition data, composed of $(BS \cdot n_{rows})$ normalized rows/feature vectors of $n_{bands}$ spectral features, i.e. $^{(p)}\mathbf{x}_k \in \mathbb{R}^{n_{bands}} = [^{(p)}x_{k,1}, \cdots, {}^{(p)}x_{k,n_{bands}}]$, and considering the $l$-th layer of the AE model, composed of $n^{(l)}_{neurons}$ neurons, its output is denoted by $^{(p)}\mathbf{X}^{(l+1)}$ and is computed by adapting Eq. (1) into Eq. (14) as the matrix multiplication:

$$^{(p)}\mathbf{X}^{(l+1)} = \mathcal{H}\left(^{(p)}\mathbf{X}^{(l)} \cdot \mathbf{W}^{(l)} + \mathbf{b}^{(l)}\right), \quad (14)$$
where the meaning of each term is:
• $^{(p)}\mathbf{X}^{(l+1)} \in \mathbb{R}^{(BS \cdot n_{rows}) \times n^{(l)}_{neurons}}$ is the matrix that represents the output of the neurons in layer $l$, with size $(BS \cdot n_{rows}) \times n^{(l)}_{neurons}$, where $n^{(l)}_{neurons}$ is the number of neurons of the $l$-th layer (in the case that $l = 5$, $n^{(5)}_{neurons} = n_{bands}$).
• $^{(p)}\mathbf{X}^{(l)} \in \mathbb{R}^{(BS \cdot n_{rows}) \times n^{(l-1)}_{neurons}}$ is the matrix that serves as input to the $l$-th layer, which contains the $(BS \cdot n_{rows})$ pixel vectors represented in the feature space of the previous layer, defined by $n^{(l-1)}_{neurons}$ neurons.
• $\mathbf{W}^{(l)} \in \mathbb{R}^{n^{(l-1)}_{neurons} \times n^{(l)}_{neurons}}$ is the matrix of weights, which connects the current $n^{(l)}_{neurons}$ neurons with the $n^{(l-1)}_{neurons}$ neurons of the previous layer, and $\mathbf{b}^{(l)}$ is the bias of the current layer.
• $\mathcal{H}$ is the ReLU activation function, which gives the following non-linear output: $\text{ReLU}(x) = \max(0, x)$.
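A minimal Breeze sketch of this batched layer evaluation, with random placeholder weights and the shapes listed above, is the following:

import breeze.linalg.{DenseMatrix, DenseVector}

object LayerForwardSketch {
  def relu(m: DenseMatrix[Double]): DenseMatrix[Double] = m.map(v => math.max(0.0, v))

  def forward(x: DenseMatrix[Double],              // (BS*n_rows) x n^(l-1) input matrix
              w: DenseMatrix[Double],              // n^(l-1) x n^(l) weight matrix
              b: DenseVector[Double]): DenseMatrix[Double] = {
    val z = x * w                                  // affine part of Eq. (14)
    val zb = DenseMatrix.tabulate(z.rows, z.cols)((i, j) => z(i, j) + b(j))  // add the bias to every row
    relu(zb)                                       // ReLU activation
  }

  def main(args: Array[String]): Unit = {
    val batch = DenseMatrix.rand(6, 220)                           // 6 stacked pixels, 220 bands
    val out = forward(batch, DenseMatrix.rand(220, 140), DenseVector.zeros[Double](140))
    println(s"output shape: ${out.rows} x ${out.cols}")            // 6 x 140
  }
}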
After data forwarding, the reconstructed data $^{(p)}\mathbf{X}'$ in the $p$-th partition at the $t$-th iteration is compared to the original input $^{(p)}\mathbf{X}$ by applying the MSE function defined by Eq. (4) on each executor. Executors then retrieve the error computed over the data they hold, obtaining a value $^{(p)}\text{MSE}_t$ per partition. Then, the final error is obtained as the mean of all executor errors, as shown in Eq. (15):

$$\begin{aligned} ^{(p)}\text{MSE}_t &= \frac{1}{(BS \cdot n_{rows})} \sum_{k=1}^{(BS \cdot n_{rows})} \|^{(p)}\mathbf{x}_k - {}^{(p)}\mathbf{x}'_k\|^2 \\ \text{MSE}_t &= \frac{1}{P} \sum_{p=1}^{P} {}^{(p)}\text{MSE}_t, \end{aligned} \quad (15)$$
where $(BS \cdot n_{rows})$ is the number of pixels that compose the $p$-th data partition, whereas $^{(p)}\mathbf{x}_k \in {}^{(p)}\mathbf{X}$ and $^{(p)}\mathbf{x}'_k \in {}^{(p)}\mathbf{X}'$ are the original input sample and the reconstructed output sample in the $p$-th data partition, respectively. Those partition errors are then back-propagated to compute the gradient matrix $^{(p)}\mathbf{G}_t$ of each partition at iteration $t$. In this sense, for each layer in the neural model (using the resulting outputs), the impact that each neuron has on the final error is obtained through the derivative of the ReLU applied to every output, which is defined as follows:

$$\mathcal{H}'(x) = \begin{cases} 0, & \text{if } x \leq 0 \\ 1, & \text{if } x > 0 \end{cases} \quad (16)$$

Such impact can be denoted as $^{(p)}\mathbf{g}^{L}_t = [^{(p)}\mathbf{g}^{(1)}_t, \cdots, {}^{(p)}\mathbf{g}^{(5)}_t]$, where the $l$-th element $^{(p)}\mathbf{g}^{(l)}_t$ stores the impact of the $n^{(l)}_{neurons}$ neurons allocated to the $l$-th layer of the network.

Fig. 4. Distributed forward and backward pipelines of the training stage (at iteration t) after unstacking the hyperspectral pixels in each distributed data partition (each one allocated to a different worker node).
The gradient of each partition, $^{(p)}\mathbf{G}_t$, is then computed by applying the double-precision general matrix-matrix multiplication (DGEMM) operation where, given three input matrices ($\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$) and two constants ($\alpha$ and $\beta$), the result is calculated by Eq. (17) and stored in $\mathbf{C}$:

$$\mathbf{C} = \alpha \cdot \mathbf{A} \cdot \mathbf{B} + \beta \cdot \mathbf{C}. \quad (17)$$

DGEMM is performed to compute the entire gradient matrix in parallel, instead of computing each layer's gradient vectors separately. This allows us to make neural computations faster and more efficient in terms of power consumption. In this sense, each term of Eq. (17) has been replaced as follows:
• $\alpha = \frac{1}{n_{bands}}$ is a regularization parameter.
• $\mathbf{A} = {}^{(p)}\mathbf{X}$ is the input data partition matrix.
• $\mathbf{B} = {}^{(p)}\mathbf{g}^{L}_t$ is the matrix representing the impact of each neuron on every layer of the neural network.
• $\beta = 1$ is also a regularization parameter. As $\mathbf{C}$ should remain unchanged, it has been set to 1.
• $\mathbf{C} = {}^{(p)}\mathbf{G}_{t-1}$ is initially the previous gradient matrix of the $p$-th partition. After the updates resulting from the DGEMM operation, the current gradient $^{(p)}\mathbf{G}_t$ is stored in $\mathbf{C}$.
Finally, the gradient matrix $\mathbf{G}_t$ of the whole network is computed as the average of the sum of all partition gradients $^{(p)}\mathbf{G}_t$. The entire training process on each data partition is graphically illustrated in Fig. 4.
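The per-iteration aggregation can be sketched as follows (illustrative, self-contained code rather than the paper's implementation): every executor computes a partial loss and gradient for its partition against the broadcast weights, treeAggregate combines the partial results into a single (gradient, loss, count) triple on the driver, and the driver averages them as in Eq. (15); a toy linear reconstruction error replaces the DGEMM-based AE gradient here.

import breeze.linalg.{DenseVector, norm}
import org.apache.spark.sql.SparkSession

object GradientAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("grad-agg").getOrCreate()
    val sc = spark.sparkContext

    val nBands = 5
    val pixels = sc.parallelize((0 until 100).map(_ => Array.fill(nBands)(scala.util.Random.nextDouble())), 4)
    val weights = sc.broadcast(DenseVector.zeros[Double](nBands))   // current parameters W_t

    val zero = (DenseVector.zeros[Double](nBands), 0.0, 0L)
    val (gradSum, lossSum, count) = pixels.treeAggregate(zero)(
      seqOp = { case ((g, l, c), row) =>
        val x = DenseVector(row)
        val residual = (weights.value dot x) - 1.0                  // toy reconstruction error of one pixel
        (g + x * residual, l + residual * residual, c + 1L)         // accumulate gradient and squared error
      },
      combOp = { case ((g1, l1, c1), (g2, l2, c2)) => (g1 + g2, l1 + l2, c1 + c2) }
    )
    println(s"mean loss = ${lossSum / count}, gradient norm = ${norm(gradSum * (1.0 / count))}")
    spark.stop()
  }
}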
The final optimization step is performed locally on the
master node using a variant of the BFGS algorithm, called
limited BFGS (L-BFGS). Since BFGS needs a huge amount
of memory for the computation of the matrices, L-BFGS limits
the memory usage, so it fits better into our implementation.
The optimizer uses the computed gradients and a step-size
procedure to get closer to a minimum of Eq. (4). The procedure
is repeated until a desired number of iterations, $n_{iterations}$, is
reached.
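The driver-side optimization step can be illustrated with Breeze's L-BFGS optimizer (the same implementation wrapped by Spark MLlib). The DiffFunction below uses a toy quadratic loss; in the actual pipeline, the (loss, gradient) pair would come from the distributed aggregation described above.

import breeze.linalg.DenseVector
import breeze.optimize.{DiffFunction, LBFGS}

object LbfgsDriverSketch {
  def main(args: Array[String]): Unit = {
    val target = DenseVector(1.0, -2.0, 3.0)

    val costFun = new DiffFunction[DenseVector[Double]] {
      def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
        val diff = w - target
        (diff dot diff, diff * 2.0)              // (loss, gradient) pair, i.e. E and its gradient
      }
    }

    // m = 10 correction pairs bound the memory used to approximate the inverse Hessian
    val optimizer = new LBFGS[DenseVector[Double]](maxIter = 100, m = 10)
    val wOpt = optimizer.minimize(costFun, DenseVector.zeros[Double](3))
    println(s"optimized weights: $wOpt")         // approximately the target vector
  }
}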
IV. EXPERIMENTAL EVALUATION
A. Configuration of the Environment
In order to test our newly developed implementation, a
dedicated hardware and software environment based on a high-
end cloud computing paradigm has been adopted. The virtual
resources have been provided by the Jetstream Cloud Services3 at the Indiana University Pervasive Technology Institute (PTI). Its user interface is based on the Atmosphere computing platform4 and uses OpenStack5 as the operational software environment.
The hardware environment consists of a collection of cloud
computing nodes. In particular, the cluster consists of one
master node and eight slave nodes, which are hosted in virtual
machines with six virtual cores at 2.5 GHz each. Every node
has 16 GB of RAM and 60 GB of storage via a Block Storage
File System. As mentioned before, Spark serves as the backbone for node interconnection, while data transfers are supported by a local 4x40 Gbps dedicated network.
Each virtual machine runs Ubuntu 16.04 as operating sys-
tem, with Spark 2.1.1 and Java 1.8 serving as running plat-
forms. The Spark framework provides a distributed machine
learning library known as Spark Machine Learning Library
3https://jetstream-cloud.org/
4https://www.atmosphereiot.com/platform.html
5https://www.openstack.org/
(MLLib6), which is used as support for the implementation
of our distributed AE network for remotely sensed HSI data
analysis. Moreover, the proposed implementation has been
coded in Scala 2.11, compiled into Java bytecode and inter-
preted in JVMs. Finally, mathematical operations from MLLib
are handled by Breeze (the numerical processing library for
Scala), in its 0.12 version, and by Netlib 1.1.2. In this sense, Netlib maps JVM calls onto low-level Basic Linear Algebra Subprograms (BLAS) routines, so these operations are executed faster than they would be in a pure JVM implementation.
B. Hyperspectral Datasets
With the aim of testing the performance of our newly
developed cloud-based and distributed AE network model,
two different HSI data sets have been considered in our
experiments. These data sets correspond to the full version of
the AVIRIS Indian Pines scene, hereinafter referred to as the big
Indian Pines scene (BIP), and a set of images corresponding to
six different zones captured by the Hyperion spectrometer [57]
onboard NASA’s EO-1 Satellite, which we have designated as
Hyperion data set (HDS). Both data sets are characterized by
their huge size, which makes them ideal to be processed in a
cloud-distributed environment. In the following, we provide a
description of the aforementioned data sets.
•The big Indian Pines (BIP) scene (see Fig. 5)
was collected by AVIRIS in 1992 [5] over agricultural
fields in northwestern Indiana. The image comprises a
full flightline with a total of 2678 ×614 pixels (with 20
meters per pixel spatial resolution), covering 220 spectral
bands from 400 to 2500 nm.
•The Hyperion data set (HDS) is composed of six full
flightlines (see Fig. 6) collected in 2016 by the Hyperion
spectrometer mounted on NASA’s EO-1 satellite, which
collects spectral signatures using 220 spectral channels
ranging from 357 to 2576 nm with 10-nm bandwidth. The
captured scenes have a spatial resolution of 30 meters per
pixel. The standard scene width and length are 7.7 km
and 42 km, respectively, with an optional increased scene
length of 185 km. In particular, the considered images
have been stacked and treated together as a single image
comprising 20401 ×256 pixels with the spectral range
mentioned above. These images have been provided by
the Earth Resources Observation and Science (EROS)
Center in GEOTIFF format7. Also, each scene is accom-
panied by one identifier in the format YDDDXXXML,
which indicates the day of acquisition (DDD), and the
sensor that recorded the image (XXX, denoting Hyperion,
ALI or AC with 0=off and 1=on), the pointing mode
(M, which can be Nfor Nadir, Pfor Pointed within
path/row or Kfor Pointed outside path/row) and the scene
length (L, which can be Ffor Full scene, Pfor Partial
scene, Qfor Second partial scene and Sfor Swath). Also,
other letters can be used to create distinct entity IDs, for
example to indicate the Ground/Receiving Station (GGG)
6https://spark.apache.org/mllib
7These scenes are available online from the Earth Explorer site,
https://earthexplorer.usgs.gov
or the Version Number (VV). In this case, the identifiers
of the six considered images are: 065110KU, 035110KU,
212110KR, 247110KW, 261110KR and 321110KR.
C. Experiments and Discussion
Three different experiments have been conducted in order
to validate the performance of our cloud-distributed AE for
HSI data compression:
1) The first experiment analyzes the scalability of our
cloud-distributed AE, using a medium-sized data set. For
this purpose, the BIP data set has been processed with a
fixed number of training samples in the cloud environ-
ment described above, using one master and different
numbers of worker nodes. Here, we have reduced the
dimensionality of the BIP data set using PCA, retaining
the first 60 principal components that account for most
of the variance in the original data.
2) The second experiment illustrates the internal paral-
lelization (at the core level) of the worker nodes. For
this purpose, the HDS has been processed using four
different percentages of training data and 8 worker nodes
in the considered cluster, each with 6 virtual cores. As in
the previous experiment, we reduced the dimensionality
of the HDS data set using PCA, retaining the first
60 principal components that account for most of the
variance in the original data.
3) Finally, the third experiment tests the performance of our
cloud-distributed AE using different numbers of training
samples and worker nodes over a large data set. This ex-
periment allows us to understand the internal operation
of data partitions. In this sense, the HDS data set used in
the previous experiment has been considered again using
4 different training percentages and 6 different numbers
of worker nodes.
1) Experiment 1: Our first experiment evaluates the per-
formance of the distributed implementation of the proposed
AE on the BIP scene (reduced to 60 principal components extracted by PCA), using 80% randomly selected samples to
create the training set and the remaining 20% of the samples
to create the test set. In order to demonstrate the scalability
of our cloud-distributed AE, the cloud environment has been
configured with one master node and different numbers of
worker nodes, specifically: 1, 2, 4, 8, 12 and 16 workers. In
order to show the robustness of our model, five Monte Carlo
runs have been executed, obtaining as a result the average and
the standard deviation of those executions.
Fig. 7 shows the obtained speed-up in a graphical way.
Such speed-up has been calculated as $T_1/T_n$, where $T_1$ is the execution time of the slowest execution with one worker node and $T_n$ is the average time of the executions with $n$ worker nodes. Comparing the theoretical and real speed-up
values obtained, it can be observed that the model is able to
scale very well, reaching a speed-up value that is very close
to the theoretical one with 2, 4 and 8 workers. However, for
12 workers and beyond, we can see that the communication
times between the nodes hamper the speed-up due to the
insufficient amount of data, a fairly common behaviour in
cloud environments, in which the main bottleneck occurs in the communication between nodes. As a result, it is important to make sure that there exists an adequate balance between the total amount of data to be processed and the number of processing nodes. Table II tabulates the performance data collected in this experiment, which comprises reconstruction errors, computation times and speed-ups. As we can observe in Table II, the reconstruction errors achieved by the AE network are very similar for different numbers of workers (with slight changes due to the random selection of samples), remaining stable as the speed-up increases and more nodes are introduced into the cluster. Also, it is worth noting that the standard deviations of the error are very low, demonstrating that the proposed implementation remains highly robust in all cases. These very low errors are finally reflected in Fig. 8, which shows three reconstructed signatures of different materials in the BIP scene. As can be seen in Fig. 8, the reconstructed signatures are extremely similar to the original ones, a fact that allows for their exploitation in advanced processing tasks such as classification or spectral unmixing.

Fig. 5. False RGB color map of the big Indian Pines (BIP) scene, using the spectral bands 88, 111 and 150.
Fig. 6. False RGB color map of the Hyperion data set (HDS), using the spectral bands 35, 110 and 150.
2) Experiment 2: Our second experiment explores the in-
ternal parallelization of each worker node (at the core level).
For this purpose, the cloud-distributed AE has been tested on
the HDS dataset, again reducing the spectral dimensionality to
60 principal components and randomly collecting 20%, 40%,
60% and 80% of training samples to create the training set,
and the remaining 80%, 60%, 40% and 20% to create the test
set. Moreover, 1 master node and 8 worker nodes (each one
with 6 virtual cores) have been considered to implement the
cloud environment.
Fig. 9(a) shows the results obtained in this experiment. If
we compare the theoretical speed-up values and the real ones
obtained, it can be seen that our implementation is able to reach a speed-up that is almost identical to the theoretical one. This is quite important, as the obtained results indicate that the scalability achieved in each node is almost linear with regard to the size of the HSI data it holds, thanks to the cores available in the node. In this way, the proposed cloud-distributed AE implementation takes full advantage of all the available resources, both in parallel (multi-core) and distributed fashion.

TABLE II
RECONSTRUCTION ERRORS (OBTAINED AS THE MSE BETWEEN THE ORIGINAL TEST SAMPLES AND THE ONES RECONSTRUCTED BY THE PROPOSED CLOUD-DISTRIBUTED AE), ALONG WITH THE PROCESSING TIMES AND SPEEDUPS OBTAINED FOR DIFFERENT NUMBERS OF WORKERS WHEN PROCESSING THE BIP IMAGE.
Workers      1          2         4         8         12        16
Loss (MSE)   7.93e-5    7.92e-5   8.35e-5   9.51e-5   8.60e-5   8.35e-5
Time (s)     17398.74   8991.12   4518.39   2354.91   1803.27   1288.69
Speedup      1          1.9308    3.8506    7.3882    9.6484    13.5011

Fig. 7. Scalability of the proposed cloud-distributed network when processing the BIP dataset with 1, 2, 4, 8, 12 and 16 worker nodes and 1 master node. The red line indicates the theoretical speed-up value and the blue bars indicate the actual values reached.
3) Experiment 3: Our last experiment evaluates the scal-
ability of the proposed cloud AE for HSI data compression
using a very large-sized data set. The HDS images have been
considered for this purpose. Due to the great amount of data,
this experiment has been split into two parts. The first part performs a comparison over a cloud environment composed of 1, 2, 4, 8, 12 and 16 worker nodes, and 1 master node, employing 20% and 40% of the samples to create the training set, and the remaining 80% and 60% of the data to create the test set. However, due to the memory limitations of the workers, the second part performs a comparison over a cloud environment composed of 2, 4, 8, 12 and 16 worker nodes, and 1 master node, employing 60% and 80% of the samples to create the training set, and the remaining 40% and 20% of the data to create the test set. In this context, it must be noted
that while in the first part the speed-up is obtained based on
the implementation with 1 worker node, in the second one
the speed-up is obtained based on the implementation with 2
worker nodes.
Figs. 9(b) and 9(c) show the results obtained by the two
parts of this experiment in a graphical way. In this case,
it is interesting to observe that the theoretical speed-up and
the linear speed-up values do not coincide. When we talk
about linear speed-up, we normally refer to the expected
speed-up when linear partitioning is performed in the cluster.
However, in a real environment the partitioning is not always
linear. In fact, we can observe a performance gap in the 8-
node configuration. This can be explained by the relationship
between the total number of cores in the cluster, $C$ (obtained as the number of cores per node multiplied by the number of nodes), and the number of existing data partitions, $P$, given by Eq. (18):

$$(\lambda - 1) \cdot C < P < \lambda \cdot C, \quad (18)$$

where $\lambda$ is a scalar. For instance, when using the 8-node configuration, its value is set to $\lambda = 2$. Taking Eq. (18) into
consideration, and assuming that the cluster cores execute
tasks when they are free, the non-compliance of Eq. (18) leads
to the fact that some cores remain idle after finishing their
first allocated tasks, so the fine-grained parallelism is not fully
exploited in this case.
In the considered cluster, since each node has 6 cores, a total of $C = 6 \times N$ working cores can be exploited. These $C$ working cores allow the data partitions to be processed in batches of at most $C$ tasks. For instance, when a configuration of 8 nodes is used, the cluster environment is made up of a total of $C = 6 \times 8 = 48$ working cores. This indicates that Spark will launch at most 48 tasks in one processing batch. Since Spark splits the HDS data into 58 data partitions, 58 tasks must be executed, one per partition. However, only 48 tasks can be performed in each batch. This means that two batches must be run: the first one with 48 tasks and the second one with only 10. As a result, the second batch cannot fully exploit fine-grained parallelism, as only 10 cores are being used while the remaining 38 cores stay idle. This results in an unnecessary waste of computing resources, as sketched below.
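The batch arithmetic described above can be reproduced with the short Python sketch below, assuming 6 cores per node and the 58 data partitions created by Spark for the HDS data; the function is only illustrative.

```python
import math

# Batches of Spark tasks and idle cores in the last batch, assuming
# 6 cores per node and 58 data partitions (one task per partition).
CORES_PER_NODE = 6
PARTITIONS = 58

def batch_usage(nodes, partitions=PARTITIONS, cores_per_node=CORES_PER_NODE):
    """Return (total cores, batches, tasks in last batch, idle cores in last batch)."""
    cores = cores_per_node * nodes              # C = 6 * N
    batches = math.ceil(partitions / cores)     # lambda in Eq. (18)
    tasks_last = partitions - (batches - 1) * cores
    return cores, batches, tasks_last, cores - tasks_last

for n in (1, 2, 4, 8, 12, 16):
    print(n, batch_usage(n))
# 8 nodes  -> 48 cores, 2 batches, 10 tasks in the last batch, 38 idle cores.
# 12 nodes -> 72 cores, a single batch of 58 tasks, 14 cores unused.
```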
However, when the cores that would otherwise remain idle in the second batch are put to use, the performance improves. This is the case of the 12-node configuration ($C = 72$), where all 58 partitions fit in a single batch ($\lambda = 1$) and $P$ lies much closer to $\lambda \cdot C$, so the partitioning becomes more efficient. The worker-level (linear) speed-up combines with this core-level speed-up, leading to an overall speed-up that is calculated as the product of the two, as indicated by Eq. (19):
$$\frac{T^{w}_{1}}{T^{w}_{n}} \cdot \frac{T^{p}_{1}}{T^{p}_{n}}, \tag{19}$$

where $T^{w}_{n}$ denotes the processing time at the worker level and $T^{p}_{n}$ the processing time at the core level.
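For illustration, the combined speed-up of Eq. (19) can be evaluated as in the short sketch below; the timing values used are placeholders and do not correspond to measurements from our experiments.

```python
# Combined speed-up of Eq. (19): the worker-level and core-level
# time ratios are simply multiplied. Timings below are placeholders.

def combined_speedup(t_worker_1, t_worker_n, t_core_1, t_core_n):
    """Eq. (19): (T^w_1 / T^w_n) * (T^p_1 / T^p_n)."""
    return (t_worker_1 / t_worker_n) * (t_core_1 / t_core_n)

# Example: a 4x worker-level gain and a 1.5x core-level gain
# yield an overall 6x speed-up.
print(combined_speedup(t_worker_1=1200.0, t_worker_n=300.0,
                       t_core_1=90.0, t_core_n=60.0))  # -> 6.0
```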
Fig. 8. Comparison between the original (in blue) and reconstructed (in dotted red) spectral signatures extracted from the BIP scene by the proposed cloud
AE implementation using 8 workers.
Fig. 9. Scalability of the proposed cloud-distributed network when processing the HDS data set in experiments 2 and 3: a) with 8 worker nodes and 1 master node, considering 20%, 40%, 60% and 80% of training data (experiment 2), b) with 1, 2, 4, 8, 12 and 16 worker nodes and 1 master node, considering 20% and 40% of training data (experiment 3, first part), and c) with 2, 4, 8, 12 and 16 worker nodes and 1 master node, considering 60% and 80% of training data (experiment 3, second part). The numbers in parentheses indicate the total amount of data used, in MB. The red lines indicate the theoretical speed-up (continuous line) and the linear speed-up (dotted line), while the blue and orange bars indicate the actual values reached.
With the aforementioned observations in mind, and focusing on the results of the first part of the experiment, reported in Fig. 9(b), we can observe that, for each configuration, training with 20% and 40% of the available samples yields quite similar speed-ups, with slight variations due to the distribution of the data and the presence of idle cores. It is interesting to observe that with 1-8 nodes the speed-up closely follows the theoretical one, while with 12-16 nodes the gap between the obtained and theoretical speed-up values widens, indicating that, with only 20% or 40% of the samples used for training, the proposed AE does not take full advantage of the cloud environment's potential.
On the other hand, Fig. 9(c) reports the results obtained in the second part of this experiment. In this case, the baseline AE run is conducted on a cloud environment with 2 worker nodes, employing 60% and 80% of the data for training. With 2 and 4 worker nodes the speed-up values obtained with 60% and 80% of the available samples are very similar, while with 8, 12 and 16 nodes the version with more training data clearly achieves a superior speed-up, reaching a value very close to the theoretical one with 16 nodes. This indicates that the amount of data handled in this case is better suited to exploit the way Spark organizes data partitions and tasks in batches, achieving better parallelization at the core level (fine-grained parallelism) and better distribution at the worker level (coarse-grained parallelism). These conclusions are supported by the data tabulated in Table III, where the speed-up for 20% and 40% of training data is computed with respect to the cloud environment with 1 worker node, while for 60% and 80% of training data it is computed with respect to the environment with 2 worker nodes, due to memory exhaustion in the single-node case. Once again, the MSE remains constant across different numbers of nodes and varies only slightly with the training percentage, which indicates that the network optimizes very well, neither overfitting the parameters when 60% or 80% of the available samples are used for training nor underfitting when few samples are used.
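For clarity, the reconstruction error reported in Tables II and III corresponds to the MSE between the original test spectra and their AE reconstructions, which can be computed as in the minimal sketch below; the arrays used here are random stand-ins rather than actual HSI data, and the 220-band dimensionality is only an example.

```python
import numpy as np

# MSE between original test spectra and the spectra reconstructed by the
# auto-encoder, i.e., the loss values reported in Tables II and III.
def reconstruction_mse(x_test: np.ndarray, x_reconstructed: np.ndarray) -> float:
    """Mean squared error over all test pixels and spectral bands."""
    return float(np.mean((x_test - x_reconstructed) ** 2))

# Usage with random stand-in data (2000 pixels, 220 spectral bands).
x_test = np.random.rand(2000, 220).astype(np.float32)
x_rec = x_test + 0.01 * np.random.randn(2000, 220).astype(np.float32)
print(reconstruction_mse(x_test, x_rec))  # on the order of 1e-4
```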
V. CONCLUSION AND FUTURE LINES
This paper presents a new cloud-based AE neural network for the analysis of remotely sensed HSI data in a distributed fashion. This kind of artificial neural network finds non-linear solutions when compressing the data, as opposed to traditional linear techniques such as PCA, making the proposed approach more suitable for complex data sets such as HSIs. The proposed AE implementation over a Spark cluster exhibits great performance, not only in terms of data compression and reconstruction error, but also in terms of scalability when processing huge data volumes, which cannot be achieved by traditional (sequential) AE implementations. Such sequential algorithms may be a valid option when the data to be managed
and analyzed can be stored in a single machine with limited processing and memory resources.

TABLE III
Reconstruction errors (obtained as the MSE between the original test samples and the ones reconstructed by the proposed cloud-distributed AE), along with the processing times and speedups obtained for different numbers of workers when processing the HDS data set.

                                            Number of workers
Training percentage (size)  |    1     |    2     |    4     |    8     |   12    |   16
Loss (MSE)
  20% (1838 MB)             | 6.09e-5  | 6.12e-5  | 6.01e-5  | 5.92e-5  | 6.47e-5 | 5.60e-5
  40% (3676 MB)             | 6.56e-5  | 5.80e-5  | 6.30e-5  | 6.44e-5  | 6.67e-5 | 6.13e-5
  60% (5515 MB)             |   N/A    | 5.88e-5  | 6.49e-5  | 6.27e-5  | 6.44e-5 | 5.70e-5
  80% (7353 MB)             |   N/A    | 6.59e-5  | 6.29e-5  | 6.25e-5  | 6.30e-5 | 6.20e-5
Time (s)
  20% (1838 MB)             | 14919.73 | 7632.64  | 4433.11  | 2952.27  | 1606.94 | 1171.87
  40% (3676 MB)             | 30526.79 | 15709.24 | 9087.66  | 5721.97  | 3311.36 | 2182.60
  60% (5515 MB)             |   N/A    | 21505.27 | 12458.54 | 8456.84  | 4536.73 | 3122.92
  80% (7353 MB)             |   N/A    | 32645.44 | 18900.49 | 11633.75 | 6084.69 | 4103.02
Speedup
  20% (1838 MB)             |    1     | 1.9547   | 3.3655   | 5.0536   | 9.2845  | 12.7315
  40% (3676 MB)             |    1     | 1.9432   | 3.3591   | 5.2796   | 9.2187  | 13.9864
  60% (5515 MB)             |   N/A    |    1     | 1.7247   | 2.5191   | 4.7402  | 6.8862
  80% (7353 MB)             |   N/A    |    1     | 1.7268   | 2.8304   | 5.3651  | 7.9564
However, for large amounts of HSI data, sequential implementations can easily run out of memory or require a vast amount of computing time, which is not acceptable when reliable processing is needed within a reasonable time. In this regard, both HPC and HTC alternatives have provided new paths to solve these problems, including parallelization on GPUs and distribution/parallelization on clusters with cloud computing-based solutions. The experiments carried out in this work demonstrate that cloud versions of HSI data processing methods provide efficient and effective HPC-HTC alternatives that successfully overcome the inherent limitations of sequential versions by increasing hardware capabilities at a lower cost than other solutions such as grid computing. The obtained results also reveal that the computational performance of cloud-based solutions scales readily with larger data sets, taking advantage of computational load distribution when there is a good balance between the amount of data and the cluster complexity. Encouraged by the good results obtained in this work, in the future we will develop other implementations of HSI processing techniques in cloud computing environments. Further work will also explore the design of more sophisticated scheduling algorithms in order to mitigate the negative impact of idle processing cores in our current implementation.
ACKNOWLEDGMENT
The authors would like to express their gratitude to the
Jetstream initiative, led by the Indiana University Pervasive
Technology Institute (PTI), for providing the cloud computing
environment and hardware resources used in this work.