Cloud Deep Networks for Hyperspectral Image Analysis
Journal: Transactions on Geoscience and Remote Sensing
Manuscript ID: TGRS-2019-00084
Manuscript Type: Regular paper
Date Submitted by the Author: 15-Jan-2019
Complete List of Authors:
Haut, Juan M.; Universidad de Extremadura, Tecnología de los Computadores y las Comunicaciones
Gallardo Jaramago, Jose Antonio; Hypercomp, Department of Technology of Computers and Communications
Paoletti, Mercedes Eugenia; University of Extremadura, Department of Technology of Computers and Communications
Cavallaro, Gabriele; Forschungszentrum Jülich, Jülich Supercomputing Centre
Plaza, Javier; University of Extremadura, Computer Science Department
Plaza, Antonio; University of Extremadura, Technology of Computers and Communications
Riedel, Morris; Jülich Research Center, Federated Systems and Data
Keywords: Hyperspectral Data
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 1
Cloud Deep Networks for Hyperspectral Image
Analysis
Juan M. Haut, Student Member, IEEE, Jose A. Gallardo, Mercedes E. Paoletti, Student Member, IEEE,
Gabriele Cavallaro, Member, IEEE, Javier Plaza, Senior Member, IEEE, Antonio Plaza, Fellow, IEEE,
and Morris Riedel, Member, IEEE
Abstract—Advances in remote sensing hardware have led to a
significantly increased capability for high quality data acquisition,
which allows the collection of remotely sensed images with very
high spatial, spectral and radiometric resolution. This trend
calls for the development of new techniques to enhance the way
such unprecedented volumes of data are stored, processed and
analyzed. An important approach to deal with massive volumes
of information is data compression, related to how data are
compressed before their storage or transmission. For instance,
hyperspectral images (HSIs) are characterized by hundreds of
spectral bands. In this sense, high performance (HPC) and
high throughput (HTC) computing offer interesting alternatives.
Particularly, distributed solutions based on cloud computing
can manage and store huge amounts of data in fault-tolerant
environments, by interconnecting distributed computing nodes
so that no specialized hardware is needed. This strategy greatly
reduces the processing costs, making the processing of high
volumes of remotely sensed data a natural and even cheap
solution. In this paper, we present a new cloud-based technique
for spectral analysis and compression of HSIs. Specifically, we
develop a cloud implementation of a popular deep neural network
for non-linear data compression, known as auto-encoder (AE).
Apache Spark serves as the backbone of our cloud computing
environment by connecting the available processing nodes using
a master-slave architecture. Our newly developed approach has
been tested using two widely available HSI data sets. Experimental
results indicate that cloud computing architectures offer an
adequate solution for managing big remotely sensed data sets.
Index Terms—High performance computing (HPC), high
throughput computing (HTC), cloud computing, hyperspectral
images (HSIs), Auto-encoder (AE), dimensionality reduction
(DR), speed-up.
This paper was supported by Ministerio de Educación (Resolución de 26 de diciembre de 2014 y de 19 de noviembre de 2015, de la Secretaría de Estado de Educación, Formación Profesional y Universidades, por la que se convocan ayudas para la formación de profesorado universitario, de los subprogramas de Formación y de Movilidad incluidos en el Programa Estatal de Promoción del Talento y su Empleabilidad, en el marco del Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016). This work has also been supported by Junta de Extremadura (decreto 14/2018, ayudas para la realización de actividades de investigación y desarrollo tecnológico, de divulgación y de transferencia de conocimiento por los Grupos de Investigación de Extremadura, Ref. GR18060) and by MINECO project TIN2015-63646-C5-5-R. (Corresponding author: Juan M. Haut)
J. M. Haut, J. A. Gallardo, M. E. Paoletti, J. Plaza and A. Plaza are with the Hyperspectral Computing Laboratory, Department of Technology of Computers and Communications, Escuela Politécnica, University of Extremadura, 10003 Cáceres, Spain (e-mail: juanmariohaut@unex.es; mpaoletti@unex.es; jplaza@unex.es; aplaza@unex.es).
G. Cavallaro is with the Jülich Supercomputing Centre, Wilhelm-Johnen-Straße, 52428 Jülich, Germany (e-mail: g.cavallaro@fz-juelich.de).
M. Riedel is with the Jülich Supercomputing Centre, Wilhelm-Johnen-Straße, 52428 Jülich, Germany, and with the University of Iceland, 107 Reykjavik, Iceland (e-mail: m.riedel@fz-juelich.de).
I. INTRODUCTION
EARTH Observation (EO) has evolved dramatically in the last decades due to the technological advances incorporated into remote sensing instruments in the optical and microwave domains [1]. With their hundreds of contiguous and narrow channels within the visible, near-infrared and short-wave infrared spectral ranges, hyperspectral images (HSIs) have been used for the retrieval of bio-, geo-chemical and physical parameters that characterize the surface of the Earth.
These data are now used in a wide-range of applications, aimed
at monitoring and implementing new policies in the domain of
agriculture, geology, assessment of environmental resources,
urban planning, military/defense, disaster management, etc.
[2], [3], [4].
Most of the developments carried out over the last decades
in the field of imaging spectroscopy have been achieved
via spectrometers on board airborne platforms. For instance,
the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)
[5], designed and operated by NASA's Jet Propulsion
Laboratory (JPL), was the first full spectral range imaging
spectrometer. It has been dedicated to remote sensing of the
earth in a large number of experiments and field campaigns
since the late 1980s. Other examples of airborne missions
include the European Space Agency (ESA)’s Airborne Prism
Experiment (APEX) (2011-2016) [6], or the Compact Air-
borne Spectrographic Imager (CASI) [7] (1989-today), among
many others.
The vast amount of data collected by airborne platforms
has paved the way for EO satellite hyperspectral missions.
The Hyperion instrument on-board NASA's Earth Observing
One (EO-1) spacecraft (2000-2017) [8] and the Compact
High Resolution Imaging Spectrometer (CHRIS) on ESA's
Proba-1 microsatellite [9] (2001-today) have been two of the
main sources of space-based HSI data in the last decades.
Currently, there are several HSI missions under development,
including the Environmental Mapping and Analysis Program
(EnMAP) [10], the Prototype Research Instruments and Space
Mission technology Advancement (PRISMA) [11], among
others. Their main objective is to fill the current gap in
space-based imaging spectroscopy data and achieve better
radiometric performance than the precursor missions.
The adoption of an open and free data policy by the National
Aeronautics and Space Administration (NASA) [12] and, more
recently, by ESA's Copernicus initiative (the largest single
EO programme) [13] is now producing an unprecedented
amount of data to the research community. For instance, in
2017 the Sentinel Data Access System provided an estimated
10.04 TB/day with an average download volume of 93.5
TB/day¹. Even though the Copernicus space component (i.e.,
the Sentinels) has not included a hyperspectral instrument
yet (Sentinel-10 is a HSI mission expected to be operational
around 2025-2030), it has been shown that the vast amount
of open data currently available calls for re-definition of
the challenges within the entire HSI life cycle (i.e., data
acquisition, processing and application phases). It is not by
coincidence that remote sensing data are now described under
the big data terminology, with characteristics such as volume
(increasing scale of acquired/archived data), velocity (rapidly
growing data generation rate, real-time processing needs),
variety (data acquired from multiple sources), veracity (data
uncertainty/accuracy), and value (extracted information) [14],
[15].
In this context, traditional processing methods such as
desktop approaches (i.e., MATLAB, R, SAS, ENVI, etc.) offer
limited capabilities when dealing with such large amounts of
data, especially regarding the velocity component (i.e., the
demand for real-time applications). Although modern desktop
computers and laptops are becoming increasingly powerful,
with multi-core and many-core capabilities including
graphics processing units (GPUs), the limitations in terms of
memory and core availability currently limit the processing of
large HSI data archives. Therefore, the use of highly scalable
parallel processing approaches is a mandatory solution to
improve the access to and the analysis of such great amount of
complex data, in order to provide decision-makers with clear,
timely, and useful information [16], [17].
Many changes have been introduced to parallel and dis-
tributed architectures over the past 30 years. In particular,
research has been focused on how to leverage many-core
architectures (e.g., GPUs) to deal with the growing demand
of domain-specific applications for handling computationally
intense problems. Other parallel architectures such as clusters
[18], grids [19], or clouds [20], [21] have also been widely
exploited for remotely sensed data processing, since they pro-
vide tremendous storage/computation capacity and outstanding
scalability. Parallel and distributed computing approaches can
be categorized into high performance computing (HPC) or
high throughput computing (HTC) solutions. Contrary to an
HPC system [22] (generally, a supercomputer that includes
a massive number of processors connected through a fast
dedicated network), an HTC system is more focused on the
execution of independent and sequential jobs that can be
individually scheduled on many different computing resources,
regardless of how fast an individual job can be completed.
A classic example of an HPC system is a cluster, while a
typical example of an HTC system is a grid. Cloud com-
puting is the natural evolution of grid computing, adopting
its backbone and infrastructure [21] but delivering computing
resources as a service over the network connection [23]. In
other words, the cloud moves desktop and laptop computing
¹https://sentinel.esa.int/web/sentinel/news/-/article/sentinel-data-access-annual-report-2017
(via the Internet) to a service-oriented platform using large
remote server clusters, and massive storage to data centres. In
this scenario, computing relies on sharing a pool of physical
and/or virtual resources, rather than on deploying local or
personal hardware and software. The process of virtualization
has enabled the cost-effectiveness and simplicity of cloud
computing solutions [24] (i.e., it exempts users from the
need to purchase and maintain complex computing hardware)
such as IaaS (infrastructure as a service), PaaS (platform as
a service), or SaaS (software as a service). Several cloud
computing resources are currently available commercially, on
a pay-as-you-go model from providers such as Amazon Web
Services (AWS) [25], Microsoft Azure [26], and Google's
Compute Engine [27].
Cloud computing infrastructures can rely on several comput-
ing frameworks that support the processing of large data sets
in a distributed environment. For example, the MapReduce
model [28] is the basis of a large number of open-source im-
plementations. The most popular ones are Apache Hadoop [29]
and its variant, Apache Spark [30] (an in-memory computing
framework). Despite the recent advances in cloud computing
technology, not enough efforts have been devoted to exploiting
cloud computing infrastructures for the processing of HSI data.
However, cloud computing offers a natural solution for the
processing of large HSI databases, as well as an evolution of
previously developed techniques for other kinds of computing
platforms, mainly due to the capacity of cloud computing to
provide internet-scale, service-oriented computing [31], [32],
[33].
In this work, we focus on the problem of how to develop
scalable data analysis and compression techniques [34], [35],
[4], [36] with the goal of facilitating the management of
remotely sensed HSI data. Dimensionality reduction (DR) of
HSIs is a fundamental pre-processing step that is applied
before many data transfer, store and processing operations. On
the one hand, when HSI data are efficiently compressed, they
can be handled more efficiently on-board satellite platforms
with limited storage and downlink bandwidth. On the other
hand, since HSI data lives primarily in a subspace [37], a
few informative features can be extracted from the hundreds
of highly correlated spectral bands that comprise HSI data
[38] without significantly affecting the data quality (lossy
compression of HSIs can still retain informative data for
subsequent processing steps).
Specifically, this paper develops a new cloud implemen-
tation of HSI data compression. As in [39], we adopt the
Hadoop distributed file system (HDFS) and Apache Spark
as well as a map-reduce methodology [24] to carry out our
implementation. However, we address the DR problem
using a non-linear deep auto-encoder (AE) neural network
instead of the standard linear principal component analysis
(PCA) algorithm. The performance of our newly proposed
cloud-based AE is validated using two widely available and
known HSI data sets. Our experimental results show that
the proposed implementation can effectively exploit cloud
computing technology to efficiently perform non-linear com-
pression of large HSI data sets, while accelerating significantly
the processing time in a distributed environment.
The remainder of the paper is organized as follows. Sec-
tion II provides an overview of the theoretical and opera-
tional details of the considered AE neural network for HSI
data compression, and the considered optimization method.
Section III presents our cloud-distributed AE network for
HSI data compression, describing the details of the network
configuration and the distributed implementation. Section IV
evaluates the performance of the proposed approach using
two widely available HSI data sets, taking into account the
quality of the compression and signal reconstruction, and also
the computational efficiency of the implementation in a real
cloud environment. Finally, section V concludes the work,
summarizing the obtained results and suggesting some future
research directions.
II. BACKGROUND
HSI data are characterized by their intrinsically complex spectral characteristics, where samples of the same class exhibit high variability due to data acquisition factors or atmospheric and lighting interference. DR and feature extraction (FE) methods are fundamental tools for the extraction of discriminative features that reduce the intra-class variability and inter-class similarity [40] present in HSI data sets. Furthermore, by reducing the high spectral dimensionality of HSIs, these methods are able to alleviate the curse of dimensionality [41], which makes HSI data difficult to interpret by supervised classifiers due to the Hughes phenomenon [42].
Several methods have been developed to perform DR and FE from HSIs, such as the independent component analysis (ICA) [43], [44] or the maximum noise fraction (MNF) [45], [46], with PCA [47], [48], [49] being one of the most widely used methods for FE purposes. This unsupervised, linear algorithm reduces the original high-dimensional and correlated feature space to a lower-dimensional space of uncorrelated factors (also called principal components or PCs) by applying an orthogonal transformation through a projection matrix, which makes it a simple yet efficient algorithm. However, PCA is restricted to a linear map-projection and is not able to learn non-linear transformations. In this context, auto-associative neural networks such as AEs [50] offer a more flexible architecture for FE and DR purposes. When arranged as stacked layers with non-linear activation functions (the so-called stacked AE, or SAE), they can manage the non-linearities of the data and provide increasingly detailed data representations of the original input image (one per layer), which can be reused by other HSI processing methods.
A. Auto-encoder (AE) Neural Network
Let us consider an HSI data cube $\mathbf{X} \in \mathbb{R}^{n_1 \times n_2 \times n_{bands}}$, where $n_1 \times n_2$ are the spatial dimensions and $n_{bands}$ is the number of spectral bands. $\mathbf{X}$ is traditionally observed by pixel-based algorithms as a collection of $n_1 \times n_2$ spectral samples, where each $\mathbf{x}_i = [x_{i,1}, x_{i,2}, \cdots, x_{i,n_{bands}}] \in \mathbb{R}^{n_{bands}}$ contains the spectral signature of the observed surface material. In this sense, the goal of DR methods is to obtain, for each $\mathbf{x}_i$, a vector $\mathbf{c}_i \in \mathbb{R}^{n_{new}}$ that captures the most representative information of $\mathbf{x}_i$ in a lower-dimensional feature space, with $n_{new} \ll n_{bands}$.

Fig. 1. Graphic representation of a traditional auto-encoder for spectral compression and restoration of hyperspectral images.

To achieve this goal, the SAE applies an unsupervised symmetrical deep neural network to encode the data in a lower-dimensional latent space, performing a traditional embedding, and then decode it to the original space through a reconstruction stage. In fact, the SAE can be interpreted as a mirrored network, where three main parts can be identified (as shown in Fig. 1): i) the encoder or mapping layers, ii) the middle or bottleneck layer, and iii) the decoder or demapping layers. Based on the traditional multilayer perceptron (MLP), the $l$-th layer defined in the SAE performs an affine transformation between the input data $\mathbf{x}_i^{(l)}$ and its set of weights $\mathbf{W}^{(l)}$ and biases $\mathbf{b}^{(l)}$, as Eq. (1) indicates:

$$\mathbf{x}_i^{(l+1)} = \mathcal{H}\left(\mathbf{x}_i^{(l)} \cdot \mathbf{W}^{(l)} + \mathbf{b}^{(l)}\right), \quad (1)$$
where $\mathbf{x}_i^{(l+1)} \in \mathbb{R}^{n^{(l)}}$ is an abstract representation (or feature representation) of the original input data $\mathbf{x}_i$ in the feature space obtained by the $n^{(l)}$ neurons that compose the $l$-th layer, where the output of the $k$-th neuron is obtained as the dot product between the $n^{(l-1)}$ outputs of the previous layer and its weights, passed through an activation function that is usually implemented by the Rectified Linear Unit (ReLU) [51], i.e., $\mathcal{H}(x) = \max(0, x)$. Finally, the $k$-th feature in $\mathbf{x}_i^{(l+1)}$ can be obtained as:

$$x_{i,k}^{(l+1)} = \mathcal{H}\left(\sum_{j=1}^{n^{(l-1)}} x_{i,j}^{(l)} \cdot w_{k,j}^{(l)} + b^{(l)}\right). \quad (2)$$
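The layer operation of Eqs. (1)-(2) can be sketched in a few lines of NumPy. This is a minimal illustration only: the layer sizes and the random weights below are placeholders, not the configuration used in the paper.

```python
import numpy as np

def relu(x):
    # H(x) = max(0, x), the activation used in Eq. (1)
    return np.maximum(0.0, x)

def layer_forward(x, W, b):
    # One SAE layer: x^(l+1) = H(x^(l) . W^(l) + b^(l)), Eq. (1)
    return relu(x @ W + b)

rng = np.random.default_rng(0)
n_bands, n_hidden = 200, 32              # placeholder sizes
x = rng.random(n_bands)                  # one spectral signature x_i
W = rng.standard_normal((n_bands, n_hidden)) * 0.01
b = np.zeros(n_hidden)

h = layer_forward(x, W, b)

# Eq. (2) computes the same k-th feature neuron by neuron:
k = 0
h_k = relu(np.dot(x, W[:, k]) + b[k])
assert np.isclose(h[k], h_k)
```

The vectorized matrix product of Eq. (1) and the per-neuron sum of Eq. (2) are the same computation, which the final assertion checks for one neuron.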
With this in mind, the SAE applies two main processing steps to each input sample $\mathbf{x}_i$. The first one, known as the coding stage, performs the embedding of the data, mapping it from the $\mathbb{R}^{n_{bands}}$ space to the $\mathbb{R}^{n_{new}}$ latent space. That is, the $n_{encoder}$ layers of the encoder map their input data to a projected representation following Eqs. (1) and (2), until reaching the bottleneck layer, which thus contains the projection of each $\mathbf{x}_i \in \mathbb{R}^{n_{bands}}$ in the latent space defined by its $n_{new}$ neurons, $\mathbf{c}_i \in \mathbb{R}^{n_{new}}$. As a result,
the SAE can generate compressed ($n_{new} < n_{bands}$), extended ($n_{new} > n_{bands}$) or even equally sized ($n_{new} = n_{bands}$) representations, depending on the final dimension of the code vector $\mathbf{c}_i$.
The second stage performs the opposite operation, i.e., the decoding, where the network tries to recover the original information, obtaining an approximate reconstruction of the original input vector [52]. In this case, the $n_{decoder}$ layers of the decoder demap the code vector $\mathbf{c}_i$ until reaching the output layer, where a reconstructed sample $\mathbf{x}'_i$ is obtained. Eq. (3) gives an overview of the encoding-decoding process followed by the SAE:

$$\begin{aligned} \mathbf{c}_i &\leftarrow \text{For } l \text{ in } n_{encoder}: \; \mathbf{x}_i^{(l+1)} = \mathcal{H}\left(\mathbf{x}_i^{(l)} \cdot \mathbf{W}^{(l)} + \mathbf{b}^{(l)}\right) \\ \mathbf{x}'_i &\leftarrow \text{For } ll \text{ in } n_{decoder}: \; \mathbf{c}_i^{(ll+1)} = \mathcal{H}\left(\mathbf{c}_i^{(ll)} \cdot \mathbf{W}^{(ll)} + \mathbf{b}^{(ll)}\right) \end{aligned} \quad (3)$$
In order to obtain a lower-dimensional (but more discriminative) representation of the input data, the network parameters are iteratively adjusted in an unsupervised fashion, where the optimizer minimizes the reconstruction error between the input data at the encoding stage, $\mathbf{x}_i$, and its reconstruction at the end of the decoding stage, $\mathbf{x}'_i$. This error function, given by Eq. (4), is usually implemented in the form of a mean squared error (MSE):

$$E(\mathbf{X}) = \min \|\mathbf{X} - \mathbf{X}'\|^2 = \min \sum_{i=1}^{n_1 \cdot n_2} \|\mathbf{x}_i - \mathbf{x}'_i\|^2. \quad (4)$$
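Putting Eqs. (3) and (4) together, a toy encoder-decoder forward pass and its reconstruction error can be sketched as follows. The weights are random and untrained, and the layer widths are illustrative placeholders, not the architecture evaluated in the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sae_forward(X, enc_params, dec_params):
    """Encode X to the bottleneck, then decode back (Eq. 3).

    enc_params / dec_params are lists of (W, b) pairs."""
    h = X
    for W, b in enc_params:              # coding stage -> c_i
        h = relu(h @ W + b)
    code = h
    for W, b in dec_params:              # decoding stage -> x'_i
        h = relu(h @ W + b)
    return code, h

def mse(X, X_rec):
    # Eq. (4): per-sample squared reconstruction error, averaged here
    return float(np.mean(np.sum((X - X_rec) ** 2, axis=1)))

rng = np.random.default_rng(1)
n_samples, n_bands, n_new = 16, 200, 10  # illustrative sizes

def init(n_in, n_out):
    return rng.standard_normal((n_in, n_out)) * 0.05, np.zeros(n_out)

X = rng.random((n_samples, n_bands))
enc = [init(n_bands, 64), init(64, n_new)]   # mirrored ("stacked") layout
dec = [init(n_new, 64), init(64, n_bands)]

code, X_rec = sae_forward(X, enc, dec)
```

Training would repeatedly evaluate `mse(X, X_rec)` and adjust the `(W, b)` pairs to minimize it, which is exactly the role of the optimizer described in the next subsection.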
B. Broyden-Fletcher-Goldfarb-Shanno (BFGS) Algorithm
After describing the operational procedure of SAEs, it is now important to observe the network optimization process. As in any artificial neural network with back-propagation, the optimizer tries to find the set of parameters (synaptic weights and biases) that, for a given network architecture, minimize the error function $E(\mathbf{X})$ defined by Eq. (4). This function evaluates how well the neural network fits the dataset $\mathbf{X}$, and depends on the adaptive and learnable parameters of the network, which can be denoted as $\mathcal{W}$, so we write $E(\mathbf{X}, \mathcal{W})$. As $E(\mathbf{X}, \mathcal{W})$ is non-linear, its optimization must be carried out iteratively, reducing its value until an adequate stopping criterion is reached. In this sense, standard optimizers back-propagate the error signal through the network architecture, calculating for each learnable parameter the gradient of the error, i.e., the direction and displacement that the parameter must undergo in order to minimize the final error (also interpreted as the importance of that parameter in the final error). Mathematically, the update of $\mathcal{W}$ in the $t$-th epoch can be calculated by Eq. (5):

$$\mathcal{W}_{t+1} = \mathcal{W}_t + \Delta\mathcal{W}, \quad \text{with } \Delta\mathcal{W} = \mu_t \cdot \mathbf{p}_t, \quad (5)$$

where $\mu_t$ and $\mathbf{p}_t$ are the learning rate (a positive scalar) and the descent search direction, respectively [53]. The main goal of any optimizer is to obtain the correct $\mathbf{p}_t$ in order to descend properly in the error function until the minimum is reached.
As opposed to standard optimizers, traditional Newton-based methods determine the descent direction $\mathbf{p}_t$ using the second-derivative information contained in the Hessian matrix, rather than just the gradient information, thus stabilizing the process:

$$\begin{aligned} \mathbf{H}_t \cdot \mathbf{p}_t &= -\nabla E(\mathbf{X}, \mathcal{W}_t) \\ \mathbf{p}_t &= -\mathbf{H}_t^{-1} \cdot \nabla E(\mathbf{X}, \mathcal{W}_t) \\ \mathcal{W}_{t+1} &= \mathcal{W}_t - \mu_t \cdot \mathbf{H}_t^{-1} \cdot \nabla E(\mathbf{X}, \mathcal{W}_t), \end{aligned} \quad (6)$$
where $\nabla E(\mathbf{X}, \mathcal{W}_t)$ is the gradient of the error function evaluated with the network's parameters at the $t$-th epoch, $\mathcal{W}_t$, and $\mathbf{H}_t$ and $\mathbf{H}_t^{-1}$ are respectively the Hessian matrix and its inverse at the $t$-th epoch. However, these methods compute the Hessian matrix and its inverse at every epoch, which is quite expensive and requires a large amount of memory. Instead, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [54] estimates how the Hessian matrix changes in each epoch, obtaining an approximation (instead of the full matrix) that is improved every epoch. In fact, as any algorithm of the family of multivariate quasi-Newton minimization methods, the BFGS algorithm modifies the last expression of Eq. (6) as follows:

$$\mathcal{W}_{t+1} = \mathcal{W}_t - \mu_t \cdot \mathbf{G}_t \cdot \nabla E(\mathbf{X}, \mathcal{W}_t), \quad (7)$$

where $\mathbf{G}_t$ is the inverse-Hessian approximation matrix (usually, when $t = 0$, the initial approximation is the identity matrix, $\mathbf{G}_0 = \mathbf{I}$). This $\mathbf{G}_t$ is updated at each epoch by means of an update matrix:

$$\mathbf{G}_{t+1} = \mathbf{G}_t + \mathbf{U}_t. \quad (8)$$
However, such an update needs to comply with the quasi-Newton condition, which is described below. Assuming that $E(\mathbf{X}, \mathcal{W})$ is continuous for $\mathcal{W}_t$ and $\mathcal{W}_{t+1}$ (with gradients $\mathbf{g}_t = \nabla E(\mathbf{X}, \mathcal{W}_t)$ and $\mathbf{g}_{t+1} = \nabla E(\mathbf{X}, \mathcal{W}_{t+1})$, respectively) and the Hessian $\mathbf{H}$ is constant, then Eq. (9) is satisfied:

$$\begin{aligned} &\mathbf{q}_t \equiv \mathbf{g}_{t+1} - \mathbf{g}_t \quad \text{and} \quad \mathbf{p}_t \equiv \mathcal{W}_{t+1} - \mathcal{W}_t \\ &\text{Secant condition on the Hessian: } \mathbf{q}_t = \mathbf{H} \cdot \mathbf{p}_t \\ &\text{Secant condition on the inverse: } \mathbf{H}^{-1} \cdot \mathbf{q}_t = \mathbf{p}_t \end{aligned} \quad (9)$$

Since $\mathbf{G} = \mathbf{H}^{-1}$, the last expression in Eq. (9) can be rewritten as $\mathbf{G} \cdot \mathbf{q}_t = \mathbf{p}_t$, so the approximation matrix $\mathbf{G}$ can be obtained (at each epoch $t$) as a combination of the linearly independent directions and their respective gradients. Following the Davidon, Fletcher and Powell (DFP) rank-2 formula [55], $\mathbf{G}$ can be updated using Eq. (10):

$$\mathbf{G}_{t+1} = \mathbf{G}_t + \frac{\mathbf{p}_t \cdot \mathbf{p}_t^{\top}}{\mathbf{p}_t^{\top} \cdot \mathbf{q}_t} - \frac{\mathbf{G}_t \cdot \mathbf{q}_t \cdot \mathbf{q}_t^{\top} \cdot \mathbf{G}_t}{\mathbf{q}_t^{\top} \cdot \mathbf{G}_t \cdot \mathbf{q}_t}. \quad (10)$$
Finally, the BFGS method updates its approximation matrix by computing the complementary formula of the DFP method, exchanging $\mathbf{G}$ for $\mathbf{H}$ and $\mathbf{p}_t$ for $\mathbf{q}_t$, so Eq. (10) is finally modified as follows:

$$\mathbf{H}_{t+1} = \mathbf{H}_t + \frac{\mathbf{q}_t \cdot \mathbf{q}_t^{\top}}{\mathbf{q}_t^{\top} \cdot \mathbf{p}_t} - \frac{\mathbf{H}_t \cdot \mathbf{p}_t \cdot \mathbf{p}_t^{\top} \cdot \mathbf{H}_t}{\mathbf{p}_t^{\top} \cdot \mathbf{H}_t \cdot \mathbf{p}_t}. \quad (11)$$
As the BFGS method intends to compute the inverse of $\mathbf{H}$, with $\mathbf{G} = \mathbf{H}^{-1}$, it inverts Eq. (11) to analytically obtain the final update of the approximation matrix:

$$\mathbf{G}_{t+1} = \mathbf{G}_t + \left(1 + \frac{\mathbf{q}_t^{\top} \cdot \mathbf{G}_t \cdot \mathbf{q}_t}{\mathbf{q}_t^{\top} \cdot \mathbf{p}_t}\right) \cdot \frac{\mathbf{p}_t \cdot \mathbf{p}_t^{\top}}{\mathbf{p}_t^{\top} \cdot \mathbf{q}_t} - \frac{\mathbf{p}_t \cdot \mathbf{q}_t^{\top} \cdot \mathbf{G}_t + \mathbf{G}_t \cdot \mathbf{q}_t \cdot \mathbf{p}_t^{\top}}{\mathbf{q}_t^{\top} \cdot \mathbf{p}_t} \quad (12)$$
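As a sanity check, Eq. (12) can be implemented directly: the resulting update satisfies the secant condition on the inverse, $\mathbf{G}_{t+1} \cdot \mathbf{q}_t = \mathbf{p}_t$, from Eq. (9). The sketch below uses a toy quadratic objective (a fixed SPD Hessian), not the AE error function of the paper.

```python
import numpy as np

def bfgs_inverse_update(G, p, q):
    # Eq. (12): rank-two BFGS update of the inverse-Hessian approximation G
    qTp = float(q @ p)
    A = (1.0 + float(q @ G @ q) / qTp) * np.outer(p, p) / qTp
    B = (np.outer(p, q @ G) + np.outer(G @ q, p)) / qTp
    return G + A - B

# Toy quadratic E(w) = 0.5 w^T H w, whose gradient is g(w) = H w
rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
H = M @ M.T + 5 * np.eye(5)          # a fixed SPD Hessian

w0 = rng.standard_normal(5)
w1 = rng.standard_normal(5)
p = w1 - w0                          # p_t = W_{t+1} - W_t
q = H @ (w1 - w0)                    # q_t = g_{t+1} - g_t = H p_t

G1 = bfgs_inverse_update(np.eye(5), p, q)
assert np.allclose(G1 @ q, p)        # secant condition of Eq. (9) holds
```

The assertion verifies exactly the property that motivates Eq. (12): after the update, the approximation maps the gradient difference back to the parameter step.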
Algorithm 1 Broyden-Fletcher-Goldfarb-Shanno Algorithm
1: procedure BFGS($\mathcal{W}_t$: current parameters of the neural network, $E(\mathbf{X}, \mathcal{W})$: error function, $\mathbf{G}_t$: current approximation to the inverse Hessian)
2:   $\mathbf{g}_t = \nabla E(\mathbf{X}, \mathcal{W}_t)$
3:   $\mathbf{p}_t = -\mathbf{G}_t \cdot \mathbf{g}_t$
4:   $\mathcal{W}_{t+1} = \mathcal{W}_t + \mu_t \cdot \mathbf{p}_t$   ▷ $\mu_t$ by line search
5:   $\mathbf{g}_{t+1} = \nabla E(\mathbf{X}, \mathcal{W}_{t+1})$
6:   $\mathbf{q}_t = \mathbf{g}_{t+1} - \mathbf{g}_t$
7:   $\mathbf{p}_t = \mathcal{W}_{t+1} - \mathcal{W}_t$
8:   $\mathbf{A} = \left(1 + \frac{\mathbf{q}_t^{\top} \mathbf{G}_t \mathbf{q}_t}{\mathbf{q}_t^{\top} \mathbf{p}_t}\right) \cdot \frac{\mathbf{p}_t \mathbf{p}_t^{\top}}{\mathbf{p}_t^{\top} \mathbf{q}_t}$
9:   $\mathbf{B} = \frac{\mathbf{p}_t \mathbf{q}_t^{\top} \mathbf{G}_t + \mathbf{G}_t \mathbf{q}_t \mathbf{p}_t^{\top}}{\mathbf{q}_t^{\top} \mathbf{p}_t}$
10:  $\mathbf{G}_{t+1} = \mathbf{G}_t + \mathbf{A} - \mathbf{B}$
11:  return $\mathcal{W}_{t+1}$, $\mathbf{G}_{t+1}$
12: end procedure
Algorithm 1 provides a general overview of how the BFGS
method works in one epoch. A weakness of BFGS is that it
requires the computation of the gradient on the full dataset,
consuming a large amount of memory to properly run the
optimization. Taking into account the dimensionality of HSIs,
we can conclude that this method is not able to scale with the
number of samples [56]. In order to overcome this limitation,
and with the aim of speeding up the computation of both
the forward (affine transformations) and backward (optimizer)
steps of the AE for DR of HSIs, in the following section we
develop a distributed solution for cloud computing environ-
ments.
III. PROPOSED IMPLEMENTATION
A. Distributed Framework
We have developed a completely new distributed AE for HSI data analysis². In this context, two problems have been specifically addressed in this work: i) the computing engine, and ii) the distributed programming model over the cloud architecture. Regarding the first problem, our distributed implementation of the network model runs on top of a standalone Spark cluster, due to its capacity to provide fast processing of large data volumes on distributed platforms, in addition to fault tolerance. Furthermore, the Spark cluster is characterized by a master-slave architecture, which makes it very flexible. Specifically, a Spark cluster is formed by a master node, which manages how the resources are used and distributed in the cluster by hosting a Java virtual machine (JVM) driver and the scheduler (which distributes the tasks among the execution nodes), and N worker nodes (of which there can be more than one per machine) that execute the program tasks by creating a Java distributed agent, called executor (where tasks are computed), and store the data partitions (see Fig. 2).

²Code available at: https://github.com/jgallardst/cloud-nn-hsi
Fig. 2. Graphic representation of a generic Spark cluster, composed of one client node and N worker nodes, where several executor Java virtual machines run in parallel in each node over several data partitions.
In relation to the second point, the adopted programming
model to perform the implementation of the distributed AE
is based on organizing the original HSI data in tuples or
key/value pairs, in order to apply the MapReduce model [39],
which divides the data processing task into two distributed
operations: i) mapping, which processes a set of data-tuples,
generating intermediate key-value pairs, and ii) reduction,
which gathers all the intermediate pairs obtained by the
mapping to generate the final result. In order to achieve
this behavior, data in Spark are abstracted and encapsulated into a fault-tolerant data structure called Resilient Distributed Dataset (RDD). These RDDs are organized as distributed collections of data, which Spark scatters across the worker nodes as they are needed by successive computations, persisting them in the memory of the nodes or on disk. This architecture allows for the parallelization of the executions, achieved by performing MapReduce tasks over the RDDs on the nodes. Moreover, two basic operations can be performed on an RDD: i) the so-called transformations, which apply an operation to every row of a partition, resulting in another RDD, and ii) actions, which retrieve a value or a set of values that can be either RDD data or the result of an operation involving some RDD data. Operations are queued until an action is called, and the needed transformations are placed into a dependency graph, where each node is a job stage, following a lazy execution paradigm. This means that operations are not performed until they are really needed, which avoids executing a single operation more than once.
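The transformation/action split can be illustrated with plain Python. The `ToyRDD` class below is a deliberately simplified stand-in for Spark's RDD abstraction (in PySpark the analogous calls would be `parallelize`, `map`, and `reduce`); it only demonstrates lazy queuing of transformations and eager evaluation on an action.

```python
from functools import reduce

class ToyRDD:
    """Minimal stand-in for a Spark RDD: lazy transformations, eager actions."""

    def __init__(self, partitions):
        self.partitions = partitions      # list of lists (data partitions)
        self.pending = []                 # queued transformations (lazy)

    def map(self, fn):
        # Transformation: nothing is computed yet, a new "RDD" is returned
        new = ToyRDD(self.partitions)
        new.pending = self.pending + [fn]
        return new

    def _materialize(self):
        # Apply all queued transformations, partition by partition
        for part in self.partitions:
            for row in part:
                for fn in self.pending:
                    row = fn(row)
                yield row

    def reduce(self, fn):
        # Action: triggers execution of the whole pending pipeline
        return reduce(fn, self._materialize())

# Example: sum of squares, computed partition-wise
rdd = ToyRDD([[1, 2], [3, 4], [5]])
total = rdd.map(lambda v: v * v).reduce(lambda a, b: a + b)
assert total == 55
```

Note that `map` merely extends the pending pipeline (the dependency graph of the text), while `reduce` is the action that forces each partition to be processed and the intermediate results to be combined.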
In order to enable a simple and easy mechanism for man-
aging large data sets, the Spark environment provides another
level of abstraction that uses the concept of Dataframe. These Dataframes allow data to be organized in named columns, making them easier to manipulate (as in relational tables, columns can be accessed by name instead of by index).
With this in mind, the Spark standalone cluster functionality
can be summarized as follows:
1) The master node (also called driver node) creates and
manages the Spark driver (see Fig. 2), a Java process
that contains the SparkContext of the application.
2) The driver context performs the data partitioning and
parallelization between the worker nodes, assigning to
each one a number of partitions, which depends on two
main aspects: the block size and the way the data are
stacked. Also, the driver creates the executors on the
worker nodes, which store data partitions on the worker
node and perform tasks on them.
3) When an action is called, a job is launched and the
master coordinates how the associated tasks are distributed
among the different executors. In order to reduce
data-exchange time, the Spark driver attempts to perform
“smart” task allocation, preferring to assign each task to
an executor located on the worker that already holds the
data partition on which the task operates.
4) When all the tasks on a given stage are finished, the
Scheduler allocates another stage of the job (if it was
a transformation), or retrieves the final output (if it was
an action).
Algorithm 2 shows a general overview of how our algorithm
is pipelined in the considered Spark cluster.
Algorithm 2 Iterative Process
1: procedure SPARKFLOW
2:   PartitionedData ← Spark.parallelizeData()
3:   t ← 0
4:   while t < n_iterations do
5:     broadcastOutputData()
6:     for each partition ∈ PartitionedData do
7:       partition.applyTask()
8:     end for
9:     retrieveOutputData()
10:    t ← t + 1
11:  end while
12: end procedure
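The driver loop of Algorithm 2 can be sketched in plain Python as follows. The names `apply_task` and `aggregate` are illustrative placeholders for the per-partition work and the driver-side reduction, not Spark primitives.

```python
# Sketch of the broadcast / per-partition compute / aggregate loop of
# Algorithm 2. "state" stands in for the broadcast model weights.

def spark_flow(partitioned_data, n_iterations, apply_task, aggregate, state):
    for t in range(n_iterations):
        broadcast_state = state                      # broadcastOutputData()
        results = [apply_task(p, broadcast_state)    # task on each partition
                   for p in partitioned_data]
        state = aggregate(results)                   # retrieveOutputData()
    return state

# Toy usage: each "task" sums its partition; the driver averages partials.
parts = [[1.0, 2.0], [3.0, 5.0]]
out = spark_flow(parts, n_iterations=1,
                 apply_task=lambda p, s: sum(p),
                 aggregate=lambda rs: sum(rs) / len(rs),
                 state=0.0)
print(out)  # → 5.5, i.e. (3.0 + 8.0) / 2
```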
B. Cloud Implementation
This section describes in detail the full distributed training
process, from the parallelization of HSI data across nodes to
the intrinsic logic of each training step, explaining the benefits
of our distributed training algorithm. Fig. 3 gives a general
overview of the full data pipeline developed in this work.
In the beginning, the original 3-dimensional HSI data cube
$\mathbf{X} \in \mathbb{R}^{n_1 \times n_2 \times n_{bands}}$, where $n_1 \times n_2$ are the spatial dimensions
(height and width) and $n_{bands}$ is the spectral dimension given
by the number of spectral channels, is reshaped into a HSI
matrix $\mathbf{X} \in \mathbb{R}^{n_{pixels} \times n_{bands}}$, where $n_{pixels} = n_1 \times n_2$, i.e.
each row collects a full spectral pixel, and each column the
Fig. 3. Data pipeline of our distributed auto-encoder, where the input HSI
cube is first reshaped into a matrix and then split into several partitions
allocated into the Spark worker nodes, composed by several rows where each
one contains BS spectral pixels. These data partitions are then scaled and
duplicated in order to obtain the input network data and the corresponding
output network data. The AE is then executed and, for each iteration t, the
gradients are collected by the Spark driver, which calculates the final gradient
and performs the optimization with the L-BFGS algorithm. The updated
weights are finally broadcasted to each neural model contained in the cluster.
corresponding value in the spectral band. This matrix $\mathbf{X}$ is
read by the Spark driver, which collects the original HSI data
and partitions it into $P$ smaller subsets that are delivered to
the worker nodes in parallel. These workers store the received
partitions on their local disks. Each data partition thus
constitutes an RDD.
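The reshaping and partitioning step can be sketched with NumPy as follows. Dimensions are toy values, and `array_split` merely stands in for the driver-side partitioning described above.

```python
import numpy as np

n1, n2, n_bands = 4, 3, 8            # illustrative spatial/spectral sizes
cube = np.random.rand(n1, n2, n_bands)

# Reshape the 3-D cube into the (n_pixels x n_bands) matrix X,
# so each row is one full spectral pixel.
X = cube.reshape(n1 * n2, n_bands)

# Split the pixel rows into P subsets, mimicking the Spark driver
# delivering partitions to the worker nodes.
P = 3
partitions = np.array_split(X, P, axis=0)

print(X.shape)                                  # → (12, 8)
print(sum(p.shape[0] for p in partitions))      # → 12 (no pixel lost)
```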
It must be noted that complex neural network topologies
lead to greedy RAM usage on the driver node. Since Spark
transformations apply an operation to every row of the RDD,
the fewer the rows, the fewer the operations that must be
carried out. In order to improve the computation of the
distributed model, a blocksize ($BS$) hyperparameter is provided,
indicating how many pixels should be stacked into a single row
so that they are computed together. With this observation in
mind, the $p$-th data partition (with $p = 1, \cdots, P$) can be seen
as a 2-dimensional matrix ${}^{(p)}\mathbf{D} \in \mathbb{R}^{n_{rows} \times (BS \cdot n_{bands})}$ composed
of $n_{rows}$ rows, where each one stores $BS$ concatenated spectral
pixels, i.e. ${}^{(p)}\mathbf{d}_j \in \mathbb{R}^{BS \cdot n_{bands}} = [\mathbf{x}_i, \mathbf{x}_{i+1}, \cdots, \mathbf{x}_{i+BS}]$. In
the end, each data partition ${}^{(p)}\mathbf{D}$ stores $BS \cdot n_{rows}$ pixels.
The resulting partitions are then distributed across the worker
nodes. Such distribution allows the executors, located in each
worker node, to apply the subsequent tasks to the partitions
that each worker receives.
After distributing the data into RDDs, a distributed data
analysis process begins prior to the application of neural
network-based processing. In the first step, the data contained
in each partition ${}^{(p)}\mathbf{D}$ are scaled in a distributed way, taking
advantage of the cloud architecture and the available
parallelization of resources. In this sense, each partition's row ${}^{(p)}\mathbf{d}_j$
(and, internally, each pixel contained within) is transformed
based on the global maximum and minimum features ($\mathbf{x}_{\max}$
and $\mathbf{x}_{\min}$) of the whole image $\mathbf{X}$, and the local column
maximum and minimum features (${}^{(p)}\mathbf{d}_{\max}$ and ${}^{(p)}\mathbf{d}_{\min}$) of
the $p$-th partition where the data are allocated:
$$ {}^{(p)}\hat{\mathbf{d}}_j = \frac{{}^{(p)}\mathbf{d}_j - {}^{(p)}\mathbf{d}_{\min}}{{}^{(p)}\mathbf{d}_{\max} - {}^{(p)}\mathbf{d}_{\min}}, \qquad {}^{(p)}\mathbf{d}_j = {}^{(p)}\hat{\mathbf{d}}_j \cdot (\mathbf{x}_{\max} - \mathbf{x}_{\min}) + \mathbf{x}_{\min} \tag{13} $$
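A minimal NumPy sketch of the scaling in Eq. (13), assuming (as the formula implicitly does) that no column within a partition is constant, which would make the local denominator zero:

```python
import numpy as np

def scale_partition(D, x_min, x_max):
    # Eq. (13): normalize by partition-local column min/max, then
    # rescale to the global per-column range of the whole image X.
    d_min = D.min(axis=0)
    d_max = D.max(axis=0)
    D_hat = (D - d_min) / (d_max - d_min)     # local normalization
    return D_hat * (x_max - x_min) + x_min    # global rescaling

# Toy image: 4 pixels, 2 "bands", split into 2 partitions of 2 rows.
X = np.array([[0.0, 10.0], [2.0, 30.0], [8.0, 20.0], [4.0, 40.0]])
x_min, x_max = X.min(axis=0), X.max(axis=0)   # global extrema
parts = np.array_split(X, 2, axis=0)
scaled = [scale_partition(p, x_min, x_max) for p in parts]
print(scaled[0])   # rows of the first partition now span the global range
```

Each worker can apply `scale_partition` to its own partition independently; only the small global vectors $\mathbf{x}_{\min}$, $\mathbf{x}_{\max}$ need to be shared.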
Once the HSI data have been split into partitions and scaled,
the next step consists of the application of the AE model.
The proposed AE is composed of five layers, as summarized in
Table I. These layers are: $l^{(1)}$, the input layer that receives
the spectral signature contained in each pixel $\mathbf{x}_i$ of $\mathbf{X}$ (i.e.,
the rows of the distributed partitions), composed of as many
neurons as spectral bands; $l^{(2)}$, $l^{(3)}$ and $l^{(4)}$, the hidden AE
layers; and $l^{(5)}$, the output layer that obtains the reconstructed
signature $\mathbf{x}'_i$, also composed of as many neurons as spectral
bands.
TABLE I
Topology of the proposed auto-encoder neural network for hyperspectral image analysis

Layer ID          | l(1)    | l(2) | l(3) | l(4) | l(5)
Neurons per layer | n_bands | 140  | 60   | 140  | n_bands
With the topology described in Table I in mind, the encoder
part is composed of $l^{(1)}$, $l^{(2)}$ and $l^{(3)}$, which performs the
mapping from the original spectral space to the latent space
of the bottleneck layer $l^{(3)}$. In turn, the decoder part is
composed of $l^{(3)}$, $l^{(4)}$ and $l^{(5)}$, which maps back from the
latent space of $l^{(3)}$ to the original spectral space.
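The Table I topology can be sketched as a list of weight shapes. The NumPy initialization below is illustrative only; the paper's model is built with Spark MLlib in Scala.

```python
import numpy as np

n_bands = 220                                     # AVIRIS/Hyperion channel count
layer_sizes = [n_bands, 140, 60, 140, n_bands]    # Table I topology

# One weight matrix and bias vector per layer transition.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_in, n_out)) * 0.01
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

# Encoder: 220 -> 140 -> 60 (bottleneck); decoder: 60 -> 140 -> 220.
print([w.shape for w in weights])
# → [(220, 140), (140, 60), (60, 140), (140, 220)]
```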
At this point, it is interesting to briefly comment on the
operation of the AE network. In order to correctly propagate
the data through the network, from each partition ${}^{(p)}\mathbf{D} \in
\mathbb{R}^{n_{rows} \times (BS \cdot n_{bands})}$, a matrix of unstacked pixels ${}^{(p)}\mathbf{X} \in
\mathbb{R}^{(BS \cdot n_{rows}) \times n_{bands}}$ is extracted, i.e. the $BS$ spectral pixels
contained in each ${}^{(p)}\mathbf{d}_j = [\mathbf{x}_i, \mathbf{x}_{i+1}, \cdots, \mathbf{x}_{i+BS}]$ (with
$j = 1, \cdots, n_{rows}$ and $i = 1, \cdots, n_{pixels}$) are extracted one by
one to create the rows of ${}^{(p)}\mathbf{X}$, denoted as ${}^{(p)}\mathbf{x}_k$ [with
$k = 1, \cdots, (BS \cdot n_{rows})$], which determine the level at
which the AE is working.
Every training iteration $t$ is performed using the traditional
neural network forward-backward procedure, in addition to a
tree-aggregate operation that computes and sums the executors'
gradients and losses to return a single loss value and
a matrix of gradients. Each executor computes its loss by
forwarding the input network data ${}^{(p)}\mathbf{X}$ through the AE layers,
and comparing the $l^{(5)}$ layer's output vector with the vector of
input features, following Eq. (4) and obtaining (at each $t$) the
corresponding MSE of the partition: ${}^{(p)}\mathrm{MSE}_t = E({}^{(p)}\mathbf{X}, \mathbf{W}_t)$.
Gradients are then computed by back-propagating the error
signal through the AE, obtaining for each partition the matrix
${}^{(p)}\mathbf{G}_t$ at iteration $t$. Each gradient matrix is reduced in the
driver, which runs the optimizer in order to obtain the final
matrix $\mathbf{W}_t$. This matrix indicates how much each neuron
weight should be modified before finishing the $t$-th training
iteration, based on how that neuron impacts the output. Fig. 4
gives a graphical overview of the adopted training procedure.
If we denote by $P$ the number of total partitions and
by ${}^{(p)}\mathbf{X} \in \mathbb{R}^{(BS \cdot n_{rows}) \times n_{bands}}$ the $p$-th unstacked partition
data, composed of $(BS \cdot n_{rows})$ normalized rows/feature
vectors of $n_{bands}$ spectral features, i.e. ${}^{(p)}\mathbf{x}_k \in \mathbb{R}^{n_{bands}} =
[{}^{(p)}x_{k,1}, \cdots, {}^{(p)}x_{k,n_{bands}}]$, and considering the $l$-th layer of
the AE model, composed of $n^{(l)}_{neurons}$ neurons, its output is denoted
by ${}^{(p)}\mathbf{X}^{(l+1)}$ and it is computed by adapting Eq. (1) into Eq.
(14) as the matrix multiplication:

$$ {}^{(p)}\mathbf{X}^{(l+1)} = H\left({}^{(p)}\mathbf{X}^{(l)} \cdot \mathbf{W}^{(l)} + \mathbf{b}^{(l)}\right), \tag{14} $$

where the meaning of each term is:
• ${}^{(p)}\mathbf{X}^{(l+1)} \in \mathbb{R}^{(BS \cdot n_{rows}) \times n^{(l)}_{neurons}}$ is the matrix that
represents the output of the neurons in layer $l$, with size
$(BS \cdot n_{rows}) \times n^{(l)}_{neurons}$, where $n^{(l)}_{neurons}$ is the number
of neurons of the $l$-th layer (in the case that $l = 5$,
$n^{(5)}_{neurons} = n_{bands}$).
• ${}^{(p)}\mathbf{X}^{(l)} \in \mathbb{R}^{(BS \cdot n_{rows}) \times n^{(l-1)}_{neurons}}$ is the matrix that serves
as input to the $l$-th layer, which contains the $(BS \cdot n_{rows})$
pixel vectors represented in the feature space of the
previous layer, defined by $n^{(l-1)}_{neurons}$ neurons.
• $\mathbf{W}^{(l)} \in \mathbb{R}^{n^{(l-1)}_{neurons} \times n^{(l)}_{neurons}}$ is the matrix of weights,
which connects the current $n^{(l)}_{neurons}$ neurons with the
$n^{(l-1)}_{neurons}$ neurons of the previous layer, and $\mathbf{b}^{(l)}$ is the
bias of the current layer.
• $H$ is the ReLU activation function, which gives the
following non-linear output: $\mathrm{ReLU}(x) = \max(0, x)$.
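A compact NumPy sketch of the layer-by-layer forward pass of Eq. (14), using toy layer sizes rather than those of Table I:

```python
import numpy as np

def relu(x):
    # H in Eq. (14): element-wise ReLU non-linearity.
    return np.maximum(0.0, x)

def forward(X, weights, biases):
    # Eq. (14) applied layer by layer: X(l+1) = H(X(l) W(l) + b(l)).
    out = X
    for W, b in zip(weights, biases):
        out = relu(out @ W + b)
    return out

# Toy partition: 6 pixels with 5 "bands"; sizes are illustrative only.
rng = np.random.default_rng(1)
X = rng.random((6, 5))
sizes = [5, 4, 2, 4, 5]                       # mini 5-layer AE with bottleneck 2
weights = [rng.standard_normal((i, o)) * 0.1
           for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]

X_rec = forward(X, weights, biases)           # reconstructed partition
print(X_rec.shape)                            # → (6, 5), same shape as input
```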
After data forwarding, the reconstructed data ${}^{(p)}\mathbf{X}'$ in the
$p$-th partition at the $t$-th iteration is compared to the original
input ${}^{(p)}\mathbf{X}$ by applying the MSE function defined by Eq. (4)
on each executor. Executors then retrieve the error computed
from the data they hold, obtaining a value ${}^{(p)}\mathrm{MSE}_t$ per partition.
Then, the final error is obtained as the mean of all executor
errors, as shown in Eq. (15):

$$ {}^{(p)}\mathrm{MSE}_t = \frac{1}{BS \cdot n_{rows}} \sum_{k=1}^{BS \cdot n_{rows}} \left\| {}^{(p)}\mathbf{x}_k - {}^{(p)}\mathbf{x}'_k \right\|^2, \qquad \mathrm{MSE}_t = \frac{1}{P} \sum_{p=1}^{P} {}^{(p)}\mathrm{MSE}_t, \tag{15} $$
Fig. 4. Distributed forward and backward pipelines of the training stage (at iteration t) after unstacking the hyperspectral pixels in each distributed data
partition (each one allocated to a different worker node).
where $(BS \cdot n_{rows})$ is the number of pixels that compose the
$p$-th data partition, whereas ${}^{(p)}\mathbf{x}_k \in {}^{(p)}\mathbf{X}$ and ${}^{(p)}\mathbf{x}'_k \in {}^{(p)}\mathbf{X}'$
are the original input sample and the reconstructed output sample
in the $p$-th data partition, respectively. Those partition errors
are then back-propagated to compute the gradient matrix ${}^{(p)}\mathbf{G}_t$
of each partition at iteration $t$. In this sense, for each layer in
the neural model (using the resulting outputs), the impact that
each neuron has on the final error is obtained as the result
of the ReLU's derivative of every output, which is defined as
follows:

$$ H'(x) = \begin{cases} 0, & \text{if } x \leq 0 \\ 1, & \text{if } x > 0 \end{cases} \tag{16} $$

Such impact can be denoted as ${}^{(p)}\mathbf{g}^{L}_t = [{}^{(p)}\mathbf{g}^{(1)}_t, \cdots, {}^{(p)}\mathbf{g}^{(5)}_t]$,
where the $l$-th element ${}^{(p)}\mathbf{g}^{(l)}_t$ stores the impact of the $n^{(l)}_{neurons}$ neurons
allocated into the $l$-th layer of the network.
The gradient of each partition, ${}^{(p)}\mathbf{G}_t$, is then computed
by applying the double-precision general matrix-matrix
multiplication (DGEMM) operation where, given three input
matrices ($\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$) and two constants ($\alpha$ and $\beta$), the
result is calculated by Eq. (17) and stored in $\mathbf{C}$:

$$ \mathbf{C} = \alpha \mathbf{A} \mathbf{B} + \beta \mathbf{C}. \tag{17} $$

DGEMM is performed to compute the entire gradient matrix
in parallel, instead of computing each layer's gradient vectors
separately. This makes the neural computations faster and more
efficient in terms of power consumption. In this sense, each
item of Eq. (17) has been replaced by:
• $\alpha = \frac{1}{n_{bands}}$ is a regularization parameter.
• $\mathbf{A} = {}^{(p)}\mathbf{X}$ is the input data partition matrix.
• $\mathbf{B} = {}^{(p)}\mathbf{g}^{L}_t$ is the matrix representing the impact of each
neuron on every layer of the neural network.
• $\beta = 1$ is also a regularization parameter. As the accumulated
$\mathbf{C}$ should be preserved, it has been set to 1.
• $\mathbf{C} = {}^{(p)}\mathbf{G}_{t-1}$ is initially the previous gradient matrix of
the $p$-th partition. After the update resulting from the
DGEMM operation, the current gradient ${}^{(p)}\mathbf{G}_t$ is stored
in $\mathbf{C}$.
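The single fused update of Eq. (17) can be sketched as follows. The matrix shapes here are illustrative only, since the paper packs the per-layer matrices differently; the sketch only demonstrates the accumulation pattern with $\alpha = 1/n_{bands}$ and $\beta = 1$.

```python
import numpy as np

def dgemm_update(A, B, C, alpha, beta):
    # The DGEMM contract of Eq. (17): C <- alpha * A @ B + beta * C.
    # BLAS dgemm computes this in one fused call; NumPy stands in here.
    return alpha * (A @ B) + beta * C

n_bands = 4
X = np.ones((3, n_bands))            # A: input partition matrix
g = np.full((n_bands, 2), 0.5)       # B: neuron-impact matrix (assumed shape)
G_prev = np.zeros((3, 2))            # C: gradient carried from iteration t-1

G = dgemm_update(X, g, G_prev, alpha=1.0 / n_bands, beta=1.0)
print(G)   # every entry equals (1/4) * (4 * 0.5) = 0.5
```

With $\beta = 1$ the previous gradient in `C` is preserved and the new term accumulated onto it, which is exactly why the paper sets $\beta$ to 1.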
Finally, the gradient matrix $\mathbf{G}_t$ of the whole network is
computed as the average of all partition gradients ${}^{(p)}\mathbf{G}_t$.
The entire training process on each data partition is
graphically illustrated in Fig. 4.
The final optimization step is performed locally on the
master node using a variant of the BFGS algorithm, called
limited-memory BFGS (L-BFGS). Since BFGS needs a huge
amount of memory to store its approximation of the Hessian
matrix, L-BFGS limits the memory usage, so it fits better into
our implementation. The optimizer uses the computed gradients
and a step-size procedure to get closer to a minimum of Eq. (4).
The procedure is repeated until the desired number of
iterations, $n_{iterations}$, is reached.
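As a point of reference, the sketch below minimizes a stand-in loss with SciPy's L-BFGS-B implementation; the paper's optimizer is Spark MLlib's L-BFGS running on the driver, not this one. The `maxcor` option is the limited update history that bounds memory, which is precisely what distinguishes L-BFGS from plain BFGS.

```python
# Hypothetical stand-ins: loss() plays the role of the aggregated
# reconstruction error E(X, W), grad() the averaged gradient G_t
# collected from the executors.
import numpy as np
from scipy.optimize import minimize

def loss(w):
    return np.sum((w - 3.0) ** 2)

def grad(w):
    return 2.0 * (w - 3.0)

w0 = np.zeros(5)
res = minimize(loss, w0, jac=grad, method="L-BFGS-B",
               options={"maxiter": 100, "maxcor": 10})  # maxcor: history size
print(res.x)   # converges to the minimizer w = 3 in every coordinate
```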
IV. EXP ER IM EN TAL EVALUATIO N
A. Configuration of the Environment
In order to test our newly developed implementation, a
dedicated hardware and software environment based on a high-
end cloud computing paradigm has been adopted. The virtual
resources have been provided by the Jetstream Cloud Services3
at the Indiana University Pervasive Technology Institute (PTI).
Its user interface is based on Atmosphere computing platform4
and uses Openstack5as the operational software environment.
The hardware environment consists of a collection of cloud
computing nodes. In particular, the cluster consists of one
master node and eight slave nodes, which are hosted in virtual
machines with six virtual cores at 2.5 GHz each. Every node
has 16 GB of RAM and 60 GB of storage via a Block Storage
File System. As mentioned before, Spark serves as the
backbone for node interconnection, while data transfers
are supported by a local 4x40 Gbps dedicated network.
Each virtual machine runs Ubuntu 16.04 as operating sys-
tem, with Spark 2.1.1 and Java 1.8 serving as running plat-
forms. The Spark framework provides a distributed machine
learning library known as Spark Machine Learning Library
3https://jetstream-cloud.org/
4https://www.atmosphereiot.com/platform.html
5https://www.openstack.org/
(MLLib6), which is used as support for the implementation
of our distributed AE network for remotely sensed HSI data
analysis. Moreover, the proposed implementation has been
coded in Scala 2.11, compiled into Java bytecode and inter-
preted in JVMs. Finally, mathematical operations from MLLib
are handled by Breeze (the numerical processing library for
Scala), in its 0.12 version, and by Netlib 1.1.2. In this sense,
Netlib wraps JVM calls into low-level Basic Linear Algebra
Subprograms (BLAS) calls, which execute faster than their
pure JVM counterparts.
B. Hyperspectral Datasets
With the aim of testing the performance of our newly
developed cloud-based and distributed AE network model,
two different HSI data sets have been considered in our
experiments. These data sets correspond to the full version of
the AVIRIS Indian Pines scene, hereinafter referred to as the big
Indian Pines (BIP) scene, and a set of images corresponding to
six different zones captured by the Hyperion spectrometer [57]
onboard NASA's EO-1 satellite, which we have designated as the
Hyperion data set (HDS). Both data sets are characterized by
their huge size, which makes them ideal to be processed in a
cloud-distributed environment. In the following, we provide a
description of the aforementioned data sets.
The big Indian Pines (BIP) scene (see Fig. 5)
was collected by AVIRIS in 1992 [5] over agricultural
fields in northwestern Indiana. The image comprises a
full flightline with a total of 2678 ×614 pixels (with 20
meters per pixel spatial resolution), covering 220 spectral
bands from 400 to 2500 nm.
The Hyperion data set (HDS) is composed by six full
flightlines (see Fig. 6) collected in 2016 by the Hyperion
spectrometer mounted on NASA's EO-1 satellite, which
collects spectral signatures using 220 spectral channels
ranging from 357 to 2576 nm with 10-nm bandwidth. The
captured scenes have a spatial resolution of 30 meters per
pixel. The standard scene width and length are 7.7 km
and 42 km, respectively, with an optional increased scene
length of 185 km. In particular, the considered images
have been stacked and treated together as a single image
comprising 20401 ×256 pixels with the spectral range
mentioned above. These images have been provided by
the Earth Resources Observation and Science (EROS)
Center in GEOTIFF format7. Also, each scene is accom-
panied by one identifier in the format YDDDXXXML,
which indicates the day of acquisition (DDD), and the
sensor that recorded the image (XXX, denoting Hyperion,
ALI or AC with 0=off and 1=on), the pointing mode
(M, which can be Nfor Nadir, Pfor Pointed within
path/row or Kfor Pointed outside path/row) and the scene
length (L, which can be Ffor Full scene, Pfor Partial
scene, Qfor Second partial scene and Sfor Swath). Also,
other letters can be used to create distinct entity IDs, for
example to indicate the Ground/Receiving Station (GGG)
6https://spark.apache.org/mllib
7These scenes are available online from the Earth Explorer site,
https://earthexplorer.usgs.gov
or the Version Number (VV). In this case, the identifiers
of the six considered images are: 065110KU, 035110KU,
212110KR, 247110KW, 261110KR and 321110KR.
C. Experiments and Discussion
Three different experiments have been conducted in order
to validate the performance of our cloud-distributed AE for
HSI data compression:
1) The first experiment analyzes the scalability of our
cloud-distributed AE, using a medium-sized data set. For
this purpose, the BIP data set has been processed with a
fixed number of training samples in the cloud environ-
ment described above, using one master and different
numbers of worker nodes. Here, we have reduced the
dimensionality of the BIP data set using PCA, retaining
the first 60 principal components that account for most
of the variance in the original data.
2) The second experiment illustrates the internal paral-
lelization (at the core level) of the worker nodes. For
this purpose, the HDS has been processed using four
different percentages of training data and 8 worker nodes
in the considered cluster, each with 6 virtual cores. As in
the previous experiment, we reduced the dimensionality
of the HDS data set using PCA, retaining the first
60 principal components that account for most of the
variance in the original data.
3) Finally, the third experiment tests the performance of our
cloud-distributed AE using different numbers of training
samples and worker nodes over a large data set. This ex-
periment allows us to understand the internal operation
of data partitions. In this sense, the HDS data set used in
the previous experiment has been considered again using
4 different training percentages and 6 different numbers
of worker nodes.
1) Experiment 1: Our first experiment evaluates the per-
formance of the distributed implementation of the proposed
AE, using the BIP scene (reduced to 60 principal components
extracted by PCA) using 80% randomly selected samples to
create the training set and the remaining 20% of the samples
to create the test set. In order to demonstrate the scalability
of our cloud-distributed AE, the cloud environment has been
configured with one master node and different numbers of
worker nodes, specifically: 1, 2, 4, 8, 12 and 16 workers. In
order to show the robustness of our model, five Monte Carlo
runs have been executed, obtaining as a result the average and
the standard deviation of those executions.
Fig. 7 shows the obtained speed-up in a graphical way.
Such speed-up has been calculated as $T_1 / T_n$, where $T_1$ is
the execution time of the slowest execution with one worker
node and $T_n$ is the average time of the executions with $n$
worker nodes. Comparing the theoretical and real speed-up
values obtained, it can be observed that the model is able to
scale very well, reaching a speed-up value that is very close
to the theoretical one with 2, 4 and 8 workers. However, for
12 workers and beyond, we can see that the communication
times between the nodes hamper the speed-up due to the
insufficient amount of data, a fairly common behavior in
Fig. 5. False RGB color map of the big Indian Pines (BIP) scene, using the spectral bands 88, 111 and 150.
Fig. 6. False RGB color map of the Hyperion data set (HDS), using the spectral bands 35, 110 and 150.
cloud environments, in which the main bottleneck occurs in
the communication between nodes. As a result, it is important
to make sure that there exists an adequate balance between
the total amount of data to be processed and the number
of processing nodes. Table II tabulates the performance data
collected in this experiment, which comprises reconstruction
errors, computation times and speed-ups. As we can observe in
Table II, the reconstruction errors achieved by the AE network
are very similar for different numbers of workers (with slight
changes due to the random selection of samples), remaining
stable as the speed-up increases and more nodes
are introduced into the cluster. Also, it is worth noting that the
standard deviations of the error are very low, demonstrating
that the proposed implementation remains highly robust in
all cases. These very low errors are finally reflected in Fig.
8, which shows three reconstructed signatures of different
materials in the BIP scene. As it can be seen in Fig. 8, the
reconstructed signatures are extremely similar to the original
ones, a fact that allows for their exploitation in advanced
processing tasks such as classification or spectral unmixing.
2) Experiment 2: Our second experiment explores the in-
ternal parallelization of each worker node (at the core level).
For this purpose, the cloud-distributed AE has been tested on
the HDS dataset, again reducing the spectral dimensionality to
60 principal components and randomly collecting 20%, 40%,
60% and 80% of training samples to create the training set,
and the remaining 80%, 60%, 40% and 20% to create the test
set. Moreover, 1 master node and 8 worker nodes (each one
with 6 virtual cores) have been considered to implement the
cloud environment.
Fig. 9(a) shows the results obtained in this experiment. If
we compare the theoretical speed-up values and the real ones
TABLE II
Reconstruction errors (obtained as the MSE between the original test samples and the ones reconstructed by the proposed cloud-distributed AE), along with the processing times and speed-ups obtained for different numbers of workers when processing the BIP image.

Workers    | 1        | 2       | 4       | 8       | 12      | 16
Loss (MSE) | 7.93e-5  | 7.92e-5 | 8.35e-5 | 9.51e-5 | 8.60e-5 | 8.35e-5
Time (s)   | 17398.74 | 8991.12 | 4518.39 | 2354.91 | 1803.27 | 1288.69
Speedup    | 1        | 1.9308  | 3.8506  | 7.3882  | 9.6484  | 13.5011
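The tabulated speed-ups can be cross-checked against the reported times; small deviations are expected (notably in the 2-worker entry), since $T_1$ is the slowest single-worker run while the table lists average times.

```python
# Sanity check of Table II: speed-up ≈ T1 / Tn using the tabulated times.
times = {1: 17398.74, 2: 8991.12, 4: 4518.39,
         8: 2354.91, 12: 1803.27, 16: 1288.69}
reported = {1: 1.0, 2: 1.9308, 4: 3.8506,
            8: 7.3882, 12: 9.6484, 16: 13.5011}

for n, t in times.items():
    computed = times[1] / t
    # Within 1% of the reported value for every worker count.
    assert abs(computed - reported[n]) / reported[n] < 0.01
print(times[1] / times[16])   # → about 13.5, matching the 16-worker entry
```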
Fig. 7. Scalability of the proposed cloud-distributed network when processing
the BIP dataset with 1, 2, 4, 8, 12 and 16 worker nodes and 1 master node.
The red line indicates the theoretical speed-up value and the blue bars indicate
the actual values reached.
obtained, it can be seen that our implementation is able to
reach a speed-up that is almost identical to the theoretical one.
This is quite important, as the obtained results indicate that the
scalability achieved in each node is almost linear with regards
to the size of the HSI scenes considered in each node, thanks
to the cores available in each node. In this way, the proposed
cloud-distributed AE implementation takes full advantage of
all the available resources, both in parallel (multi-core) and
distributed fashion.
3) Experiment 3: Our last experiment evaluates the scal-
ability of the proposed cloud AE for HSI data compression
using a very large-sized data set. The HDS images have been
considered for this purpose. Due to the great amount of data,
this experiment has been split into two parts. The first part
performs a comparison over a cloud environment composed
of 1, 2, 4, 8, 12 and 16 worker nodes and 1 master node,
employing 20% and 40% of the samples to create the training
set and the remaining 80% and 60% of data to create the test
set. However, due to the memory limitations of the workers,
the second part performs a comparison over a cloud environment
composed of 2, 4, 8, 12 and 16 worker nodes and 1 master node,
employing 60% and 80% of the samples to create the training
set and the remaining 40% and 20% of data to create the test
set. In this context, it must be noted that, while in the first
part the speed-up is computed with respect to the implementation
with 1 worker node, in the second part it is computed with
respect to the implementation with 2 worker nodes.
Figs. 9(b) and 9(c) show the results obtained by the two
parts of this experiment in a graphical way. In this case,
it is interesting to observe that the theoretical speed-up and
the linear speed-up values do not coincide. When we talk
about linear speed-up, we normally refer to the expected
speed-up when linear partitioning is performed in the cluster.
However, in a real environment the partitioning is not always
linear. In fact, we can observe a performance gap in the 8-
node configuration. This can be explained by the relationship
between the total number of cores in the cluster, C(obtained
as the number of cores per node multiplied by the number of
nodes), and the number of existing data partitions, P, given
by Eq. (18):
$$ (\lambda - 1) \cdot C < P < \lambda \cdot C, \tag{18} $$

where $\lambda$ is a scalar. For instance, when using the 8-node
configuration, its value is set to $\lambda = 2$. Taking Eq. (18) into
consideration, and assuming that the cluster cores execute
tasks when they are free, the non-compliance of Eq. (18) leads
to the fact that some cores remain idle after finishing their
first allocated tasks, so the fine-grained parallelism is not fully
exploited in this case.
In the considered cluster, since each node has 6 cores, a total
of C= 6 ×Nworking cores can be exploited. Furthermore,
these Cworking cores allow for the processing of the data
partitions in batches of Ctasks at most. For instance, when
a configuration of 8 nodes is used, the cluster environment
is made up of a total of C= 6 ×8 = 48 working cores.
This indicates that, at most, in one processing batch Spark
will launch 48 tasks. As Spark splits the HDS data into 58
data partitions, 58 tasks must be executed, one over each
partition. However, in each batch only 48 tasks can be
performed. This means that two batches must be run: the first
one with 48 tasks and the second one with only 10. As a result,
the second batch cannot fully exploit fine-grained parallelism,
as only 10 cores are being used while the remaining 38 cores
sit idle. This results in an unnecessary waste of computing
resources.
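The batch and idle-core arithmetic described above can be reproduced with a short helper, assuming (as in the text) 6 cores per node and that each partition maps to exactly one task:

```python
import math

def batch_schedule(partitions, nodes, cores_per_node=6):
    # Simplified Spark scheduling model: tasks run in waves of at most
    # (nodes * cores_per_node) concurrent tasks.
    cores = nodes * cores_per_node
    batches = math.ceil(partitions / cores)
    last = partitions - (batches - 1) * cores   # tasks in the final batch
    idle = cores - last                         # cores idle during that batch
    return cores, batches, last, idle

# The 8-node case from the text: 58 partitions over 48 cores.
print(batch_schedule(58, 8))    # → (48, 2, 10, 38): two batches, 38 idle cores
# The 12-node case: 72 cores cover all 58 partitions in a single batch.
print(batch_schedule(58, 12))   # → (72, 1, 58, 14)
```

This reproduces the numbers discussed above: with 8 nodes the second batch leaves 38 of 48 cores idle, while with 12 nodes a single batch suffices.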
However, when the idle cores from the second batch are
used, the performance improves. This is the case of the 12-
node configuration (C= 72), where the partitioning becomes
more efficient, complying with Eq. (18). The worker-level
(linear) speed-up combines with this core-level speed-up,
leading to an overall speed-up calculated as the product of
the two, as indicated by Eq. (19):
$$ \frac{T^w_1}{T^w_n} \cdot \frac{T^p_1}{T^p_n}, \tag{19} $$

where $T^w_n$ is the processing time at the worker level and $T^p_n$
the processing time at the core level.
Fig. 8. Comparison between the original (in blue) and reconstructed (in dotted red) spectral signatures extracted from the BIP scene by the proposed cloud
AE implementation using 8 workers.
(a) (b) (c)
Fig. 9. Scalability of the proposed cloud-distributed network when processing the HDS data set in experiments 2 and 3: a) with 8 worker nodes and 1 master
node, considering 20%, 40%, 60% and 80% of training data (experiment 2), b) with 1, 2, 4, 8, 12 and 16 worker nodes and 1 master node, considering 20%
and 40% of training data (experiment 3, first part), and c) with 2, 4, 8, 12 and 16 worker nodes and 1 master node, considering 60% and 80% of training data
(experiment 3, second part). The numbers in parentheses indicate the total amount of data used, in MB. The red lines indicate the theoretical speed-up
value (continuous line) and the linear speed-up value (dotted line), while the blue and orange bars indicate the actual values reached.
With the aforementioned observations in mind, and focusing
on the results of the first part of the experiment, reported
in Fig. 9(b), we can observe that, for each configuration,
training with 20% and 40% of the available samples yields
quite similar speed-ups, with slight variations due to the
distribution of data and the role of idle cores. It is interesting
to observe that with 1-8 nodes the speed-up is quite similar to
the theoretical one, while with 12-16 nodes the differences
between the obtained and theoretical speed-up values are larger,
indicating that the proposed AE with only 20% and 40% of
training samples does not take full advantage of the cloud
environment's potential.
On the other hand, Fig. 9(c) reports the results obtained
in the second part of this experiment. In this case, the base
implementation of the AE is conducted on a cloud environment
with 2 worker nodes, employing 60% and 80% of training
data. With 2 and 4 worker nodes the obtained speed-up values
are very similar when employing 60% and 80% of the available
samples, while with 8, 12 and 16 nodes it is clear that the
version with more training data achieves a superior speed-up,
reaching a value very close to the theoretical one with 16
nodes. This indicates that the amount of data handled in this
case is better suited to take full advantage of the way that
Spark organizes data partitions and tasks in batches, achieving
better parallelization at the core level (fine-grained parallelism)
and also better distribution at the worker level (coarse-grained
parallelism). These conclusions are supported by the data
tabulated in Table III, where the speed-up employing 20% and
40% of training data has been obtained taking as base times
the cloud environment with 1 node, while for 60% and 80%
of training data the speed-up is obtained with respect to the
environment composed of 2 worker nodes, due to memory
exhaustion in the single-node case. Once again, the MSE remains
constant for different numbers of nodes, varying slightly with
the training percentage, which indicates that the network
optimizes very well, without overfitting the parameters
when 60% or 80% of the available training samples are used,
but also avoiding underfitting when few samples are used for
training purposes.
V. CON CLUSION AND FUTURE LINES
This paper presents a new cloud-based AE neural network for remotely sensed
HSI data analysis in a distributed fashion. This kind of artificial neural
network finds non-linear solutions when compressing the data, as opposed to
traditional techniques such as PCA. In this sense, the proposed approach is
more suitable for complex data sets such as HSIs. The proposed AE
implementation over a Spark cluster exhibits great performance, not only in
terms of data compression and reconstruction error, but also in terms of
scalability when processing huge data volumes, which cannot be achieved by
traditional (sequential) AE implementations. Such sequential algorithms may
be a valid option when the data to be managed and analyzed can be stored in
a single machine with limited
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
TABLE III
RECONSTRUCTION ERRORS (OBTAINED AS THE MSE BETWEEN THE ORIGINAL TEST SAMPLES
AND THE ONES RECONSTRUCTED BY THE PROPOSED CLOUD-DISTRIBUTED AE), ALONG WITH
THE PROCESSING TIMES AND SPEEDUPS OBTAINED FOR DIFFERENT NUMBERS OF WORKERS
WHEN PROCESSING THE HDS DATA SET.

                                              Number of workers
Training Percentage (size)     1          2          4          8          12         16
Loss (MSE)
  20% (1838 MB)            6.09e-05   6.12e-05   6.01e-05   5.92e-05   6.47e-05   5.60e-05
  40% (3676 MB)            6.56e-05   5.80e-05   6.30e-05   6.44e-05   6.67e-05   6.13e-05
  60% (5515 MB)            N/A        5.88e-05   6.49e-05   6.27e-05   6.44e-05   5.70e-05
  80% (7353 MB)            N/A        6.59e-05   6.29e-05   6.25e-05   6.30e-05   6.20e-05
Time (s)
  20% (1838 MB)            14919.73   7632.64    4433.11    2952.27    1606.94    1171.87
  40% (3676 MB)            30526.79   15709.24   9087.66    5721.97    3311.36    2182.60
  60% (5515 MB)            N/A        21505.27   12458.54   8456.84    4536.73    3122.92
  80% (7353 MB)            N/A        32645.44   18900.49   11633.75   6084.69    4103.02
Speedup
  20% (1838 MB)            1          1.9547     3.3655     5.0536     9.2845     12.7315
  40% (3676 MB)            1          1.9432     3.3591     5.2796     9.2187     13.9864
  60% (5515 MB)            N/A        1          1.7247     2.5191     4.7402     6.8862
  80% (7353 MB)            N/A        1          1.7268     2.8304     5.3651     7.9564
processing and memory resources. However, for large amounts of HSI data,
sequential implementations can easily run out of memory or require a vast
amount of computing time, which is unacceptable when reliable processing is
needed in a reasonable amount of time. In this regard, both HPC and HTC
alternatives have provided new paths to solve those problems, including
parallelization on GPUs and distribution/parallelization on clusters with
cloud computing-based solutions. The experiments carried out in this work
demonstrate that cloud versions of HSI data processing methods provide
efficient and effective HPC-HTC alternatives that successfully solve the
inherent problems of sequential versions by increasing hardware capabilities
at a lower cost than other solutions such as grid computing. Also, the
obtained results reveal that the computational performance of cloud-based
solutions scales well with larger data sets, taking advantage of the
computational load distribution when there is a good balance between the
amount of data and the cluster complexity. Encouraged by the good results
obtained in this work, in the future we will develop other implementations
of HSI processing techniques in cloud computing environments. Further work
will also explore the design of more sophisticated scheduling algorithms in
order to circumvent the negative impact introduced by idle processing cores
in our current implementation.
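The non-linear encode/decode principle of the AE and its MSE reconstruction loss, contrasted above with PCA, can be illustrated with a minimal single-machine sketch. This is not the cloud-distributed Spark implementation evaluated in the paper: the toy data, the 8-band input, the 3-unit bottleneck, and the plain gradient-descent training loop are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 256 "pixels" with 8 spectral bands (real HSIs have
# hundreds of bands; this is only an illustration).
X = rng.random((256, 8))

n_in, n_hid = 8, 3                       # bottleneck compresses 8 -> 3
W1 = rng.normal(0, 0.1, (n_in, n_hid))   # encoder weights
W2 = rng.normal(0, 0.1, (n_hid, n_in))   # decoder weights
b1, b2 = np.zeros(n_hid), np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    H = sigmoid(X @ W1 + b1)   # non-linear code (this is what PCA lacks)
    Y = H @ W2 + b2            # reconstruction from the compressed code
    return H, Y

lr = 0.1
losses = []
for _ in range(500):
    H, Y = forward(X)
    err = Y - X                      # gradient of the MSE w.r.t. Y (up to a constant)
    losses.append((err ** 2).mean())
    # Backpropagation through the linear decoder and the sigmoid encoder.
    gW2 = H.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dH = err @ W2.T * H * (1 - H)    # sigmoid derivative: H * (1 - H)
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")  # the loss decreases as training proceeds
```

The reported loss is exactly the reconstruction MSE used in Table III: the mean squared difference between the original samples and the ones reconstructed from the compressed code.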
ACKNOWLEDGMENT
The authors would like to express their gratitude to the
Jetstream initiative, led by the Indiana University Pervasive
Technology Institute (PTI), for providing the cloud computing
environment and hardware resources used in this work.
... Other studies [40][41][42][43][44][45][46][47] use the Spectral Angle Difference (SAD) while the Signal-to-Noise Ratio (SNR) is omnipresent in [44,[48][49][50][51]. Further, the Peak Signal-to-Noise Ratio (PSNR) is adopted in [49,50,[52][53][54][55][56][57][58][59][60][61]. PSNR is the most frequently used quality metric, yet it is more content specific. ...
... This is a drawback since the degradation of the image quality is not caused by an external factor but by the model itself [62]. Also, the Mean Square Error (MSE) and Root MSE (RMSE) metrics are employed in [46,49,51,55,[63][64][65][66] while the Normalized MSE (NMSE) is used in [56,[67][68][69]. The use of normalization facilitates comparison between different datasets. ...
... However, it usually involves division by the range, which can hamper comparisons when extreme samples exist, especially for small-sized datasets. The Mean Absolute Error (MAE) metric, or Mean Absolute Deviation (MAD), is used in [46,49,51,55,65]. Finally, the work in [70] uses the percentage of retained information as an indication of signal quality. ...
Article
Full-text available
Hyperspectral imaging is an indispensable technology for many remote sensing applications, yet expensive in terms of computing resources. It requires significant processing power and large storage due to the immense size of hyperspectral data, especially in the aftermath of the recent advancements in sensor technology. Issues pertaining to bandwidth limitation also arise when seeking to transfer such data from airborne satellites to ground stations for postprocessing. This is particularly crucial for small satellite applications where the platform is confined to limited power, weight, and storage capacity. The availability of onboard data compression would help alleviate the impact of these issues while preserving the information contained in the hyperspectral image. We present herein a systematic review of hardware-accelerated compression of hyperspectral images targeting remote sensing applications. We reviewed a total of 101 papers published from 2000 to 2021. We present a comparative performance analysis of the synthesized results with an emphasis on metrics like power requirement, throughput, and compression ratio. Furthermore, we rank the best algorithms based on efficiency and elaborate on the major factors impacting the performance of hardware-accelerated compression. We conclude by highlighting some of the research gaps in the literature and recommend potential areas of future research.
... CC is another computing approach that is highly relevant for ML and DL, making parallel and distributed computing more straightforward to use (e.g., via containers or Jupyter notebooks) than traditional rather complex HPC systems. Remote sensing researchers and health scientists often take advantage of Apache open-source tools with parallel and distributed algorithms (e.g., map-reduce [6] as a specific form of divide and conquer approach) based on Spark [7] or the larger Hadoop ecosystem [8]). Also, inherent in many ML and DL approaches are optimization techniques while many of them are fast solvable by QCs [9] that represent the most disruptive type of computing today. ...
... Beside DL packages in containers, also CC vendors offer other relevant software stacks in containers or native that are used by RS researchers with parallel and scalable tools such as Apache Spark (see Fig. 3 R) [8] in the last years. Our experience case study in using Apache Spark in Clouds as described in Haut et al. [7] uses Spark to develop a cloud implementation of a DL network for non-linear RS data compression known as AutoEncoder (AE). Of course, Spark pipelines offer also the possibility to work in conjunction with DL techniques such as recently shown by Lunga et al. [21] for RS datasets. ...
... CC evolved as an evolution of Grid computing to make parallel and distributed computing more straightforward to use than traditional rather complex HPC systems. In this context RS researchers often take advantage of Apache open-source tools with parallel and distributed algorithms (e.g., map-reduce [6] as a specific form of divide and conquer approach) based on Spark [7] or the larger Hadoop ecosystem [8]. Inherent in many ML and DL approaches are optimization techniques while many of them are incredibly fast solvable by QCs [9] that represent the most innovative type of computing today. ...
Conference Paper
Full-text available
We observe a continuously increased use of Deep Learning (DL) as a specific type of Machine Learning (ML) for data-intensive problems (i.e., ’big data’) that requires powerful computing resources with equally increasing performance. Consequently, innovative heterogeneous High-Performance Computing (HPC) systems based on multi-core CPUs and many-core GPUs require an architectural design that addresses end user communities’ requirements that take advantage of ML and DL. Still the workloads of end user communities of the simulation sciences (e.g., using numerical methods based on known physical laws) needs to be equally supported in those architectures. This paper offers insights into the Modular Supercomputer Architecture (MSA) developed in the Dynamic Exascale Entry Platform (DEEP) series of projects to address the requirements of both simulation sciences and data-intensive sciences such as High Performance Data Analytics (HPDA). It shares insights into implementing the MSA in the Jülich Supercomputing Centre (JSC) hosting Europe No. 1 Supercomputer Jülich Wizard for European Leadership Science (JUWELS). We augment the technical findings with experience and lessons learned from two application communities case studies (i.e., remote sensing and health sciences) using the MSA with JUWELS and the DEEP systems in practice. Thus, the paper provides details into specific MSA design elements that enable significant performance improvements of ML and DL algorithms. While this paper focuses on MSA-based HPC systems and application experience, we are not losing sight of advances in Cloud Computing (CC) and Quantum Computing (QC) relevant for ML and DL.
... CC evolved as an evolution of Grid computing to make parallel and distributed computing more straightforward to use than traditional rather complex HPC systems. In this context RS researchers often take advantage of Apache open-source tools with parallel and distributed algorithms (e.g., map-reduce [2] as a specific form of divide and conquer approach) based on Spark [3] or the larger Hadoop ecosystem [4]. Inherent in many ML and DL approaches are optimization techniques while many of them are incredibly fast solvable by QCs [5] that represent the most innovative type of computing today. ...
... Beside HPC systems, also CC vendors offer multi-core and many-core processing power that is used by RS researchers with parallel and scalable tools such as Apache Spark [4] in the last years. For example, Haut et al. [3] uses Spark to develop a cloud implementation of a DL network for non-linear RS data compression known as AutoEncoder (AE). CC techniques such as Spark pipelines offer also the possibility to work in conjunction with DL techniques such as recently shown by Lunga et al. [12] for RS datasets. ...
Conference Paper
Full-text available
Using computationally efficient techniques for transforming the massive amount of Remote Sensing (RS) data into scientific understanding is critical for Earth science. The utilization of efficient techniques through innovative computing systems in RS applications has become more widespread in recent years. The continuously increased use of Deep Learning (DL) as a specific type of Machine Learning (ML) for data-intensive problems (i.e., 'big data') requires powerful computing resources with equally increasing performance. This paper reviews recent advances in High-Performance Computing (HPC), Cloud Computing (CC), and Quantum Computing (QC) applied to RS problems. It thus represents a snapshot of the state-of-the-art in ML in the context of the most recent developments in those computing areas, including our lessons learned over the last years. Our paper also includes some recent challenges and good experiences by using Europeans fastest supercomputer for hyper-spectral and multi-spectral image analysis with state-of-the-art data analysis tools. It offers a thoughtful perspective of the potential and emerging challenges of applying innovative computing paradigms to RS problems.
... More recently, cloud computing-based systems have emerged as feasible alternatives to handle data-intensive applications [15,[19][20][21]33,34]. However, there are still a number of issues to be considered and investigated in the design of cloud-based methods for remote sensing problems [14,15,25], particularly with respect to implementation of distributed unmixing algorithms, which are highly complex and computationally demanding [12]. ...
Article
Full-text available
In this work, we introduce a novel, distributed version of the N-FINDR endmember extraction algorithm, which is able to exploit computer cluster resources in order to efficiently process large volumes of hyperspectral data. The implementation of the distributed algorithm was done by extending the InterCloud Data Mining Package, originally adopted for land cover classification, through the HyperCloud-RS framework, here adapted for endmember extraction, which can be executed on cloud computing environments, allowing users to elastically administer processing power and storage space for adequately handling very large datasets. The framework supports distributed execution, network communication, and fault tolerance, transparently and efficiently to the user. The experimental analysis addresses the performance issues, evaluating both accuracy and execution time, over the processing of different synthetic versions of the AVIRIS Cuprite hyperspectral dataset, with 3.1 Gb, 6.2 Gb, and 15.1Gb respectively, thus addressing the issue of dealing with large-scale hyperspectral data. As a further contribution of this work, we describe in detail how to extend the HyperCloud-RS framework by integrating other endmember extraction algorithms, thus enabling researchers to implement algorithms specifically designed for their own assessment.
... Yet, locally hosted software options are unlikely to be the future for hyperspectral data processing. Hyperspectral data fits within the category of "big geo data" (Krämer and Senner, 2015) and is, therefore, better suited to scalable and distributed cloud processing rather than local computing capabilities (Wilson et al., 2018;Haut et al., 2019). Although cloud-based high-performance computing (HPC) is not a new concept (e.g., Plaza et al., 2011), its intersection with hyperspectral data for environmental analyses-particularly in aquatic environments-is in its infancy and remains an area for considerable future growth. ...
Article
Full-text available
Intensifying pressure on global aquatic resources and services due to population growth and climate change is inspiring new surveying technologies to provide science-based information in support of management and policy strategies. One area of rapid development is hyperspectral remote sensing: imaging across the full spectrum of visible and infrared light. Hyperspectral imagery contains more environmentally meaningful information than panchromatic or multispectral imagery and is poised to provide new applications relevant to society, including assessments of aquatic biodiversity, habitats, water quality, and natural and anthropogenic hazards. To aid in these advances, we provide resources relevant to hyperspectral remote sensing in terms of providing the latest reviews, databases, and software available for practitioners in the field. We highlight recent advances in sensor design, modes of deployment, and image analysis techniques that are becoming more widely available to environmental researchers and resource managers alike. Systems recently deployed on space- and airborne platforms are presented, as well as future missions and advances in unoccupied aerial systems (UAS) and autonomous in-water survey methods. These systems will greatly enhance the ability to collect interdisciplinary observations on-demand and in previously inaccessible environments. Looking forward, advances in sensor miniaturization are discussed alongside the incorporation of citizen science, moving toward open and FAIR (findable, accessible, interoperable, and reusable) data. Advances in machine learning and cloud computing allow for exploitation of the full electromagnetic spectrum, and better bridging across the larger scientific community that also includes biogeochemical modelers and climate scientists. 
These advances will place sophisticated remote sensing capabilities into the hands of individual users and provide on-demand imagery tailored to research and management requirements, as well as provide critical input to marine and climate forecasting systems. The next decade of hyperspectral aquatic remote sensing is on the cusp of revolutionizing the way we assess and monitor aquatic environments and detect changes relevant to global communities.
Article
Remote sensing is an approach that collects information from a scene using airborne or spaceborne sensors. It has been widely used in various fields of earth observation and space exploration, including resource utilization, environmental monitoring, geological exploration, agricultural production, urban planning, and so on. Essentially, remote sensing technology can be recognized as multidisciplinary science and engineering to efficiently treat macro-observation issues. Remote sensing instruments have developed significantly during the last decade. Furthermore, the number of available sensors has increased significantly and their applications are much more widespread by remote sensing data being used in Google maps and social media applications in addition to more traditional environmental monitoring and land-use approaches. With the fast development of remote sensing techniques and platforms, the amount of data with higher spectral, spatial, temporal resolutions and multiple structures available from remote sensing systems is increasing at an extremely fast pace. This has posed serious challenges for efficient and scalable processing in a timely fashion to support various practical remote sensing applications [1]-[3].
Article
This article gives a survey of state-of-the-art methods for processing remotely sensed big data and thoroughly investigates existing parallel implementations on diverse popular high-performance computing platforms. The pros/cons of these approaches are discussed in terms of capability, scalability, reliability, and ease of use. Among existing distributed computing platforms, cloud computing is currently the most promising solution to efficient and scalable processing of remotely sensed big data due to its advanced capabilities for high-performance and service-oriented computing. We further provide an in-depth analysis of state-of-the-art cloud implementations that seek for exploiting the parallelism of distributed processing of remotely sensed big data. In particular, we study a series of scheduling algorithms (GSs) aimed at distributing the computation load across multiple cloud computing resources in an optimized manner. We conduct a thorough review of different GSs and reveal the significance of employing scheduling strategies to fully exploit parallelism during the remotely sensed big data processing flow. We present a case study on large-scale remote sensing datasets to evaluate the parallel and distributed approaches and algorithms. Evaluation results demonstrate the advanced capabilities of cloud computing in processing remotely sensed big data and the improvements in computational efficiency obtained by employing scheduling strategies.
Article
As a newly emerging technology, deep learning (DL) is a very promising field for big data applications. Remote sensing often involves huge data volumes obtained daily by numerous in-orbit satellites, which makes it a perfect target area for data-driven applications. Nowadays, technological advances in software and hardware have a noticeable impact on Earth observation applications, and more specifically on remote sensing techniques and procedures, allowing for the acquisition of data sets with greater quality at higher acquisition rates. This results in the collection of huge amounts of remotely sensed data, characterized by large spatial resolution (in terms of the number of pixels per scene) and very high spectral dimensionality, with hundreds or even thousands of spectral bands. As a result, remote sensing instruments on spaceborne and airborne platforms are now generating data cubes with extremely high dimensionality, imposing several restrictions in terms of both processing runtimes and storage capacity. In this article, we provide a comprehensive review of the state of the art in DL for remote sensing data interpretation, analyzing the strengths and weaknesses of the most widely used techniques in the literature, as well as an exhaustive description of their parallel and distributed implementations (with a particular focus on those conducted using cloud computing systems). We also provide quantitative results, offering an assessment of a DL technique in a specific case study (source code available: https://github.com/mhaut/cloud-dnn-HSI ). This article concludes with some remarks and hints about future challenges in the application of DL techniques to distributed remote sensing data interpretation problems.
We emphasize the role of the cloud in providing a powerful architecture that is now able to manage vast amounts of remotely sensed data due to its implementation simplicity, low cost, and high efficiency compared to other parallel and distributed architectures, such as grid computing or dedicated clusters.
Article
The Consultative Committee for Space Data Systems (CCSDS) published the CCSDS 123.0-B-2, “Low-Complexity Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression” standard. This standard extends the previous issue, CCSDS 123.0-B-1, which supported only lossless compression, while maintaining backward compatibility. The main novelty of the new issue is support for near-lossless compression, i.e., lossy compression with user-defined absolute and/or relative error limits in the reconstructed images. This new feature is achieved via closed-loop quantization of prediction errors. Two further additions arise from the new near-lossless support: first, the calculation of predicted sample values using sample representatives that may not be equal to the reconstructed sample values, and, second, a new hybrid entropy coder designed to provide enhanced compression performance for low-entropy data, prevalent when non-lossless compression is used. These new features enable significantly smaller compressed data volumes than those achievable with CCSDS 123.0-B-1 while controlling the quality of the decompressed images. As a result, larger amounts of valuable information can be retrieved given a set of bandwidth and energy consumption constraints.
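To illustrate the closed-loop quantization principle (not the actual CCSDS 123.0-B-2 predictor, sample-representative machinery, or hybrid entropy coder), the sketch below quantizes prediction errors from a trivial previous-sample predictor under a user-defined absolute error limit `a`. Predicting from reconstructed samples is what closes the loop and bounds the error of every reconstructed sample.

```python
import numpy as np

def near_lossless_encode(samples, a):
    """Closed-loop quantization of prediction errors with absolute
    error limit `a` (a = 0 degenerates to lossless). Illustrative only."""
    step = 2 * a + 1                      # uniform quantizer step size
    recon = np.empty(len(samples), dtype=np.int64)
    indices = []
    prev = 0                              # trivial previous-sample predictor
    for i, s in enumerate(samples):
        err = int(s) - prev               # prediction error
        q = int(np.sign(err)) * ((abs(err) + a) // step)
        indices.append(q)                 # quantizer index to be entropy-coded
        recon[i] = prev + q * step        # reconstructed sample value
        prev = recon[i]                   # closed loop: predict from recon
    return indices, recon
```

Because the encoder predicts from the same reconstructed values the decoder will produce, quantization errors cannot accumulate along the sequence, and |sample − reconstruction| ≤ a holds for every sample.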
Article
Imaging spectroscopy, also known as hyperspectral remote sensing, is based on the characterization of Earth surface materials and processes through spectrally-resolved measurements of the light interacting with matter. The potential of imaging spectroscopy for Earth remote sensing has been demonstrated since the 1980s. However, most of the developments and applications in imaging spectroscopy have largely relied on airborne spectrometers, as the amount and quality of space-based imaging spectroscopy data remain relatively low to date. The upcoming Environmental Mapping and Analysis Program (EnMAP) German imaging spectroscopy mission is intended to fill this gap. An overview of the main characteristics and current status of the mission is provided in this contribution. The core payload of EnMAP consists of a dual-spectrometer instrument measuring in the optical spectral range between 420 and 2450 nm with a spectral sampling distance varying between 5 and 12 nm and a reference signal-to-noise ratio of 400:1 in the visible and near-infrared and 180:1 in the shortwave-infrared parts of the spectrum. EnMAP images will cover a 30 km-wide area in the across-track direction with a ground sampling distance of 30 m. An across-track tilted observation capability will enable a target revisit time of up to four days at the Equator and better at high latitudes. EnMAP will contribute to the development and exploitation of spaceborne imaging spectroscopy applications by making high-quality data freely available to scientific users worldwide.
Article
Anomaly detection aims to separate anomalous pixels from the background, and has become an important application of remotely sensed hyperspectral image processing. Anomaly detection methods based on low-rank and sparse representation (LRASR) can accurately detect anomalous pixels. However, with the significant volume increase of hyperspectral image repositories, such techniques consume a significant amount of time (mainly due to the massive amount of matrix computations involved). In this paper, we propose a novel distributed parallel algorithm (DPA) that redesigns the key operators of LRASR in terms of the MapReduce model in order to accelerate LRASR on cloud computing architectures. Independent computation operators are explored and executed in parallel on Spark. Specifically, we reconstitute the hyperspectral images in an appropriate format for efficient DPA processing, design an optimized storage strategy, and develop a pre-merge mechanism to reduce data transmission. In addition, a repartitioning policy is proposed to improve the DPA's efficiency. Our experimental results demonstrate that the newly developed DPA achieves very high speedups when accelerating LRASR, in addition to maintaining similar accuracies. Moreover, our proposed DPA is shown to be scalable with the number of computing nodes and capable of processing big hyperspectral images involving massive amounts of data.
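The map → pre-merge (combiner) → reduce pattern that such a design relies on can be sketched in plain Python. The example below computes a global pixel mean rather than the actual LRASR operators, and it does not use Spark, so all function names are illustrative.

```python
from functools import reduce

def map_stats(pixel):
    # map: emit one (sum, count) contribution per pixel vector
    return (sum(pixel), len(pixel))

def pre_merge(partials):
    # combiner: merge contributions inside a partition before the
    # "shuffle", so far less data has to cross the network
    total_sum = total_count = 0
    for s, c in partials:
        total_sum += s
        total_count += c
    return (total_sum, total_count)

def mapreduce_mean(partitions):
    # one pre-merged record per partition, then a single global reduce
    merged = [pre_merge(map(map_stats, part)) for part in partitions]
    s, c = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), merged)
    return s / c
```

The pre-merge step is the point of the pattern: each partition ships a single `(sum, count)` pair instead of one record per pixel, which is the data-transmission reduction the abstract refers to.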
Article
Dimensionality reduction represents a critical preprocessing step to increase the efficiency and performance of many hyperspectral imaging algorithms. However, dimensionality reduction algorithms, such as Principal Component Analysis (PCA), suffer from their computationally demanding nature, making it advisable to implement them on high-performance computing architectures for applications under strict latency constraints. This work presents the implementation of the PCA algorithm on two different high-performance devices, namely, an NVIDIA Graphics Processing Unit (GPU) and a Kalray manycore, uncovering a highly valuable set of tips and tricks for taking full advantage of the inherent parallelism of these high-performance computing platforms, and hence reducing the time required to process a given hyperspectral image. Moreover, the results achieved with different hyperspectral images have been compared with those obtained with a recently published field-programmable gate array (FPGA)-based implementation of the PCA algorithm, providing, for the first time in the literature, a comprehensive analysis that highlights the pros and cons of each option.
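For reference, the core computation these accelerators target can be sketched serially in NumPy (an illustrative sketch, not the GPU, manycore, or FPGA code): center the data, form the band covariance matrix, and project onto the leading eigenvectors.

```python
import numpy as np

def pca_reduce(X, n_components):
    """PCA for an HSI flattened to (n_pixels, n_bands):
    keep the n_components directions of maximum variance."""
    Xc = X - X.mean(axis=0)                    # center each spectral band
    cov = (Xc.T @ Xc) / (X.shape[0] - 1)       # (bands x bands) covariance
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    return Xc @ vecs[:, top]                   # project onto top eigenvectors
```

The eigendecomposition of the dense band covariance matrix is exactly the "computationally complex matrix operation" that motivates hardware acceleration when scenes contain hundreds of bands.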
Article
Remotely sensed hyperspectral imaging is a very active research area, with numerous contributions in the recent scientific literature. The analysis of these images represents an extremely complex procedure from a computational point of view, mainly due to the high dimensionality of the data and the inherent complexity of the state-of-the-art algorithms for processing hyperspectral images. This computational cost represents a significant disadvantage in applications that require real-time response, such as fire tracing, prevention and monitoring of natural disasters, chemical spills, and other environmental pollution. Many of these algorithms include, as one of their fundamental stages, a dimensionality reduction step to remove noise and redundant information from the hyperspectral images under analysis. This makes it possible to significantly reduce the size of the images and, hence, alleviate data storage requirements. However, this step is not exempt from computationally complex matrix operations, such as the computation of the eigenvalues and eigenvectors of large, dense matrices. Hence, for the aforementioned applications in which prompt replies are mandatory, this dimensionality reduction must be considerably accelerated, typically through the use of high-performance computing platforms. For this purpose, reconfigurable hardware solutions such as field-programmable gate arrays have become consolidated in recent years as one of the standard choices for the fast processing of hyperspectral remotely sensed images, due to their smaller size, weight, and power consumption compared with other high-performance computing systems. In this paper, we propose a reconfigurable hardware implementation of the principal component analysis (PCA) algorithm to carry out dimensionality reduction in hyperspectral images.
Experimental results demonstrate that our hardware version of the PCA algorithm significantly outperforms a commercial software version, which makes our reconfigurable system appealing for onboard hyperspectral data processing. Furthermore, our implementation exhibits real-time performance with regard to the time that the targeted hyperspectral instrument takes to collect the image data.
Article
Remotely sensed hyperspectral imaging offers the possibility to collect hundreds of images, at different wavelength channels, for the same area on the surface of the Earth. Hyperspectral images are characterized by their large volume and dimensionality, which makes their processing and storage difficult. As a result, several techniques have been developed in previous years to perform hyperspectral image analysis on high-performance computing architectures. However, the application of cloud computing techniques has not been as widespread. There are many potential advantages in exploiting cloud computing architectures for distributed hyperspectral image analysis. In this paper, we present a cloud implementation (developed using Apache Spark) of the popular K-means algorithm for unsupervised hyperspectral image clustering. The experimental results suggest that cloud architectures allow for the efficient distributed processing of large hyperspectral image data sets.
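A serial sketch of the K-means iteration that such a Spark implementation would distribute across workers (the parallel version computes assignments and partial centroid sums per partition). The evenly spaced initialization is a deterministic illustrative choice, not the paper's seeding scheme.

```python
import numpy as np

def kmeans(X, k, n_iter=10):
    """Lloyd's K-means for pixels flattened to (n_pixels, n_bands)."""
    # deterministic evenly spaced initialization for illustration;
    # real implementations use random or k-means++ seeding
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float).copy()
    for _ in range(n_iter):
        # assign each pixel to its nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center as the mean of its assigned pixels
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Both steps of the loop parallelize naturally: the assignment is independent per pixel, and the center update reduces per-cluster sums and counts, which is what makes the algorithm a good fit for a map/reduce engine such as Spark.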
Article
The classification of hyperspectral images (HSIs) using convolutional neural networks (CNNs) has recently drawn significant attention. However, it is important to address the potential overfitting problems that CNN-based methods suffer from when dealing with HSIs. Unlike common natural images, HSIs are essentially third-order tensors containing two spatial dimensions and one spectral dimension. As a result, exploiting both spatial and spectral information is very important for HSI classification. This paper proposes a new hand-crafted feature extraction method, based on multiscale covariance maps (MCMs), that is specifically aimed at improving the classification of HSIs using CNNs. The proposed method has the following distinctive advantages. First, with the use of covariance maps, the spatial and spectral information of the HSI can be jointly exploited. Each entry in the covariance map stands for the covariance between two different spectral bands within a local spatial window, which can absorb and integrate the two kinds of information (spatial and spectral) in a natural way. Second, by means of our multiscale strategy, each sample can be enhanced with spatial information from different scales, significantly increasing the information conveyed by the training samples. To verify the effectiveness of our proposed method, we conduct comprehensive experiments on three widely used hyperspectral data sets, using a classical 2-D CNN (2DCNN) model. Our experimental results demonstrate that the proposed method can indeed increase the robustness of the CNN model. Moreover, the proposed MCMs+2DCNN method exhibits better classification performance than other CNN-based classification strategies and several standard techniques for spectral-spatial classification of HSIs.
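A single-scale covariance map can be sketched as follows, assuming an HSI cube of shape (H, W, B). The window size `w` and the clipping-based border handling are illustrative choices; the multiscale strategy would simply stack maps computed for several values of `w`.

```python
import numpy as np

def covariance_map(hsi, row, col, w):
    """(B x B) covariance map of the w x w neighborhood around (row, col):
    entry (i, j) is the covariance between bands i and j over the window."""
    r0, r1 = max(row - w // 2, 0), min(row + w // 2 + 1, hsi.shape[0])
    c0, c1 = max(col - w // 2, 0), min(col + w // 2 + 1, hsi.shape[1])
    patch = hsi[r0:r1, c0:c1, :].reshape(-1, hsi.shape[2])  # pixels x bands
    return np.cov(patch, rowvar=False)                      # bands as variables
```

Treating the window's pixels as observations and the bands as variables is what fuses the two kinds of information: the spatial neighborhood supplies the samples, while the band-pair covariances carry the spectral structure.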
Article
This paper proposes a new distributed architecture for supervised classification of large volumes of earth observation data in a cloud computing environment. The architecture supports distributed execution, network communication, and fault tolerance in a way that is transparent to the user. It is composed of three abstraction layers, which support the definition and implementation of applications by researchers from different scientific fields. The implementation of the architecture is also discussed. A software prototype (available online), which runs machine learning routines implemented on the cloud using the Waikato Environment for Knowledge Analysis (WEKA), a popular free software package licensed under the GNU General Public License, is used for validation. Performance issues are addressed through an experimental analysis in which two supervised classifiers available in WEKA were used: random forest and support vector machines. This paper further describes how to include other classification methods in the available software prototype.