AUTOMATED ANALYSIS OF REMOTELY SENSED IMAGES
USING THE UNICORE WORKFLOW MANAGEMENT SYSTEM

Shahbaz Memon 1,2, Gabriele Cavallaro 1, Björn Hagemeier 1, Morris Riedel 1,2, Helmut Neukirchen 2

1 Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
2 School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
ABSTRACT

The progress of remote sensing technologies leads to an increased supply of high-resolution image data. However, solutions for processing large volumes of data are lagging behind: desktop computers can no longer cope with the requirements of macro-scale remote sensing applications; therefore, parallel methods running in High-Performance Computing (HPC) environments are essential. Managing an HPC processing pipeline is non-trivial for a scientist, especially when the computing environment is heterogeneous and the set of tasks has complex dependencies. This paper proposes an end-to-end scientific workflow approach based on the UNICORE workflow management system for automating the full chain of Support Vector Machine (SVM)-based classification of remotely sensed images. The high-level nature of UNICORE workflows makes it possible to deal with the heterogeneity of HPC computing environments and offers powerful workflow operations, such as those needed for parameter sweeps. As a result, the remote sensing workflow of SVM-based classification becomes re-usable across different computing environments, thus increasing usability and reducing the effort required of a scientist.

Index Terms— Remote Sensing, Support Vector Machine (SVM), High-Performance Computing (HPC), Scientific Workflows, UNICORE.
1. INTRODUCTION
Due to the advancement of latest-generation remote sensing instruments, a wealth of information, such as spatial, multi-temporal, and physical parameters, is generated almost continuously and at an increasing rate on a global scale. This sheer volume and variety of sensed data leads to a necessary re-definition of the challenges within the entire lifecycle of remote sensing data [1]. Parallel High-Performance Computing (HPC) architectures are continuously expanding to meet the growing demand of domain-specific applications handling computationally intensive problems. In the context of large-scale remote sensing applications, where the interpretation of the data is not straightforward and near-real-time answers are required, HPC and Cloud Computing offer the chance to overcome the limitations of serial algorithms. Data analysis is a life cycle of multiple phases, and the classification of images is just one of the tasks in this realm. For error-free data analysis, it is imperative that the tasks are managed in a pre-defined manner and, in accordance with application requirements, executed on distributed HPC resources. In the simplest case, all phases are deployed on one HPC resource (e.g., a cluster), but if they are distributed across different clusters with different access mechanisms, it becomes tedious for users to manage all the processes. Moreover, data access also plays a vital role, since the input data sets or resultant outputs require interaction with data repositories through heterogeneous data management and file transfer interfaces and protocols. Besides the execution and data management requirements, the data analysis pipeline has a set of tasks that can be sequential, concurrent, or iterative, thus forming a workflow.

Thanks to Jülich Supercomputing Centre (JSC) for funding. This work was partly supported by NordForsk as part of the Nordic Center of Excellence (NCoE) eSTICC (eScience Tools for Investigating Climate Change at High Northern Latitudes). Correspondence: m.memon@fz-juelich.de
To realize this scenario, we use tools and techniques from the area of scientific Workflow Management Systems (WMS). Several WMS have been implemented to support physical and data modelling applications, but fewer cases are known in which machine learning scenarios, such as classification methods, have been catered for. In our previous work [2], we proposed to automate only the cross-validation phase of the Support Vector Machine (SVM) [3], which is one of the most widely used classifiers for analyzing and extracting information from remote sensing data; in that earlier work, we used a standards-based parameter sweep model and the HPC middleware UNICORE [4] for the cross-validation. This paper goes one step further by extending the workflow to the model generation (i.e., training) and prediction (i.e., classification) phases, using UNICORE's more advanced workflow management capabilities and its graphical client, the UNICORE Rich Client (URC) [5]. We introduce an end-to-end workflow design, implementation, execution, and monitoring through the URC's visual interface. The URC helps to substantially reduce the time, development effort, and makespan needed to analyze remote sensing data, not only in the classification phase but also within pre-processing steps such as feature extraction.
[Figure 1: Problem Definition -> Data Acquisition -> Preprocessing -> Processing -> Information Abstraction -> Thematic Application]
Fig. 1. Stages of a general remote sensing data processing flow. The blocks highlighted in yellow are the considered steps.
2. SCIENTIFIC CASE STUDY
The entire lifecycle of remote sensing data consists of a multi-step pipeline (see Fig. 1) that includes several data-driven methods between the acquisition and the application phase: preprocessing (e.g., geometric and atmospheric corrections), processing (e.g., feature extraction), and information abstraction (e.g., classification and clustering).

Classification is one of the essential techniques used for abstracting information and can serve a wide variety of thematic applications, such as the separation of different types of land-cover classes in order to understand urban development, mapping, impacts of natural disasters, crop monitoring, tracking, risk management, etc. Regardless of the efficiency and complexity of the selected classification algorithm, the aforementioned variety and high dimensionality of remote sensing data (e.g., the WorldView-3 satellite sensor with 0.31 m spatial resolution, the AISA Dual airborne sensor with 500 bands) can lead to computational and statistical challenges related to processing scalability and the effectiveness of extracting knowledge. Problems arise, for instance, when the classifiers require fast and highly scalable implementations in real-time applications (e.g., earthquake scenarios or glacial surges). Traditional desktop approaches (e.g., MATLAB, R, SAS) have severe limitations, since big remote sensing data cannot be stored or processed by algorithms designed for single shared-memory machines. As a consequence, algorithms need to exploit parallel environments such as HPC clusters, grids, or clouds, which provide tremendous computational capacity and outstanding scalability. However, the heterogeneity of the world of parallel environments complicates the user experience. Therefore, a comprehensive solution bridging the knowledge gap in using such computational resources is required.
Classification algorithms can greatly benefit from an integrated framework in which both spatial and spectral information are included in the analysis. In this study, a two-step approach to classification is considered, where the features and the predictive model are computed by two separate algorithms (see the highlighted processing and information abstraction blocks in Fig. 1). The selected algorithms have been developed for running on many-core systems, which enables them to handle large-scale datasets [6] and overcome the limitations of serial algorithms. In the first step, Morphological Attribute Profiles (APs) compute features that characterize the spatial information within a given image [7, 8]. The APs result from a sequential application of attribute filters based on component trees, which can be computed with the shared- and distributed-memory hybrid algorithm proposed by some of the authors [9]. This implementation outperforms traditional serial algorithms, whose performance is strongly affected by the size and the quantization of the data. In the second step, the APs serve as the input to the SVM [3], which is adopted as the classifier. Our implementation of an SVM is based on the parallel implementation πSvM [10], which we improved [6] to make more efficient use of the Message Passing Interface (MPI) for parallel processing. This improved PiSvM [11] offers significant speed-ups for the cross-validation, training, and testing steps while maintaining the same accuracy as achieved when performing the classification with serial algorithms.
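The workflow in Section 3 sweeps the two SVM hyper-parameters C and G; to fix notation, we restate the standard soft-margin formulation of [3]. The paper does not spell out the kernel, but a (C, G) pair is the usual parametrization of the widely used Radial Basis Function (RBF) kernel, so the RBF setting below is an assumption made only for illustration:

  \min_{w,\,b,\,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
  \quad \text{s.t.} \quad y_i \big( w^\top \phi(x_i) + b \big) \ge 1 - \xi_i, \quad \xi_i \ge 0,

  K(x_i, x_j) = \exp\!\big( -\gamma \, \|x_i - x_j\|^2 \big)

Here C penalizes training errors and γ (the "G" of Section 3) controls the kernel width; these are exactly the two values the cross-validation grid search has to determine, since neither can be chosen reliably a priori.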
3. WORKFLOW MANAGEMENT
Data analysis of remotely sensed images is composed of multiple phases, which typically require separate applications to be executed. Automating the data analysis pipeline in a parallel scientific computing environment without automated workflow management may be difficult to realize, e.g., job submission may be different in every environment. To solve this problem, we use a scientific Workflow Management System to help the user compose and execute the data analysis steps in a seamless manner.

Remote sensing users intending to manage complex data processing through machine learning tools (such as classification using SVMs) on remote and distributed environments face several issues when accessing massively parallel HPC platforms. As a first step towards solving the underlying data analysis challenges, the following requirements have been identified: R1) workflow composition: creating and combining tasks which depend on each other; R2) task enactment: remote execution of the tasks as they are defined during composition; R3) data access: facilitating data access for each task from different sources and pushing results to data sinks; R4) remote execution management: the capability of monitoring, holding, or resuming running workflow tasks; and R5) parametric tasks: iterating over n-dimensional value sets used as input parameters.
Based on the analysis of the above requirements, we took the steps (essentially, an abstract workflow) depicted in Fig. 1 and implemented them as an automated scientific workflow using the UNICORE [12] workflow management system. UNICORE is based on a multi-tier architecture with servers and clients: the server layer offers a set of web service interfaces to manage remote workflow job submissions and data access on a variety of HPC resource management systems, for instance Torque, SLURM, and LoadLeveler; the client layer consists of a GUI called the UNICORE Rich Client (URC) and a command-line interface.

[Figure 2]
Fig. 2. The classification workflow in the UNICORE Rich Client.
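To give a flavour of what the server layer accepts, the listing below sketches a single-job description in the style of the UNICORE Commandline Client (UCC). The key names follow the UCC job description format, but the executable path, file names, and storage addresses are purely hypothetical, so this should be read as an illustration rather than as a job definition actually used in our workflow:

  # pisvm-train.u -- hypothetical UCC-style job description (JSON);
  # paths and storage URLs are placeholders that differ per installation.
  {
    "Executable": "/opt/pisvm/bin/pisvm-train",
    "Arguments": ["-t", "2", "-c", "100", "-g", "0.5", "train.dat"],
    "Imports": [
      { "From": "u6://SITE/home/data/train.dat", "To": "train.dat" }
    ],
    "Exports": [
      { "From": "train.dat.model", "To": "u6://SITE/home/results/train.dat.model" }
    ],
    "Resources": { "Nodes": "1", "CPUsPerNode": "24" }
  }

A job like this could be submitted with "ucc run pisvm-train.u"; the URC generates equivalent descriptions behind the scenes for every step of the workflow described next.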
The purpose of the workflow that we created is the classification of remotely sensed images using SVMs, based on the steps in Fig. 1. In the course of the classification, there are multiple steps: the pre-processing, cross-validation, model generation, and prediction phases. These steps are implemented using the URC through its visual workflow composer. Fig. 2 shows the classification workflow that we designed and implemented using the URC visual workbench. The details of each step of the workflow are as follows:
1. Preprocessing: This step is implemented as a bash script job. It uses data already pre-processed through morphological Attribute Profiles (APs) [8]. The step mainly creates a global workflow directory: for each workflow instance, a separate directory is maintained in which the output of each successfully finished task of the workflow is stored and the overall result of the classification is managed.
2. ParamSelection: This step provides the cross-validation phase, in which the main task is to identify the optimal model parameters C and G that produce the most accurate results. ParamSelection is a parametric step enclosed in a nested iteration in order to achieve a two-dimensional For-Each loop. The For-Each loops are implemented through the value-sets feature of the URC: a value set contains a set of discrete values, each of which is passed to an individual job by the UNICORE workflow management system. The output of this step is the best combination of the C and G parameters, which is stored in the global workflow directory. ParamSelection makes heavy use of compute and data resources, as each job runs in parallel and uses a separate input dataset; a command-line sketch of this and the following steps is given after this list. Details on this selection of the C and G parameters can be found in our earlier work [2].
3. Train: This step takes care of generating (i.e., training) the model based on the best C and G parameters produced in the previous step. Its output is the model file. The Train step is also very compute-intensive and requires a parallel computing execution environment.
4. Classify: Classification is the final phase and takes two inputs: the model file generated by the Train step and an unseen image dataset. The main task of this job is to classify the unseen dataset; at the end, it produces a vector of classification markers as a text file. This step, too, takes a considerable amount of computing resources.
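The bash sketch below summarizes what the four step scripts do when run outside the workflow engine, under two stated assumptions: the PiSvM executables (pisvm-train, pisvm-predict) accept libsvm-style options, and all file names are hypothetical. Inside UNICORE, the two nested loops are replaced by the URC value-set For-Each construct, which turns every (C, G) pair into a separate parallel job:

  #!/bin/bash
  # Hypothetical end-to-end sketch of the four workflow steps.
  # Assumes libsvm-style options for pisvm-train/pisvm-predict.

  WFDIR=$HOME/workflows/run-$(date +%s)    # 1. Preprocessing: global dir
  mkdir -p "$WFDIR"

  # 2. ParamSelection: 5-fold cross-validation over a (C, G) grid;
  # "-t 2" selects an RBF kernel in libsvm-style tools.
  best_acc=0
  for C in 1 10 100 1000 10000; do
    for G in 0.125 0.25 0.5 1 2; do
      acc=$(mpirun -np 24 pisvm-train -t 2 -c "$C" -g "$G" -v 5 train.dat \
            | awk '/Cross Validation Accuracy/ {print $NF}' | tr -d '%')
      if (( $(echo "$acc > $best_acc" | bc -l) )); then
        best_acc=$acc; best_C=$C; best_G=$G
      fi
    done
  done
  echo "$best_C $best_G" > "$WFDIR/best_params.txt"

  # 3. Train: build the model with the selected parameters
  # (libsvm convention: the model is written to <input>.model).
  mpirun -np 24 pisvm-train -t 2 -c "$best_C" -g "$best_G" train.dat
  mv train.dat.model "$WFDIR/model.dat"

  # 4. Classify: predict labels for an unseen dataset.
  mpirun -np 24 pisvm-predict unseen.dat "$WFDIR/model.dat" "$WFDIR/labels.txt"

The 5 x 5 grid shown here is not taken from the paper; it is chosen only because it would produce the 25 ParamSelection runs reported in the experiment below.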
Since the workflow management is provided by the UNICORE middleware, it is mandatory to have the UNICORE workflow components deployed. In our case, the UNICORE workflow management services are hosted on the JURECA cluster [13]. Each of the above workflow steps needs a separate executable from the PiSvM application suite, which needs to be available on the resource executing that step.
In order to run the data analysis workflow on UNICORE, the user has to compose the workflow in the URC to obtain the workflow shown in Fig. 2. In addition, the internals of each step specified in the workflow need to be implemented as a shell script. As part of creating a workflow, the user also specifies, for each step, which data sets need to be fetched, where the results will be stored, and on which compute resource the step runs. Once the workflow composition and the underlying steps are configured, the user can simply hit the start button in the URC to submit the workflow to the workflow management service. The URC then takes care of managing and monitoring the submission and execution of each step.
The whole data analysis workflow can be exported as a reproducible workflow via the URC export mechanism. With this feature, the URC exports a workflow template in a machine-readable format, which can easily be reused by any other user running a URC instance. The only adjustments needed are the compute resource selections, which may differ, as a new user may not have access to the same computing cluster where the initial results were produced. The data sets for the use case presented in this work are publicly downloadable; thus, no change to the data sources is required for reproducing the analysis performed by our workflow.
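As a hypothetical illustration of such reuse from the command line (the URC's graphical import achieves the same), the UNICORE Commandline Client could submit the exported template once the resource selection has been edited. Command names and the template format vary between UNICORE versions, so the session below is an assumption-laden sketch, not a recording:

  # Assumed UCC commands; consult "ucc -h" for what your
  # UNICORE installation actually provides.
  ucc connect                              # authenticate against the site
  ucc list-sites                           # choose a cluster you can access
  # edit the exported template so each step targets that cluster, then:
  ucc workflow-submit classification-workflow.xml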
For the experiment, the dataset used is a processed hyperspectral dataset made up of 1,417,617 pixels (with a spatial resolution of 20 m) and 30 features [6]. The workflow was deployed and executed on JURECA [14], where we used one compute node with 24 cores for each of the workflow jobs. Preprocessing took a fraction of a second, whereas the ParamSelection step ran 25 times, with the cross-validation phase taking 14 minutes per run on average. The Train step finished in one minute, and Classify completed in 1.5 minutes. The execution times do not differ from a manual configuration if the scripts are well prepared for JURECA; otherwise, manual execution and monitoring may incur a significant time and usability overhead.
4. CONCLUSIONS
Scientific Workflow Management Systems (WMS) allow abstracting away underlying resource management and application details, thus leading to re-usable, understandable, portable, and maintainable workflows. For remote sensing image classification using SVMs, so far only the parameter sweep needed for the cross-validation step had been automated using a WMS [2]. In this paper, we automate the full chain of classification using SVMs, including the training that follows the cross-validation and the classification itself. We developed a re-usable end-to-end workflow based on the UNICORE workflow management system that allows the workflow to be executed in an automated and portable way across heterogeneous computing systems. Using the graphical UNICORE Rich Client increases the productivity of scientists. Our experience is that the UNICORE middleware reduces the human effort in terms of the time needed for manual work and for understanding the technology adopted. Furthermore, the workflow can easily be adapted to operate on different parallel computing environments, as long as they run other, but standards-based, middleware. As future work, we intend to study the impact of our approach in supporting large-scale analysis of remotely sensed data using multiple nodes and also to analyse deep learning methods for classification.
5. REFERENCES

[1] M. Chi, A. Plaza, J. A. Benediktsson, Z. Sun, J. Shen, and Y. Zhu, "Big Data for Remote Sensing: Challenges and Opportunities," Proc. of the IEEE, vol. 104, no. 11, pp. 2207–2219, 2016.

[2] M. S. Memon, G. Cavallaro, M. Riedel, and H. Neukirchen, "Facilitating Efficient Data Analysis of Remotely Sensed Images Using Standards-Based Parameter Sweep Models," in Proc. IGARSS, 2017.

[3] C. Cortes and V. Vapnik, "Support-Vector Networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.

[4] M. S. Memon, A. S. Memon, M. Riedel, B. Schuller, D. Mallmann, B. Tweddell, A. Streit, S. van den Berghe, D. Snelling, V. Li, M. Marzolla, and P. Andreetto, "Enhanced resource management capabilities using standardized job management and data access interfaces within UNICORE Grids," in Int. Conf. on Parallel and Distributed Systems, IEEE, 2007.

[5] B. Demuth, B. Schuller, S. Holl, J. Daivandy, A. Giesler, V. Huber, and S. Sild, "The UNICORE Rich Client: Facilitating the automated execution of scientific workflows," in Proc. IEEE Int. Conf. on e-Science, pp. 238–245, 2010.

[6] G. Cavallaro, M. Riedel, M. Richerzhagen, J. A. Benediktsson, and A. Plaza, "On understanding big data impacts in remotely sensed image classification using support vector machine methods," IEEE J. Sel. Topics Appl. Earth Observ., vol. 8, no. 10, pp. 4634–4646, 2015.

[7] M. Dalla Mura, J. A. Benediktsson, B. Waske, and L. Bruzzone, "Morphological Attribute Profiles for the Analysis of Very High Resolution Images," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 10, pp. 3747–3762, 2010.

[8] G. Cavallaro, N. Falco, M. Dalla Mura, and J. A. Benediktsson, "Automatic attribute profiles," IEEE Trans. Image Process., vol. 26, no. 4, pp. 1859–1872, 2017.

[9] M. Götz, G. Cavallaro, T. Géraud, M. Book, and M. Riedel, "Parallel Computation of Component Trees on Distributed Memory Machines," IEEE Trans. Parallel Distrib. Syst., 2017, in press.

[10] D. Brugger, "πSvM," website, 2014, http://pisvm.sourceforge.net.

[11] M. Richerzhagen, "piSvM," website, 2015, https://github.com/mricherzhagen/pisvm.

[12] A. Streit et al., "UNICORE 6 – recent and future advancements," Annals of Telecommunications, vol. 65, no. 11-12, pp. 757–762, 2010.

[13] D. Krause and P. Thörnig, "JURECA: General-purpose supercomputer at Jülich Supercomputing Centre," Journal of large-scale research facilities JLSRF, vol. 2, March 2016.

[14] Jülich Supercomputing Centre, "JURECA," website, 2016, http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JURECA/JURECA_node.html.
Today, many scientific disciplines heavily rely on computer systems for in-silico experimentation or data management and analysis. The employed computer hard- and software is heterogeneous and complies to different standards, interfaces and protocols for interoperation. Grid middleware systems like UNICORE 6 try to hide some of the complexity of the underlying systems by offering high-level, uniform interfaces for executing computational jobs or storing, moving, and searching through data. Via UNICORE 6 computer resources can be accessed securely with different software clients, e.g. the UNICORE Command line Client (UCC) or the graphical UNICORE Rich Client (URC) which is based on Eclipse. In this paper, we describe the design and features of the URC, and highlight its role as a flexible and extensible Grid client framework using the QSAR field as an example.