Improvements of the UltraScan Scientific Gateway to
Enable Computational Jobs on Large-scale and
Open-standards based Cyberinfrastructures
Shahbaz Memon
Juelich Supercomputing Centre
Forschungszentrum Juelich
D-52425, Juelich, Germany
m.memon@fz-juelich.de
Norbert Attig
Juelich Supercomputing Centre
Forschungszentrum Juelich
D-52425, Juelich, Germany
n.attig@fz-juelich.de
Gary Gorbet
The University of Texas Health
Science Center at San Antonio
San Antonio, Texas, USA
gegorbet@gmail.com
Lahiru Gunathilake
Indiana University Bloomington
School Of Informatics and Computing
Bloomington, Indiana, USA
lginnali@indiana.edu
Morris Riedel
Juelich Supercomputing Centre
Forschungszentrum Juelich
D-52425, Juelich, Germany
m.riedel@fz-juelich.de
Thomas Lippert
Juelich Supercomputing Centre
Forschungszentrum Juelich
D-52425, Juelich, Germany
th.lippert@fz-juelich.de
Suresh Marru
Indiana University Bloomington
School Of Informatics and Computing
Bloomington, Indiana, USA
smarru@iu.edu
Andrew Grimshaw
University of Virginia
Department of Computer Science
Charlottesville, Virginia, USA
grimshaw@virginia.edu
Florian Janetzko
Juelich Supercomputing Centre
Forschungszentrum Juelich
D-52425, Juelich, Germany
f.janetzko@fz-juelich.de
Borries Demeler
The University of Texas Health
Science Center at San Antonio
San Antonio, Texas, USA
demeler@biochem.uthscsa.edu
Raminder Singh
Indiana University Bloomington
School Of Informatics and Computing
Bloomington, Indiana, USA
ramifnu@iu.edu
Morris Riedel
School of Engineering and
Natural Sciences
University of Iceland Reykjavik,
Iceland
m.riedel@fz-juelich.de
ABSTRACT
The UltraScan data analysis application is a software package that
is able to take advantage of computational resources in order to
support the interpretation of analytical ultracentrifugation (AUC)
experiments. Since 2006, the UltraScan scientific gateway has
been used with ordinary Web browsers in TeraGrid by scientists
studying the solution properties of biological and synthetic
molecules. Unlike other applications, UltraScan is implemented
on a gateway architecture and leverages the power of
supercomputing to extract very high resolution information from
the experimental data. In this contribution, we will focus on
several improvements of the UltraScan scientific gateway that
enable a standardized job submission and management to
computational resources while retaining its lightweight design in
order to not disturb the established workflows of its end-users.
This paper further presents a walkthrough of the architectural
design including one real installation deployment of UltraScan in
Europe. The aim is to provide evidence for the added value of
open standards and resulting interoperability enabling not only
UltraScan application submissions to resources offered in the US
cyber infrastructure Extreme Science and Engineering Discovery
Environment (XSEDE), but also submissions to similar
infrastructures in Europe and around the world. The use of the
Apache Airavata framework for scientific gateways within our
approach bears the potential to have an impact on several other
scientific gateways too.
Categories and Subject Descriptors
C.2.4 – Grid Computing; D.2.11 - Service-oriented architecture
(SOA);
General Terms
Design, Reliability, Experimentation, Security, Standardization
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
XSEDE’13, July 22–25, 2013, San Diego, CA, USA
Copyright 2013 ACM 978-1-4503-2170-9/13/07 …$15.00.
Keywords
Scientific Gateways, UltraScan, UNICORE, XSEDE, Apache
Airavata
1. INTRODUCTION
Scientific gateways have emerged as a lightweight access layer on
top of rather complex middleware setups, security paradigms, and
heterogeneous computational resources. They offer end-users simple access that allows them to focus on the science in question instead of on rather technical low-level configurations and details. They have become an indispensable tool for domain-specific scientists and enable various benefits, such as minimizing input errors by checking input parameters, or by offering only those combinations of input configuration parameters and their relationships that actually make sense from the science perspective. Therefore, scientific gateways have been
implemented in a broad range of scientific disciplines. The focus
of our contribution is a scientific gateway known as the UltraScan
Laboratory Information Management System (US-LIMS), which
is used primarily in biochemistry, material science and polymer
science. Its underlying UltraScan data analysis application [1, 2]
is a software package that is able to take advantage of
computational resources in order to support the interpretation of
analytical ultracentrifugation (AUC) experiments. This scientific
gateway was already used in TeraGrid by scientists studying the
solution properties of biological and synthetic molecules taking
advantage of one specific underlying middleware known as
Globus [11].
At the same time, we observe an increasing number of middleware systems that offer open standard interfaces (e.g. UNICORE [17], GENESIS [18], GridSAM [19]), for two major reasons. First, open standard interfaces avoid vendor lock-in, offering resource providers the possibility to change their systems, if needed, without requiring a change in the end-user's open standards-based client setup. Second, the open standards in the distributed computing domain have become very stable over the last couple of years, after already having been used intensively in scientific domains, which considerably increases the trust in adopting the technology.
UltraScan is implemented on a gateway architecture and
leverages the power of supercomputing to extract very high
resolution information from the experimental data. In this paper,
we will focus on several improvements of the UltraScan scientific
gateway that enable a standardized job management and
submission to computational resources while retaining its
lightweight design in order to not disturb the established
workflows of its end-users. The current US-LIMS gateway, which
is used to access remote computational resources with these
interfaces, has been in constant use since 2006 by several hundred
users world-wide, with computational resources available from
TeraGrid, XSEDE, Universities in the US and Australia, as well
as one commercial site in the US. This system is stable and
mature and undergoes only very minor changes in the user
interface.
This paper is structured as follows. After the introduction in
Section 1, the rationale behind our work is given in Section 2 with
the scientific case underlying UltraScan providing also the
relevance of scientific gateways in context. Section 3 lists the
identified limitations of the UltraScan scientific gateway and its
implied framework for access to computational resources known
as Apache Airavata. Section 4 then describes how we overcome
the identified limitations of the framework by using open
standards, which are also briefly introduced in context. Section 5
provides one concrete deployment of our proposed integrated
architecture offering a step-wise walkthrough with the end-user
perspective. After surveying related work in the field in Section 6,
this paper ends with some concluding remarks.
2. THE SCIENTIFIC CASE
Since the early 1990s, digital data acquisition from the analytical
ultracentrifuge laid the foundation to analyze sedimentation data
using computational resources. The UltraScan data analysis
application [1, 2] became a well-known multi-platform software
package that is able to take advantage of high-performance
computing resources in order to support the interpretation of
complex, high-resolution analytical ultracentrifugation (AUC)
experiments. UltraScan not only provides guidance for the design
of sedimentation experiments, but also addresses data
management challenges that arise from the wealth of AUC data.
For example, the US-LIMS includes a relational database which
serves the user with a common interface for collaboration,
analysis job submission to a remote high-performance computing
platform, and to manage all data related to AUC analysis.
Interpretation and visualization of analysis results occurs with a
multi-platform graphical user interface.
A web-based UltraScan scientific gateway [3, 4] was developed
to hide the arcane tedium of submitting supercomputer jobs from
the ordinary user and make the UltraScan software package more
user-friendly and its data analysis methods available to a broader
community. This opened up the utility of the software to new and
less-experienced scientific users. The UltraScan scientific
gateway is used across the world with the majority of users in the
US and Europe.
Figure 1. The US-LIMS scientific gateway architecture including its different backend entities and relationships.
The current gateway architecture is illustrated in Figure 1.
Currently, over 70 institutions are using the Ultrascan software
through the UltraScan scientific gateway. Last year alone, over
100 investigators studied AUC data using the UltraScan gateway,
using an aggregate of 8.7 million service units (core hours). The
LIMS interface supports several supercomputer-based analysis methods for studying AUC data. First, a two-dimensional spectrum analysis [5, 6] provides distributions of shape and size for mixtures of different molecules, and removes time- and radially-invariant noise sources from the data. A further refinement is obtained by using the genetic algorithm analysis [7], which achieves a parsimonious regularization of the solution space [8]. Finally, a statistical analysis of the final model is achieved with a Monte Carlo analysis [9], which can also be performed on a remote supercomputer.
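The idea behind the Monte Carlo analysis [9] can be illustrated with a small sketch (a toy model, not UltraScan code; the linear model and data points are invented for illustration): compute the best-fit model, resample the residual noise onto synthetic data sets, refit each one, and use the spread of the refitted parameters as a confidence estimate.

```python
import random
import statistics

def fit_slope(xs, ys):
    # Least-squares slope of a line through the origin: a = sum(x*y) / sum(x*x)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def monte_carlo_slope(xs, ys, iterations=500, seed=42):
    rng = random.Random(seed)
    best = fit_slope(xs, ys)
    # Residuals of the best fit serve as the empirical noise model
    residuals = [y - best * x for x, y in zip(xs, ys)]
    estimates = []
    for _ in range(iterations):
        # Synthetic data set: best-fit model plus resampled noise
        synthetic = [best * x + rng.choice(residuals) for x in xs]
        estimates.append(fit_slope(xs, synthetic))
    # Spread of the refitted slopes estimates the parameter uncertainty
    return best, statistics.stdev(estimates)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, err = monte_carlo_slope(xs, ys)
```

Each Monte Carlo iteration is independent, which is why this kind of analysis parallelizes well onto the remote supercomputers the gateway targets.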
3. IDENTIFIED LIMITATIONS
The latest US-LIMS gateway uses the Apache Airavata [10] software framework to abstract access to computational resources. As shown in Figure 1, the Generic Application Factory (GFAC) tools within Airavata wrap command line-driven science applications and turn them into robust, network-accessible services. Airavata has a pluggable architecture that supports submission to different middleware layers. Currently, UltraScan jobs are managed through the Globus middleware [11], customized SSH access to remote machines, Amazon EC2, and Hadoop-based clusters using the Apache Whirr abstraction. Access is thus mostly based on proprietary protocols to a broad set of different technologies that provide access to computational resources.
But a deeper analysis of the Apache Airavata software reveals
that it is designed on service oriented architecture principles,
which support abstraction, extension, and component
encapsulation within a distributed system. As illustrated in Figure 2, different plugins (e.g. Java CoG, EC2 API, etc.) can be added to Airavata to authenticate, authorize, move data, submit jobs, and monitor progress, to list just a few capabilities. Apache Airavata provides science gateway capabilities through several components: GFAC; the XBaya Workflow Suite, a drag-and-drop graphical interface combined with an interpreted workflow enactment server; an information and data registry; and the WS-Messenger publish/subscribe-based messaging system. Each of these modules is distributed as a bundled server, a GUI, and a client API.
Figure 2. Overview of the abstraction from the UltraScan
Gateway to the underlying computational resources via a
wide variety of Apache Airavata plugins.
In more detail, Apache Airavata GFAC provides a generic
framework to wrap an application as a service interface in Java.
This service layer separates the invocation from the
communication layer supporting multiple protocols like SOAP,
REST, or JSON. Thus GFAC can generate a SOAP, REST or
native Java interface to any command-line application. Currently, Apache Airavata focuses on the Axis2 and core Java implementations, but is well designed to make it possible to incorporate GFAC-Core into any service framework. The
application provider first describes the application through input,
outputs, deployment information, temporary working directories,
and remote access mechanisms for file transfers and job
submissions, and registers this information with a registry service.
Once applications are registered, GFAC distributed application
management handles the file staging, job submission, and security
protocols associated with executions. Furthermore, the framework
provides a set of extensible interfaces which can be useful to
enhance its essential capabilities such as resource sharing,
auditing and resource scheduling.
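The describe-then-register flow above can be sketched with a toy registry (not Airavata's actual API; all field names and paths are invented): the application provider records the deployment description once, and the framework later looks it up to drive staging and submission.

```python
class ApplicationRegistry:
    """Toy registry: maps application names to their deployment descriptions."""
    def __init__(self):
        self._apps = {}

    def register(self, name, description):
        # The provider supplies inputs, outputs, working directory,
        # and the remote access mechanisms, once, up front.
        self._apps[name] = description

    def lookup(self, name):
        # GFAC-style components consult the registry at execution time
        return self._apps[name]

registry = ApplicationRegistry()
registry.register("ultrascan-2dsa", {
    "executable": "/opt/ultrascan/bin/us_mpi_analysis",  # invented path
    "inputs": ["experiment.tar"],
    "outputs": ["analysis-results.tar"],
    "work_dir": "/scratch/ultrascan",     # temporary working directory
    "transfer": "gridftp",                # file staging mechanism
    "submission": "ogsa-bes",             # job submission mechanism
})

desc = registry.lookup("ultrascan-2dsa")
```

Keeping the description in a registry, rather than in the gateway code, is what lets the same application run unchanged against different back-end resources.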
During execution, the application schedule is determined and the
input data files specified by input parameters are staged to the
computational resource. The underlying application is then
executed using a job submission mechanism. Currently, grid,
cloud, SSH and local submissions are supported. The framework
monitors the status of the remote application, and periodically
publishes activity information to the event bus. Once the
invocation is complete, the application service tries to determine
the results of the application invocation by searching the standard
output for user-defined patterns or by listing a pre-specified
location for generated data products. The application service
runtime is implemented using a processing pipeline based on the
Chain of Responsibility pattern, where the pipeline can be altered
by inserting interceptors. The resulting architecture is highly
flexible and extensible, and provides the ideal architectural basis
for a system that supports a wide range of requirements.
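The pipeline design described above can be sketched generically (the class names here are illustrative, not Airavata's actual types): a Chain of Responsibility in which each interceptor handles a shared context in turn, and new stages can be spliced in without touching the existing ones.

```python
class Interceptor:
    """One stage in the processing pipeline; subclasses override handle()."""
    def handle(self, context):
        raise NotImplementedError

class StageIn(Interceptor):
    def handle(self, context):
        context["staged"] = True
        context.setdefault("log", []).append("stage-in")

class Submit(Interceptor):
    def handle(self, context):
        context.setdefault("log", []).append("submit")

class Audit(Interceptor):
    def handle(self, context):
        context.setdefault("log", []).append("audit")

class Pipeline:
    """Chain of Responsibility: stages run in order over a shared context."""
    def __init__(self, stages):
        self.stages = list(stages)

    def insert_before(self, stage_cls, new_stage):
        # Alter the pipeline by inserting an interceptor, without
        # modifying any of the existing stages
        idx = next(i for i, s in enumerate(self.stages)
                   if isinstance(s, stage_cls))
        self.stages.insert(idx, new_stage)

    def run(self, context):
        for stage in self.stages:
            stage.handle(context)
        return context

pipeline = Pipeline([StageIn(), Submit()])
pipeline.insert_before(Submit, Audit())      # splice in an auditing stage
result = pipeline.run({})
# result["log"] == ["stage-in", "audit", "submit"]
```

This is exactly the property the text relies on: cross-cutting concerns such as auditing or scheduling can be added as interceptors without rewriting the core stages.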
Figure 3. Overview of the key limitations of providing only proprietary protocols instead of open standard protocols.
Figure 3 summarizes the current architecture and illustrates the advantages of having open standards-based providers. The additional standards-based client integration opens the door to additional resources and avoids vendor lock-in. We integrated a standards-based client as opposed to yet another proprietary abstraction for the Grid middleware UNICORE [17] or GENESIS [18]. This is particularly important to keep up with emerging architectures in e-Science infrastructures such as XSEDE.
Figure 3 also outlines a potential solution to overcome the limitations by adding a standardized protocol space to the Apache Airavata framework. This brings benefits such as avoiding vendor lock-in (e.g. to Globus), where one is forced to use only systems that have one particular middleware installed. Hence, apart from saving maintenance effort by providing just one open standards adapter, the adoption of open standard protocols also provides access to a broader range and wider variety of computational resources offered by different middleware.
4. ARCHITECTURAL
IMPROVEMENTS
This section provides an introduction to the relevant open
standard protocols used to improve the architecture analyzed in
Section 3, and describes our architectural improvements.
4.1 Relevant Open Standard Protocols
Two open standards in the field are relevant to Apache Airavata for facilitating the standards-based extensions discussed in Section 3: the Job Submission Description Language (JSDL) [12] and the Open Grid Services Architecture – Basic Execution Service (OGSA-BES) [13] specification, often simply named BES. Both standards are well defined and, although very basic in nature, they are extremely mature and stable.
JSDL is a standardized schema specifying clear syntax and
semantics for describing the submission of scientific applications
on computational resources. BES is a well-defined focused
interface that enables the submission, control, and management of
computational jobs (i.e. executing scientific applications) within
middleware systems. Applications invoked via the BES interface are described with the JSDL specification, which includes several extension profiles such as the Single-Program-Multiple-Data (SPMD) JSDL extension [14] and the JSDL parameter sweep extension [15].
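As a concrete illustration, a minimal JSDL document for a command-line run can be assembled with nothing but the standard library. The element and namespace names below follow the JSDL 1.0 specification [12]; the executable path and argument are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the JSDL 1.0 specification (GFD.136)
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

def build_jsdl(executable, arguments):
    """Build a minimal JSDL JobDefinition for a command-line application."""
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("jsdl-posix", POSIX)
    job_def = ET.Element(f"{{{JSDL}}}JobDefinition")
    job_desc = ET.SubElement(job_def, f"{{{JSDL}}}JobDescription")
    app = ET.SubElement(job_desc, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    return ET.tostring(job_def, encoding="unicode")

# Hypothetical executable and input archive, for illustration only
doc = build_jsdl("/usr/local/bin/us_mpi_analysis", ["input.tar"])
```

A document of this shape is what a BES-compliant service accepts in its job-creation operation, independent of which middleware implements the service.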
4.2 Integrated Architecture
Figure 4 illustrates the adoption of the aforementioned open
standard protocols in the integrated architecture that includes the
UltraScan scientific gateway and the Apache Airavata framework.
It illustrates the standardized access via Grid middleware to
computational resources (e.g. HPC resources) in production on e-
Science infrastructures such as XSEDE, and includes the use of
MyProxy [16] for credential management. The integrated
architecture overcomes the limitations identified in the previous
section by stable standard protocols that do not change often.
The benefits of this integrated architecture are twofold. First, the BES/JSDL Client Classes plugin of GFAC enables standardized job submission to any middleware that adopts these standards. Grid middleware offering an OGSA-BES (including JSDL) compliant interface include UNICORE, GENESIS, and GridSAM [19]. These standards provide a stable interface for job submission and management, independent of the underlying concrete Grid middleware system, which can be exchanged when needed while the Airavata framework implementation stays the same.
The second benefit for the Airavata framework as a whole is the lowered maintenance cost of having just one open standards-based plugin to maintain instead of n proprietary plugins. This is particularly crucial in e-Science production infrastructure setups, where the n plugins could each be required in different versions.
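The maintenance argument can be made concrete with a small sketch (all class names and URIs are hypothetical): behind a common plugin interface, a single standards-based submitter parameterized only by its endpoint URI stands in for a separate adapter per middleware.

```python
from abc import ABC, abstractmethod

class SubmissionPlugin(ABC):
    """Common interface a GFAC-style framework puts in front of each provider."""
    @abstractmethod
    def submit(self, job_description: str) -> str:
        ...

class BesSubmitter(SubmissionPlugin):
    """One standards-based plugin: only the OGSA-BES endpoint URI is
    site-specific, so swapping UNICORE for GENESIS or GridSAM means
    changing the URI, not maintaining another adapter."""
    def __init__(self, bes_endpoint: str):
        self.endpoint = bes_endpoint

    def submit(self, job_description: str) -> str:
        # A real client would send the JSDL document to the OGSA-BES
        # CreateActivity operation; here we only record the routing decision.
        return f"CreateActivity({len(job_description)} bytes) -> {self.endpoint}"

# The same plugin class serves every BES-compliant middleware (URIs invented):
unicore = BesSubmitter("https://unicore.example.org:8080/bes")
genesis = BesSubmitter("https://genesis.example.edu/bes")
receipt = unicore.submit("<jsdl:JobDefinition/>")
```

With n proprietary plugins, each of these endpoints would instead require its own class, tests, and version tracking.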
Figure 4. Adoption of Open Standards in the overall UltraScan integrated architecture.
Figure 5. Open standard-based integrated architecture in a concrete deployment setup with a European HPC system.
5. CONCRETE DEPLOYMENT
While the previous section described our approach to the integrated architecture, this section focuses on one specific deployment of the architecture in order to illustrate its feasibility and the benefits of using open standards. A step-wise walkthrough describes the different elements of the architecture in the context of a real deployment in Europe. Figure 5 illustrates the integrated architecture with the various steps that users employ in order to take advantage of the newly described open standards. Step (0) is performed only once: the end-user binds his or her personal identity to the credential service within the Apache Airavata system using the end-user's X.509 certificate. Hence, in daily operation, the end-user starts with step (1), using username and password to authenticate at the UltraScan3 scientific gateway (local login). In step (2), the end-user configures the corresponding computational job with a choice of instrument data inputs and analysis parameters in the US-LIMS. In step (3), the Apache Airavata API is used to communicate the job to the framework, while its workflow interpreter in step (4) analyzes the job request and forwards it to the GFAC element.
In step (5), GFAC obtains the right credential from the credential service, which in turn retrieves in step (6) the previously created identity (i.e. the binding) by using the OAuth protocol interface of the MyProxy server at NCSA.
Step (7) takes advantage of our solution that overcomes the limitations identified in Section 3. We implemented an open standards-compliant plugin within Apache Airavata using BES as the protocol and the implied JSDL as the job description schema. Although the UNICORE Client Classes have been used for this proof-of-concept implementation, any BES/JSDL client could have been used (e.g. the GENESIS BES client).
Step (8) authenticates the job request based on the provided
identity at the UNICORE gateway. Hence, UNICORE is the Grid
middleware used in this particular deployment example, because
the computational resource offers it via open standard protocols
such as BES/JSDL. The UNICORE gateway, however, is transparent to protocols and simply forwards the job request in step (9) to the UNICORE/X core middleware system, which provides an implementation of BES and the implied JSDL. It is important to
understand that in this context, any other BES/JSDL compliant
middleware could have been used instead of UNICORE, for
example, GENESIS or GridSAM. In our setup, the UNICORE/X
system parses the JSDL and performs a mapping to an internal
representation of the described job application.
Step (10) then authorizes the job request, also based on the identity, by using the Extensible UNICORE User Data Base (XUUDB) [17], an LDAP directory, or a Gridmap file. After authorization is granted, UNICORE/X forwards the job request in step (11) to the UNICORE Target System Interface (TSI) [17]. The TSI interacts with the resource management system (RMS), such as Torque, with respect to job execution. The UltraScan application is pre-configured in a module and used in step (12) in order to execute the job on the computational resource controlled via the RMS. Input data transfers are performed via GridFTP as needed.
6. RELATED WORK
There is a wide variety of related work; we focus on efforts of direct relevance to the integrated architecture approach we follow in Section 4.
First, the Vine Toolkit [20] provides adapters for various
middleware technologies and is based on Java, representing a
high-level API for managing jobs on specific target sites. The
environments in which Vine can be deployed are Java Web Start,
Java Servlet 2.3, and Java Portlet 1.0. It is optimized to run in
browser-based setups that act as the client tool for the GridSphere
portal. The current Vine version supports gLite and UNICORE (e.g. SSL key generation, job-based management, etc.), but also VOMS (e.g. proxies, registering and un-registering users, etc.).
In order to send jobs to a middleware, the Vine Toolkit has client
classes for each middleware. As part of a production installation,
it has to be deployed as a WS application in a Web container. The
URI of this container and the corresponding gateway portal is
then a fixed location that end-users can use to perform their daily
scientific application runs. Its features include the proprietary UNICORE interface for job management, which queries up-to-date job statuses, and VOMS and proxy generation.
In the given context of this paper the ’Simple API for Grid
Applications (SAGA) Framework’ [21] represents another
concrete example of related work. SAGA denotes two things. First, SAGA is an open standard [22], developed by the OGF to promote 'Grid interoperability on the application level'. Second, the so-called 'SAGA framework' is actively developed by a team that adopts the SAGA standard as described in [21].
This framework is similar to the Airavata framework and enables
a high-level programming abstraction, which significantly
facilitates the development and deployment of e-Science
applications. As such it provides a lot of application patterns (e.g.
map-reduce, replication, parameter studies, etc. [23]) that are
useful for end-users, thus lowering the barrier in using production
infrastructures in general.
Apart from being a high-level application framework, [23] reveals that SAGA can work on top of the open standards used in this contribution (i.e. OGSA-BES, JSDL, etc.). Although previously bound to proprietary interfaces of middleware systems as described in [21], the SAGA framework is extended towards OGSA-BES in [23]. The idea of SAGA is to expose the same functionality as standards like OGSA-BES or JSDL, but to provide additional program-level simplifications and abstractions to end-users in order to hide the complexity of these underlying interfaces [23]. Hence, SAGA is not only a standard, but also an application framework that provides many features that would be beneficial to include in the Apache Airavata framework; they have been left out because of the development effort that would be needed to augment Apache Airavata with SAGA.
Finally, closely related to scientific gateways and the frameworks they use internally to access middleware is the P-Grade Portal [24]. This portal can also be considered a building block for various scientific gateways and consists of many middleware adapters. P-Grade was not used since UltraScan already takes advantage of the Apache Airavata framework for accessing middleware.
7. CONCLUSIONS
The results of our pilot activities in enabling scientific gateways with open standard interfaces allow for a couple of interesting conclusions. We can conclude that using an abstraction framework within the UltraScan scientific gateway is extremely useful for two reasons. First, it enables the UltraScan gateway to focus on its domain-specific functionality (i.e. US-LIMS, etc.). In other words, the source code to access the wide variety of heterogeneous computational resources and middleware systems does not need to be directly included in the scientific gateway. Second, the Apache Airavata plugin for open standards is available through the Apache open source community. As it is developed through an open source process, it can be re-used within a wide variety of other scientific gateways. Again, this lowers the amount of maintenance required by the potentially many scientific gateways that all need to perform job submission and management activities. In summary, our work bears the potential that the standards-based plugin within Apache Airavata will be re-used many times by being integrated in various scientific gateways. A major conclusion we can derive from our approach is the benefit of using open standard interfaces to access computational resources (e.g. JUROPA [25] in Section 5).
Although we have shown one particular deployment in Europe
based on the UNICORE middleware, the approach enables the
access to a wide variety of other resources that use standard-
compliant middleware such as GENESIS and GridSAM. To
provide an example, any GENESIS installation on the FutureGrid
testbed can be easily used by just changing the URI within the
BES/JSDL client classes. This requires neither re-development of
the plugin within Apache Airavata nor a tuning of the GENESIS
BES/JSDL implementation. This represents a key benefit of using
a standardized approach. Furthermore, we can conclude that even if some middleware system becomes unavailable in the future (perhaps due to lack of funding), the use of standardized interfaces will enable an easy switch to another middleware system that adopts the standard interface.
The enhancements discussed in this paper have been integrated with development versions of the UltraScan gateway. At the time of this writing, the services are undergoing production-readiness testing. Starting July 1st, 2013, users of the UltraScan gateway will benefit from these contributions and use computational resources in Juelich in addition to XSEDE and local resources.
Finally, we conclude that our approach lowers risk and increases
trust since it is based on standard interfaces that do not change as
often as proprietary interfaces over time. Especially for large-scale infrastructures such as XSEDE, the Partnership for Advanced Computing in Europe (PRACE), or the European Grid Infrastructure (EGI), it is important that interfaces do not change. In the past, we observed that many proprietary interfaces changed their protocols often, which leads to less trust in the Grid middleware systems in particular and in the overall infrastructures in general. Also, the process of switching from one middleware version to another is considerably slow in several infrastructures: old versions of the interfaces within middleware systems are still supported in order to accommodate end-users that are unable to change their client setup, or resource providers that do not want to frequently re-install their middleware software. In this context, the use of stable standards reduces this churn.
8. ACKNOWLEDGEMENTS
The work in this paper is partially supported by the Extreme Science and Engineering Discovery Environment (XSEDE) project of the National Science Foundation (NSF), grant number OCI-1053575.
9. REFERENCES
[1] Demeler, B. (2005) UltraScan: A Comprehensive Data
Analysis Software Package for Analytical
Ultracentrifugation Experiments In Modern Analytical
Ultracentrifugation: Techniques and Methods (Scott, D.,
Harding, S. and Rowe, A., Eds.) Royal Society of
Chemistry, Cambridge, U.K. 210-229
[2] Demeler, B., Gorbet, G., Zollars, D., Dubbs, B. (2013) UltraScan-III version 2.0: A comprehensive data analysis software package for analytical ultracentrifugation experiments. http://www.utrascan3.uthscsa.edu/.
[3] Brookes E, Demeler B. Parallel computational techniques for
the analysis of sedimentation velocity experiments in
UltraScan. Colloid Polym Sci (2008) 286(2) 138-148
[4] Demeler, B., Singh, R., Pierce, M., Brookes, E. H., Marru,
S., and Dubbs, B. 2011. UltraScan gateway enhancements.
In Proceedings of the 2011 Teragrid Conference: Extreme
Digital Discovery (Salt Lake City, Utah, July 18 - 21, 2011).
TG '11. ACM, New York, NY, 1-8. DOI=
http://doi.acm.org/10.1145/2016741.2016778
[5] Brookes E, Boppana RV, Demeler B. (2006) Computing
Large Sparse Multivariate Optimization Problems with an
Application in Biophysics. Supercomputing '06 ACM 0-
7695-2700-0/06
[6] Brookes E, Cao W, Demeler B A two-dimensional spectrum
analysis for sedimentation velocity experiments of mixtures
with heterogeneity in molecular weight and shape. Eur
Biophys J. (2010) 39(3):405-14.
[7] Brookes E, and Demeler B. Genetic Algorithm Optimization
for obtaining accurate Molecular Weight Distributions from
Sedimentation Velocity Experiments. Analytical
Ultracentrifugation VIII, Progr. Colloid Polym. Sci. 131:78-
82. C. Wandrey and H. Cölfen, Eds. Springer (2006)
[8] Brookes E, Demeler B. Parsimonious Regularization using
Genetic Algorithms Applied to the Analysis of Analytical
Ultracentrifugation Experiments. GECCO Proceedings ACM
978-1-59593-697-4/07/0007 (2007)
[9] Demeler B and E. Brookes. Monte Carlo analysis of
sedimentation experiments. Colloid Polym Sci (2008) 286(2)
129-137
[10] Marru, Suresh, et al. "Apache airavata: a framework for
distributed applications and computational workflows."
Proceedings of the 2011 ACM workshop on Gateway
computing environments. ACM, 2011.
[11] Foster, Ian, and Carl Kesselman. "Globus: A metacomputing
infrastructure toolkit." International Journal of High
Performance Computing Applications 11.2 (1997): 115-128.
[12] A. Anjomshoaa, F. Brisard, M. Drescher, D. Fellows, A. Ly,
S. McGough, D. Pulsipher, and A. Savva. Job Submission
Description Language (JSDL) Specification Version 1.0.
Open Grid Forum, Grid Final Document Nr. 136, 2008.
[13] I. Foster, A. Grimshaw, P. Lane, W. Lee, M. Morgan, S.
Newhouse, S. Pickles, D. Pulsipher, C. Smith, and M.
Theimer. OGSA Basic Execution Service Version 1.0. Open
Grid Forum, Grid Final Document Nr. 108, 2007.
[14] A. Savva. JSDL SPMD Application Extension, Version 1.0.
Open Grid Forum, Grid Final Document Nr. 115, 2007.
[15] M. Drescher, A. Anjomshoaa, G.Williams, and D. Meredith.
JSDL - Parameter Sweep Job Extension. Open Grid Forum,
Grid Final Document Nr. 149, 2009.
[16] J. Novotny, S. Tuecke, V. Welch, ‘An online credential
repository for the Grid: MyProxy’, in Proceedings of 10th
IEEE international Symposium on High Performance
Distributed Computing, 2001
[17] A. Streit, P. Bala, A. Beck-Ratzka, K. Benedyczak, S.
Bergmann, R. Breu, J. Daivandy, B. Demuth, A. Eifer, A.
Giesler, B. Hagemeier, S. Holl, V. Huber, D. Mallmann, A.
Memon, M. Memon, M. Rambadt, M. Riedel, M. Romberg,
B. Schuller, T. Schlauch, A. Schreiber, T. Soddemann, and
W. Ziegler. UNICORE 6 - Recent and Future
Advancements. Annals of Telecommunication,
65(11):757–762, 2010.
[18] M. Morgan and S. Grimshaw. Genesis II - Standards Based
Grid Computing. In Proceedings of the Seventh IEEE/ACM
International Symposium on Cluster Computing and the Grid
2007 (CCGRID 2007), Rio de Janeiro, Brazil, pages 611–
618, 2007.
[19] W. Lee, A. McGough, and J. Darlington. Performance
Evaluation of the GridSAM Job Submission and Monitoring
System. In Proceedings of the UK e-Science All Hands
Meeting 2005, pages 915–922, 2005.
[20] M. Russell, P. Dziubecki, P. Grabowski, M. Krysinski,
T. Kuczynski, D. Szjenfeld, D. Tarnawczyk, G. Wolniewicz,
and J. Nabrzyski. The Vine Toolkit: A Java Framework for
Developing Grid Applications. In Proceedings of the Seventh
International Conference on Parallel Processing and Applied
Mathematics (PPAM 2007), Gdansk, Poland.
[21] S. Jha, H. Kaiser, A. Merzky, and O. Weidner. Grid
Interoperability at the Application Level Using SAGA. In
Proceedings of the IGIIW Workshop, Third IEEE
International Conference on eScience, Bangalore, India,
pages 584–591, 2007.
[22] T. Goodale, S. Jha, H. Kaiser, T. Kielmann, P. Kleijer, A.
Merzky, J. Shalf, and C. Smith. A Simple API for Grid
Applications (SAGA). Open Grid Forum, Grid Final
Document Nr. 90, 2008.
[23] C. Smith, T. Kielmann, S. Newhouse, and M. Humphrey.
The HPC Basic Profile and SAGA: Standardizing Compute
Grid Access in the Open Grid Forum. Concurrency and
Computation: Practice and Experience, 21(8):1053–1068,
2009.
[24] P. Kacsuk and G. Sipos, Multi-Grid, Multi-User Workflows
in the P-GRADE Grid Portal, Journal of Grid Computing,
Vol. 3, Issue 3-4, pp. 221-238, 2005
[25] JUROPA. http://tinyurl.com/lf2r9e8. [Online; accessed 14
June 2013]