Conference Paper

Enhanced Resource Management Enabling Standard Parameter Sweep Jobs for Scientific Applications


Abstract

Parameter sweeps are used by researchers with scientific domain-specific tools or workflows to submit a large collection of computational jobs in which each individual job varies only in certain parts. Such jobs require a fine-grained distribution across resources, which raises a significant challenge for efficient resource management in middleware environments that were not specifically designed for parameter sweeps. This paper offers insights into parameter sweep solutions that support multi-disciplinary science environments by abstracting from resource management complexities through middleware. The solutions are based on use case requirements and enable efficient submission, enhanced usability, and standards compliance. We also apply a use case from the life science domain to demonstrate the usefulness and efficiency of the solutions.
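To make the abstraction concrete, the sketch below (plain Python, with a hypothetical command-line template) expands a sweep description into the individual jobs it stands for; a sweep-aware middleware accepts the template and value sets as a single submission instead of the enumerated jobs.

```python
from itertools import product

# Illustrative only: the command-line template below is hypothetical.
# Each combination would otherwise be a separate, nearly identical job;
# a sweep-enabled middleware accepts the template plus the value sets
# as one submission and expands it server-side.
template = "analyse --input {input} --window {window}"

inputs = ["sample_%02d.mzXML" % i for i in range(1, 4)]
windows = [0.5, 1.0]

jobs = [template.format(input=f, window=w) for f, w in product(inputs, windows)]
for job in jobs:
    print(job)  # 3 x 2 = 6 concrete jobs from one description
```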


... On the front end, users view the whole set of iterations as a single job, which in our experience is quite intuitive and reduces overall data analysis time. Further details regarding the UNICORE parameter sweep implementation can be found in [13]. The use of this important technology addressed challenge (iv) mentioned above. ...
Conference Paper
Full-text available
In this paper we give a brief overview of the three projects that were chosen for XSEDE-PRACE collaboration in 2014. We begin with an introduction to the XSEDE and PRACE organizations and the motivation for a collaborative effort between them. We then describe the three projects involved in this collaboration, providing an overview of the projects themselves and what was in scope for the collaboration. We also outline the hurdles and issues faced during this unique collaborative effort and discuss the benefits the projects derived from it. We conclude with the future steps envisioned for XSEDE-PRACE collaborative efforts.
Article
Full-text available
This document specifies the semantics and structure of the Job Submission Description Language (JSDL). JSDL is used to describe the requirements of computational jobs for submission to resources, particularly, though not exclusively, in Grid environments. The document includes the normative XML Schema for JSDL, along with examples of JSDL documents based on this schema.
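For illustration, here is a minimal JSDL document embedded in a Python snippet; the element names and namespaces follow the JSDL 1.0 specification, but the example is a sketch and has not been validated against the normative schema.

```python
import xml.etree.ElementTree as ET

# Minimal JSDL 1.0 job description (element names and namespaces follow
# the specification; this sketch has not been schema-validated).
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

jsdl_doc = f"""
<JobDefinition xmlns="{JSDL_NS}">
  <JobDescription>
    <Application>
      <POSIXApplication xmlns="{POSIX_NS}">
        <Executable>/usr/bin/blastp</Executable>
        <Argument>-query</Argument>
        <Argument>input.fasta</Argument>
      </POSIXApplication>
    </Application>
  </JobDescription>
</JobDefinition>
"""

root = ET.fromstring(jsdl_doc)
print(root.find(f".//{{{POSIX_NS}}}Executable").text)  # /usr/bin/blastp
```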
Article
Full-text available
The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.
Article
Full-text available
In this paper we present how g-Eclipse can be used to easily run computations on Grid resources. The g-Eclipse project is an EU-funded project that aims to build an integrated workbench framework for accessing the power of existing Grid infrastructures. The g-Eclipse framework provides a general, integrated workbench toolset for Grid users, operators and developers. It is very useful for inexperienced users to interact with Grid resources independently of the underlying Grid middleware. The Grid abstraction enables Grid users to access the Grid in a desktop-like manner, with wizards designed for common use cases.
Article
Full-text available
In the last three years activities in Grid computing have changed; in particular in Europe the focus moved from pure research-oriented work on concepts, architectures, interfaces, and protocols towards activities driven by the usage of Grid technologies in day-to-day operation of e-infrastructure and in application-driven use cases. This change is also reflected in the UNICORE activities [1]. The basic components and services have been established, and now the focus is increasingly on enhancement with higher-level services, integration of upcoming standards, deployment in e-infrastructures, setup of interoperability use cases and integration of applications. The development of UNICORE started more than 10 years ago, when in 1996 users, supercomputer centres and vendors were discussing "what prevents the efficient use of distributed supercomputers?". The result of this discussion was a consensus which still guides UNICORE today: seamless, secure and intuitive access to distributed resources. Since the end of 2002 continuous development of UNICORE has taken place in several EU-funded projects, with the subsequent broadening of the UNICORE community to participants from across Europe. In 2004 the UNICORE software became open source, and since then UNICORE has been developed within the open source developer community. Publishing UNICORE as open source under the BSD license has promoted a major uptake in the community, with contributions from multiple organisations. Today the developer community includes developers from Germany, Poland, Italy, the UK, Russia and other countries. The structure of the paper is as follows. In Section 2 the architecture of UNICORE 6 as well as implemented standards are described, while Section 3 focuses on its clients. Section 4 covers recent developments and advancements of UNICORE 6, while in Section 5 an outlook on future planned developments is given. The paper closes with a conclusion.
Article
Full-text available
The gLite Workload Management System (WMS) is a collection of components that provide the service responsible for distributing and managing tasks across computing and storage resources available on a Grid. The WMS receives job execution requests from a client, finds the appropriate resources, then dispatches and follows the jobs until completion, handling failure whenever possible. In addition to single batch-like jobs, the compound job types handled by the WMS are Directed Acyclic Graphs (a set of jobs where the input/output/execution of one or more jobs may depend on one or more other jobs), Parametric Jobs (multiple jobs with one parametrized description), and Collections (multiple jobs with a common description). Jobs are described via a flexible, high-level Job Definition Language (JDL). New functionality was recently added to the system (use of Service Discovery for obtaining new service endpoints to be contacted, automatic sandbox file archival/compression and sharing, and support for bulk submission and bulk matchmaking). Intensive testing and troubleshooting made it possible to dramatically increase both the job submission rate and service stability. Future developments of the gLite WMS will focus on reducing external software dependencies and improving portability, robustness and usability.
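As a sketch of the Parametric Job type mentioned above, the following Python snippet embeds a JDL job description; the attribute names follow the gLite JDL as commonly documented, but treat the details as illustrative rather than normative.

```python
# Parametric job description in gLite's JDL, embedded as a string.
# Illustrative sketch, not a validated JDL file.
parametric_jdl = """
JobType        = "Parametric";
Executable     = "/usr/local/bin/analyse";
Arguments      = "input__PARAM_.dat";   // _PARAM_ is replaced per job
Parameters     = 100;                   // expands to jobs 1..100
ParameterStart = 1;
ParameterStep  = 1;
StdOutput      = "out__PARAM_.txt";
OutputSandbox  = {"out__PARAM_.txt"};
"""
print(parametric_jdl)
```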
Article
Full-text available
The paper presents an overview of the current research and achievements of the DEISA project, with a focus on the general concept of the infrastructure, the operational model, application projects and science communities, the DEISA Extreme Computing Initiative, user and application support, operations and technology, services, collaborations and interoperability, and the use of standards and policies. The paper concludes with a discussion about the long-term sustainability of the DEISA infrastructure.
Article
Full-text available
The Trans-Proteomic Pipeline (TPP) is a suite of software tools for the analysis of MS/MS data sets. The tools encompass most of the steps in a proteomic data analysis workflow in a single, integrated software system. Specifically, the TPP supports all steps from spectrometer output file conversion to protein-level statistical validation, including quantification by stable isotope ratios. We describe here the full workflow of the TPP and the tools therein, along with an example on a sample data set, demonstrating that the setup and use of the tools are straightforward and well supported and do not require specialized informatic resources or knowledge.
Article
Certain scientific use cases have complex requirements that call for executing Grid jobs in collections where the individual job requests differ only in certain parts. These scenarios can easily be tackled by a single job request that abstracts this variation and represents the whole collection. The Open Grid Forum (OGF) standards community modeled this requirement through the Job Submission Description Language (JSDL) Parameter Sweep specification, which takes a modular approach to handling different kinds of parameter sweeps (e.g. document and file sweeps). In this paper we present the UNICORE server environment implementing this specification, built upon its existing JSDL implementation. We also demonstrate the application of UNICORE's parameter sweep extension for optimizing job executions submitted as a sub-activity of a Taverna-based scientific workflow. Further, we validate our approach by analyzing the performance of the workflow with and without the parameter sweep extension.
Article
Researchers working on the planning, scheduling, and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. This paper provides a characterization of workflows from six diverse scientific applications, including astronomy, bioinformatics, earthquake science, and gravitational-wave physics. The characterization is based on novel workflow profiling tools that provide detailed information about the various computational tasks present in the workflow, including their I/O, memory and computational characteristics. Although the workflows are diverse, there is evidence that each workflow has a job type that consumes the largest share of runtime. The study also uncovered an inefficiency in one workflow component implementation, where the component was re-reading the same data multiple times.
Article
Data analysis in mass spectrometry based proteomics struggles to keep pace with advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. The speed and performance of mass spectral search engines are continuously improving, although not necessarily at the pace needed to cope with the volume of acquired data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing the identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing the resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
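The decompose/search/recompose pattern described here can be sketched as follows; split_mzxml, run_search, and merge_pepxml are hypothetical stand-ins for the paper's decomposition programs and a wrapped, unmodified search engine such as X!Tandem.

```python
from concurrent.futures import ProcessPoolExecutor

# Sketch of the decompose/search/recompose pattern. The three functions
# below are hypothetical placeholders, not the paper's actual programs.

def split_mzxml(path, n_chunks):
    """Partition the spectra of one mzXML file into n smaller files."""
    return [f"{path}.part{i}.mzXML" for i in range(n_chunks)]  # placeholder

def run_search(chunk_path):
    """Run the unmodified search engine on one chunk; return a pepXML path."""
    return chunk_path.replace(".mzXML", ".pep.xml")  # placeholder

def merge_pepxml(pepxml_paths, out_path):
    """Recompose the per-chunk results into a single pepXML file."""
    pass  # placeholder

if __name__ == "__main__":
    chunks = split_mzxml("run01.mzXML", n_chunks=8)
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(run_search, chunks))  # search chunks in parallel
    merge_pepxml(results, "run01.pep.xml")
```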
Conference Paper
The provision and processing of information in an e-Science environment are essential tasks. For this purpose, most environments provide information services which aggregate data from different information sources and make it available to users and other services. In this paper we present CIS, an extensible information service with an underlying unified information model. Designed according to service-oriented architectural principles, CIS consumes data from sources like Ganglia, formats it according to the Common Information Model, and delivers it in response to XQuery requests. We realised the information service in a Web Services environment and integrated it into an implied volatility application within the NextGRID project and the UNICORE middleware.
Conference Paper
Today, many scientific disciplines heavily rely on computer systems for in-silico experimentation or data management and analysis. The employed computer hardware and software are heterogeneous and comply with different standards, interfaces and protocols for interoperation. Grid middleware systems like UNICORE 6 try to hide some of the complexity of the underlying systems by offering high-level, uniform interfaces for executing computational jobs or storing, moving, and searching through data. Via UNICORE 6, computer resources can be accessed securely with different software clients, e.g. the UNICORE Commandline Client (UCC) or the graphical UNICORE Rich Client (URC), which is based on Eclipse. In this paper, we describe the design and features of the URC, and highlight its role as a flexible and extensible Grid client framework using the QSAR field as an example.
Conference Paper
As scientific workflows are becoming more complex and apply compute-intensive methods to increasingly large data volumes, access to HPC resources is becoming mandatory. We describe the development of a novel plug-in for the Taverna workflow system, which provides transparent and secure access to HPC/Grid resources via the UNICORE Grid middleware, while maintaining the ease of use that has been the main reason for the success of scientific workflow systems. A use case from the bioinformatics domain demonstrates the potential of the UNICORE plug-in for Taverna by creating a scientific workflow that executes the central parts in parallel on a cluster resource.
Conference Paper
The increasing availability of Web services calls for investigating ways to automate the discovery process. Discovery processes enhanced with semantics can be considered general, but they often lack the flexibility needed in specific domains. In this paper, we propose the flexible architecture of the discovery engine Glue2, which comes with a powerful set of discovery components (for functional matching, non-functional matching, data fetching, etc.) that can be executed in different orders as required by specific execution workflows.
Article
This document specifies the syntax and semantics of the Parameter Sweep extension to the Job Submission Description Language (JSDL) 1.0 [JSDL]. The syntax and semantics defined in this document provide an alternative to explicitly submitting thousands of individual JSDL jobs that share the same base job template but differ only in their parameter sets.
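A minimal sketch of what such a sweep definition might look like, assuming the element names and namespaces of the OGF specification (verify against the published document before use): a sweep:Assignment binds an XPath expression into the base JSDL document to a list of values, so one document stands in for many submissions.

```python
# Sketch of a JSDL Parameter Sweep fragment, embedded as a string.
# Namespaces and element names are my reading of the OGF spec; treat
# them as assumptions, not a normative example.
sweep_fragment = """
<sweep:Sweep
    xmlns:sweep="http://schemas.ogf.org/jsdl/2009/03/sweep"
    xmlns:func="http://schemas.ogf.org/jsdl/2009/03/sweep/functions">
  <sweep:Assignment>
    <sweep:Parameter>
      //jsdl-posix:POSIXApplication/jsdl-posix:Argument[1]
    </sweep:Parameter>
    <func:Values>
      <func:Value>sample_01.mzXML</func:Value>
      <func:Value>sample_02.mzXML</func:Value>
      <func:Value>sample_03.mzXML</func:Value>
    </func:Values>
  </sweep:Assignment>
</sweep:Sweep>
"""
print(sweep_fragment)  # one document in place of three near-identical jobs
```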
Article
Recent successes illustrate the role of mass spectrometry-based proteomics as an indispensable tool for molecular and cellular biology and for the emerging field of systems biology. These include the study of protein-protein interactions via affinity-based isolations on a small and proteome-wide scale, the mapping of numerous organelles, the concurrent description of the malaria parasite genome and proteome, and the generation of quantitative protein profiles from diverse species. The ability of mass spectrometry to identify and, increasingly, to precisely quantify thousands of proteins from complex samples can be expected to impact broadly on biology and medicine.
Streit, Achim, et al. "UNICORE 6—Recent and Future Advancements." Annals of Telecommunications—Annales des Télécommunications 65.11-12 (2010): 757-762.
Carenini, Alessio, et al. "Glue2: A Web Service Discovery Engine with Non-functional Properties." IEEE Sixth European Conference on Web Services (ECOWS '08), 2008.
Dillaway, B., M. Humphrey, M. Theimer, and G. Wasson. "HPC Basic Profile, Version 1." Open Grid Forum.
Savva, A. "JSDL Single Program Multiple Data Application Extensions." OGF draft.
Memon, S., S. Holl, B. Schuller, M. Riedel, and A. Grimshaw. "Enhancing the Performance of Workflow Execution in e-Science Environments by Using the Standards Based Parameter Sweep Model."