A Lightweight Middleware Monitor for Distributed Scientific Workflows.
ABSTRACT Monitoring the execution of distributed tasks within a workflow is difficult and is frequently handled manually. This work presents a lightweight middleware monitor to design and control the parallel execution of tasks from a distributed scientific workflow. The middleware can be plugged into a workflow management system. Its implementation is evaluated with the Kepler workflow management system by adding new modules that control and monitor the distributed execution of tasks. These modules were added to a bioinformatics workflow to monitor parallel BLAST executions. Results show potential for high-performance process execution while preserving the original features of the workflow.
- Source available from: Vanessa Braganholo
ABSTRACT: Several scientific areas, such as bioinformatics and oil engineering, need means of executing simulation-based experiments. The state of the practice, in most cases, consists of executing a set of programs. This, however, is not enough to deal with the complexity imposed by the problems to be analyzed, and the issue gets worse with large-scale experiments. In that case, we need a system that manages the composition of processes and data in a coherent flow and that records the steps and parameters used in successful executions of the experiment. The main motivation of this paper is to identify and analyze the challenges that must be addressed to provide computational support for the development of large-scale scientific experiments. The challenges we identify concern the general problem of managing scientific experiments across several applications and resources distributed over a large-scale network such as a grid. We identify three complementary research directions: performance, the management process, and semantic support. For each of them, we point out some possible solution paths.
ABSTRACT: Scientific workflow systems are designed to compose and execute either a series of computational or data manipulation steps, or workflows in a scientific application. They are usually part of a larger eScience environment. Using workflow systems, although very beneficial, is mostly not trivial for scientists. Many requirements for additional functionality around scientific workflow systems need to be taken into account, such as the ability to share workflows and the provision of user-friendly GUI tools for automating some tasks or submitting jobs to distributed computing infrastructures. In this paper we present tools developed in response to the requirements of three different scientific communities. These tools simplify and empower their work with the Kepler scientific workflow system. The use of these tools and services is illustrated with example scenarios from nanotechnology, astronomy, and fusion. Procedia Computer Science 12/2014; 29. DOI:10.1016/j.procs.2014.05.158
Conference Paper: Online workflow management and performance analysis with Stampede.
ABSTRACT: Scientific workflows are an enabler of complex scientific analyses. They provide both a portable representation and a foundation upon which results can be validated and shared. Large-scale scientific workflows are executed on equally complex parallel and distributed resources, where many things can fail. Application scientists need to track the status of their workflows in real time, detect execution anomalies automatically, and perform troubleshooting, all without logging into remote nodes or searching through thousands of log files. As part of the NSF Stampede project, we have developed an infrastructure to answer these needs. The infrastructure captures application-level logs and resource information, normalizes these to standard representations, and stores these logs in a centralized general-purpose schema. Higher-level tools mine the logs in real time to determine current status, predict failures, and detect anomalous performance. 7th International Conference on Network and Service Management, CNSM 2011, Paris, France, October 24-28, 2011; 01/2011