A Lightweight Middleware Monitor for Distributed Scientific Workflows
ABSTRACT: Monitoring the execution of distributed tasks within a workflow is not easy and is frequently done manually. This work presents a lightweight middleware monitor to design and control the parallel execution of tasks from a distributed scientific workflow. The middleware can be plugged into a workflow management system. The implementation is evaluated with the Kepler workflow management system by adding new modules that control and monitor the distributed execution of tasks. These modules were added to a bioinformatics workflow to monitor parallel BLAST executions. Results show potential for high-performance process execution while preserving the original features of the workflow.
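The monitoring idea in this abstract can be illustrated with a minimal Python sketch, assuming independent tasks (for example, BLAST runs over separate input fragments) that can be dispatched concurrently and whose states should be visible to the user. This is not the Kepler module described in the paper: `TaskMonitor` and `fake_blast` are hypothetical names, local threads stand in for distributed execution, and `fake_blast` stands in for a real BLAST invocation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


class TaskMonitor:
    """Hypothetical monitor: runs independent tasks in parallel and records each task's state."""

    def __init__(self, max_workers: int = 4) -> None:
        self.max_workers = max_workers
        self.status: Dict[str, str] = {}  # task id -> pending | running | done | failed
        self._lock = threading.Lock()    # worker threads update status concurrently

    def _set(self, task_id: str, state: str) -> None:
        with self._lock:
            self.status[task_id] = state

    def _tracked(self, task_id: str, fn: Callable[[str], str], arg: str) -> str:
        # Wrap the real work so every state change is visible to the monitor.
        self._set(task_id, "running")
        try:
            result = fn(arg)
        except Exception:
            self._set(task_id, "failed")
            raise
        self._set(task_id, "done")
        return result

    def run(self, fn: Callable[[str], str], inputs: List[str]) -> Dict[str, str]:
        """Run fn on every input in parallel; return the results of successful tasks."""
        ids = [f"task-{i}" for i in range(len(inputs))]
        for task_id in ids:
            self._set(task_id, "pending")
        results: Dict[str, str] = {}
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = {pool.submit(self._tracked, task_id, fn, arg): task_id
                       for task_id, arg in zip(ids, inputs)}
            for fut in as_completed(futures):
                if fut.exception() is None:  # failed tasks are already marked "failed"
                    results[futures[fut]] = fut.result()
        return results


def fake_blast(fragment: str) -> str:
    """Stand-in for one BLAST run over a sequence fragment (hypothetical)."""
    if fragment == "corrupt":
        raise ValueError("simulated BLAST failure")
    return f"hits({fragment})"


monitor = TaskMonitor(max_workers=2)
results = monitor.run(fake_blast, ["seqA", "seqB", "corrupt"])
print(monitor.status)
print(results)
```

The lock guards the shared status table because worker threads update it concurrently; a failing fragment is recorded as `failed` without stopping the other runs.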
- Source available from: Vanessa Braganholo
- "The idea is to provide users who are not specialists in parallel processing with a solution for running their workflows with better performance, regardless of the execution machine chosen. Research in this direction is under way (Cruz et al., 2008c)."
ABSTRACT: Several scientific areas, such as bioinformatics and oil engineering, need means of executing simulation-based experiments. In most cases, the state of the practice consists of executing a set of programs. This, however, is not enough to deal with the complexity of the problems to be analyzed, and the issue gets worse with large-scale experiments. In such cases, a system is needed to manage the composition of processes and data in a coherent flow. This system must also be capable of recording the steps and parameter choices used in successful executions of the experiment. The main motivation of this paper is to identify and analyze the challenges that must be addressed to provide computational support for the development of large-scale scientific experiments. The challenges identified here deal with the general problem of managing scientific experiments across several applications and resources distributed over a large-scale network, such as grids. We identify three complementary research directions in scientific experiment management: performance, the management process, and semantic support. For each of them, we point out some possible solution paths.
ABSTRACT: Distributing workflow tasks among high-performance environments involves local processing and remote execution on clusters and grids. This distribution often requires interoperation between heterogeneous workflow definition languages and their corresponding execution engines. For example, a centralized Workflow Management System (WfMS) may locally control the execution of a workflow that needs a grid WfMS to execute a sub-workflow requiring high performance. Workflow specification languages often provide different control-flow structures, so moving from one environment to another requires mappings between these languages. Due to this heterogeneity, control-flow structures available in one system may not be supported in another. In these heterogeneous distributed environments, provenance gathering also becomes heterogeneous. This work presents control-flow modules that aim to be independent of any particular WfMS. By inserting these modules into the workflow specification, workflow execution control becomes less dependent on heterogeneous workflow execution engines. In addition, the modules can gather provenance data from both local and remote execution, allowing the same provenance registration in both environments regardless of the heterogeneous WfMS. The proposed modules extend ordinary workflow tasks by providing dynamic behavioral execution control. They were implemented in the VisTrails graphical workflow enactment engine, which offers a flexible infrastructure for provenance gathering.
Provenance and Annotation of Data and Processes, Second International Provenance and Annotation Workshop, IPAW 2008, Salt Lake City, UT, USA, June 17-18, 2008. Revised Selected Papers; 01/2008
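An engine-independent control-flow module that also records provenance can be sketched in plain Python. The abstract does not say which control-flow structures were implemented, so a loop ("repeat a task until a condition holds") is an assumed example; the sketch does not use the VisTrails API, and `LoopModule`, `ProvenanceRecord`, and the `engine` label are hypothetical names chosen for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ProvenanceRecord:
    """One provenance entry per task invocation (hypothetical schema)."""
    task: str
    iteration: int
    inputs: Any
    output: Any
    engine: str      # where the step ran, e.g. "local" or "grid"
    started: float
    finished: float


@dataclass
class LoopModule:
    """Hypothetical control-flow module: repeats a task until a stop condition
    holds, logging provenance the same way whichever engine runs it."""
    name: str
    task: Callable[[Any], Any]
    stop: Callable[[Any], bool]
    max_iterations: int = 100
    log: List[ProvenanceRecord] = field(default_factory=list)

    def execute(self, value: Any, engine: str = "local") -> Any:
        for i in range(self.max_iterations):
            started = time.time()
            output = self.task(value)
            self.log.append(ProvenanceRecord(self.name, i, value, output,
                                             engine, started, time.time()))
            if self.stop(output):
                return output
            value = output  # feed the output into the next iteration
        raise RuntimeError(f"{self.name}: stop condition not met "
                           f"after {self.max_iterations} iterations")


loop = LoopModule("halve", task=lambda x: x / 2, stop=lambda y: y < 1)
result = loop.execute(10, engine="grid")
for rec in loop.log:
    print(rec.iteration, rec.inputs, rec.output, rec.engine)
```

Because the loop logic and the provenance log live in the module rather than in the engine, the same records are produced whether the module is executed locally or remotely.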
Conference Paper: Dynamic Workflow Management and Monitoring Using DDS
ABSTRACT: Large scientific computing data centers require a distributed dependability subsystem that provides fault isolation and recovery and is capable of learning and predicting failures to improve the reliability of scientific workflows. This paper extends our previous work on autonomic scientific workflow management systems by presenting a hierarchical dynamic workflow management system that tracks the state of job execution using timed state machines. Workflow monitoring is achieved with a reliable distributed monitoring framework that employs publish-subscribe middleware built on the OMG Data Distribution Service (DDS) standard. Failure recovery is achieved by stopping and restarting the failed portions of the workflow's directed acyclic graph (DAG).
Engineering of Autonomic and Autonomous Systems (EASe), 2010 Seventh IEEE International Conference and Workshops on; 04/2010
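Job tracking with a timed state machine, plus restarting the failed part of a DAG, can be sketched in Python under a few assumptions: each state may carry a time budget, a job that exceeds it is forced into `FAILED`, and the "failed portion" of the workflow is taken to be the failed jobs plus everything downstream of them. The sketch leaves out DDS entirely (the paper's monitoring uses OMG DDS publish-subscribe); `TimedJob`, `jobs_to_restart`, the four state names, and the 30-second budget are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


# Legal transitions; FAILED -> QUEUED models a restart.
ALLOWED: Dict[State, Set[State]] = {
    State.QUEUED: {State.RUNNING, State.FAILED},
    State.RUNNING: {State.DONE, State.FAILED},
    State.DONE: set(),
    State.FAILED: {State.QUEUED},
}


class TimedJob:
    """Hypothetical timed state machine: a job that stays in a state longer
    than that state's time budget is forced into FAILED."""

    def __init__(self, name: str, timeouts: Dict[State, float], now: float = 0.0) -> None:
        self.name = name
        self.timeouts = timeouts
        self.state = State.QUEUED
        self.entered = now  # time at which the current state was entered

    def transition(self, new: State, now: float) -> None:
        if new not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: illegal {self.state.value} -> {new.value}")
        self.state, self.entered = new, now

    def check_timeout(self, now: float) -> bool:
        limit = self.timeouts.get(self.state)  # states without a budget never time out
        if limit is not None and now - self.entered > limit:
            self.transition(State.FAILED, now)
            return True
        return False


def jobs_to_restart(dag: Dict[str, List[str]], failed: Set[str]) -> Set[str]:
    """Failed jobs plus every job downstream of them in the workflow DAG."""
    affected = set(failed)
    pending = list(failed)
    while pending:
        job = pending.pop()
        for child in dag.get(job, []):
            if child not in affected:
                affected.add(child)
                pending.append(child)
    return affected


dag = {"fetch": ["align"], "align": ["merge"], "stats": ["merge"], "merge": []}
job = TimedJob("align", timeouts={State.RUNNING: 30.0})
job.transition(State.RUNNING, now=0.0)
timed_out = job.check_timeout(now=45.0)
failed = {job.name} if job.state is State.FAILED else set()
restart = jobs_to_restart(dag, failed)
print(timed_out, job.state, sorted(restart))
```

Restarting only the downstream closure leaves completed upstream jobs (here `fetch`) and independent branches (`stats`) untouched.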