ABSTRACT: Scientific research is increasingly assisted by computer-based experiments. Such experiments are often composed of a vast number of loosely-coupled computational tasks that are specified and automated as scientific workflows. This large scale is also characteristic of the data that flows within such “many-task” computations (MTC). Provenance information can record the behavior of such computational experiments via the lineage of process and data artifacts. However, work to date has focused on lineage data models, leaving unsolved the issues of recording and querying other aspects, such as domain-specific information about the experiments, MTC behavior given by resource consumption and failure information, or the impact of the environment on performance and accuracy. In this work we contribute MTCProv, a provenance query framework for many-task scientific computing that captures the runtime execution details of MTC workflow tasks on parallel and distributed systems, in addition to standard prospective and data derivation provenance. To help users query provenance data, we provide a high-level interface that hides relational query complexities. We evaluate MTCProv using an application in protein science, and describe how important query patterns, such as correlations between provenance, runtime data, and scientific parameters, are simplified and expressed.
Distributed and Parallel Databases 01/2012; 30(5-6):351-370. · 0.81 Impact Factor
ABSTRACT: The Swift parallel scripting language allows for the specification, execution and analysis of large-scale computations in parallel and distributed environments. It incorporates a data model for recording and querying provenance information. In this article we describe these capabilities and evaluate the interoperability with other systems through the use of the Open Provenance Model. We describe Swift’s provenance data model and compare it to the Open Provenance Model. We also describe and evaluate activities performed within the Third Provenance Challenge, which consisted of implementing a specific scientific workflow, capturing and recording provenance information of its execution, performing provenance queries, and exchanging provenance information with other systems. Finally, we propose improvements to both the Open Provenance Model and Swift’s provenance system.
ABSTRACT: Scientists increasingly rely on workflow management systems to perform large-scale computational scientific experiments. These systems often collect provenance information that is useful in the analysis and reproduction of such experiments. On the other hand, this provenance data may be exposed to security threats which can result, for instance, in compromising the analysis of these experiments, or in illegitimate claims of attribution. In this work, we describe our ongoing work to trace security requirements for provenance systems in the context of e-Science, and propose some security controls to fulfill them.
Provenance and Annotation of Data and Processes - Third International Provenance and Annotation Workshop, IPAW 2010, Troy, NY, USA, June 15-16, 2010. Revised Selected Papers; 01/2010
ABSTRACT: Secure provenance techniques are essential in generating trustworthy provenance records, where one is interested in protecting their integrity, confidentiality, and availability. In this work, we suggest an architecture to provide protection of authorship and temporal information in grid-enabled provenance systems. It can be used in the resolution of conflicting intellectual property claims, and in the reliable chronological reconstitution of scientific experiments. We observe that some techniques from public key infrastructures can be readily applied for this purpose. We discuss the issues involved in the implementation of such an architecture and describe some experiments performed with the proposed techniques.
Fourth International Conference on e-Science, e-Science 2008, 7-12 December 2008, Indianapolis, IN, USA; 01/2008
ABSTRACT: Monitoring the execution of distributed tasks within a workflow is not easy and is frequently controlled manually. This work presents a lightweight middleware monitor to design and control the parallel execution of tasks from a distributed scientific workflow. This middleware can be connected to a workflow management system. The middleware implementation is evaluated with the Kepler workflow management system, by including new modules to control and monitor the distributed execution of the tasks. These middleware modules were added to a bioinformatics workflow to monitor parallel BLAST executions. Results show potential for high-performance process execution while preserving the original features of the workflow.
8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 19-22 May 2008, Lyon, France; 01/2008
Proceedings of the 5th International Workshop on Middleware for Grid Computing (MGC 2007), held at the ACM/IFIP/USENIX 8th International Middleware Conference, November 26-30, 2007, Newport Beach, Orange County, California, USA; 01/2007