Natalia Kwasnikowska

Universiteit Hasselt, Hasselt, Flanders, Belgium

Publications (14) · 4.41 Total impact

  • ABSTRACT: The Open Provenance Model is a model of provenance that is designed to meet the following requirements: (1) Allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) Allow developers to build and share tools that operate on such a provenance model. (3) Define provenance in a precise, technology-agnostic manner. (4) Support a digital representation of provenance for any “thing”, whether produced by computer systems or not. (5) Allow multiple levels of description to coexist. (6) Define a core set of rules that identify the valid inferences that can be made on provenance representation. This document contains the specification of the Open Provenance Model (v1.1) resulting from a community effort to achieve inter-operability in the Provenance Challenge series.
    Future Generation Computer Systems 06/2011; · 2.64 Impact Factor
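
The graph model this abstract describes can be made concrete in a few lines. The following is a minimal sketch, assuming a toy in-memory representation: the node and edge names (artifact, process, used, wasGeneratedBy) come from the OPM vocabulary, but the class layout and the JSON serialization are purely illustrative, standing in for the "compatibility layer" between systems.

```python
import json
from dataclasses import dataclass, field

@dataclass
class OPMGraph:
    artifacts: set = field(default_factory=set)          # immutable data items
    processes: set = field(default_factory=set)          # actions on artifacts
    used: set = field(default_factory=set)               # (process, artifact)
    was_generated_by: set = field(default_factory=set)   # (artifact, process)

    def to_json(self) -> str:
        # Serialize to a neutral format, mimicking the exchange role
        # the shared model is meant to play between systems.
        return json.dumps({
            "artifacts": sorted(self.artifacts),
            "processes": sorted(self.processes),
            "used": sorted(self.used),
            "wasGeneratedBy": sorted(self.was_generated_by),
        })

g = OPMGraph()
g.artifacts |= {"a1", "a2"}
g.processes.add("p1")
g.used.add(("p1", "a1"))               # process p1 consumed artifact a1
g.was_generated_by.add(("a2", "p1"))   # artifact a2 was produced by p1
print(g.to_json())
```
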
  • Natalia Kwasnikowska, Luc Moreau, Jan Van den Bussche
    ABSTRACT: The Open Provenance Model (OPM) is a community data model for provenance that is designed to facilitate the meaningful interchange of provenance information between systems. Underpinning OPM is a notion of directed graph, used to represent data products and processes involved in past computations, and dependencies between them; it is complemented by inference rules allowing new dependencies to be derived. The Open Provenance Model was designed from requirements captured in two 'Provenance Challenges', and tested during the third: these challenges were international, multi-disciplinary activities aiming to exchange provenance information between multiple systems and query it. The design of OPM was mostly driven by practical and pragmatic considerations. The purpose of this paper is to formalize the theory underpinning this data model. Specifically, this paper proposes a temporal semantics for OPM graphs, defined in terms of a set of ordering constraints between time-points associated with OPM constructs. OPM inferences are characterized with respect to this temporal semantics, and a novel set of patterns is introduced to establish soundness and completeness properties. Building on this novel foundation, the paper proposes new definitions for graph algebraic operations, graph refinement and the notion of account, by which multiple descriptions of the same execution are allowed to co-exist in the same graph. Overall, this paper provides a strong theoretical underpinning to a data model being adopted by a community of users, which helps its disambiguation and promotes inter-operability.
    01/2010;
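
The temporal semantics described here assigns time-points to OPM constructs and reads each edge as an ordering constraint between them. A minimal sketch of the consistency check this enables, under the simplifying assumption that every constraint is a strict precedence (the paper's actual constraint rules are richer):

```python
from collections import defaultdict

def consistent(constraints):
    """constraints: iterable of (earlier, later) time-point pairs.
    Satisfiable by some timeline iff the precedence graph is acyclic."""
    succ = defaultdict(set)
    nodes = set()
    for a, b in constraints:
        succ[a].add(b)
        nodes |= {a, b}
    seen, on_stack = set(), set()

    def cyclic(v):
        # depth-first search for a cycle through v
        seen.add(v)
        on_stack.add(v)
        for w in succ[v]:
            if w in on_stack or (w not in seen and cyclic(w)):
                return True
        on_stack.discard(v)
        return False

    return not any(cyclic(v) for v in nodes if v not in seen)

# wasGeneratedBy(a2, p1) and used(p2, a2) induce: p1 ends before a2 is
# created, and a2 is created before p2 uses it.
edges = [("end(p1)", "create(a2)"), ("create(a2)", "use(p2,a2)")]
print(consistent(edges))                                  # True
print(consistent(edges + [("use(p2,a2)", "end(p1)")]))    # False: a cycle
```
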
  • ABSTRACT: Provenance has been studied extensively in both database and workflow management systems, so far with little convergence of definitions or models. Provenance in databases has generally been defined for relational or complex object data, by propagating fine-grained annotations or algebraic expressions from the input to the output. This kind of provenance has been found useful in other areas of computer science: annotation databases, probabilistic databases, schema and data integration, etc. In contrast, workflow provenance aims to capture a complete description of evaluation - or enactment - of a workflow, and this is crucial to verification in scientific computation. Workflows and their provenance are often presented using graphical notation, making them easy to visualize but complicating the formal semantics that relates their run-time behavior with their provenance records. We bridge this gap by extending a previously-developed dataflow language which supports both database-style querying and workflow-style batch processing steps to produce a workflow-style provenance graph that can be explicitly queried. We define and describe the model through examples, present queries that extract other forms of provenance, and give an executable definition of the graph semantics of dataflow expressions.
    01/2010;
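
To illustrate "a provenance graph that can be explicitly queried", here is a minimal sketch of a toy evaluator that records one provenance node per applied operation and supports a lineage query; the dataflow language and graph model in the paper are considerably richer.

```python
prov = []                       # provenance store: (output_id, op, input_ids)
counter = iter(range(1, 10**6))

def source(value):
    """Introduce an input value as a provenance-tracked node."""
    return value, next(counter)

def apply_op(op, f, *args):
    """args are (value, id) pairs; apply f and record one provenance node."""
    out_id = next(counter)
    prov.append((out_id, op, tuple(i for _, i in args)))
    return f(*(v for v, _ in args)), out_id

def lineage(node_id):
    """Query the recorded graph: ids of all nodes the given node depends on."""
    deps, frontier = set(), {node_id}
    while frontier:
        i = frontier.pop()
        for out, _, ins in prov:
            if out == i:
                new = set(ins) - deps
                deps |= new
                frontier |= new
    return deps

x, y = source(3), source(4)
s = apply_op("add", lambda a, b: a + b, x, y)
t = apply_op("square", lambda a: a * a, s)
print(t[0], lineage(t[1]))      # 49 {1, 2, 3}: t depends on s, x and y
```
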
  • ABSTRACT: In this paper we propose DFL—a formal, graphical workflow language for dataflows, i.e., workflows where large amounts of complex data are manipulated, and the structure of the manipulated data is reflected in the structure of the workflow. It is a common extension of (1) Petri nets, which are responsible for the organization of the processing tasks, and (2) nested relational calculus, which is a database query language over complex objects, and is responsible for handling collections of data items (in particular, for iteration) and for the typing system. We demonstrate that dataflows constructed in a hierarchical manner, according to a set of refinement rules we propose, are semi-sound, i.e., when initiated with a single token (which may represent a complex scientific data collection) in the input node, they terminate with a single token in the output node (which represents the output data collection). In particular, they never leave any “debris data” behind, and an output is always eventually computed regardless of how the computation proceeds.
    Information Systems 05/2008; · 1.77 Impact Factor
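
The semi-soundness property has a direct operational reading as a token game. A minimal sketch, assuming plain Petri-net firing with a deterministic firing order; DFL's data-carrying tokens and NRC typing are omitted, and a real check must consider all firing orders, not just the one explored here.

```python
from collections import Counter

def run(transitions, marking, out_place):
    """transitions: list of (consumed_places, produced_places).
    Fire enabled transitions until none remain, then check that exactly
    one token sits on the output place and nothing is left behind."""
    marking = Counter(marking)
    progress = True
    while progress:
        progress = False
        for pre, post in transitions:
            if all(marking[p] >= c for p, c in Counter(pre).items()):
                marking -= Counter(pre)   # consume input tokens
                marking += Counter(post)  # produce output tokens
                progress = True
    return marking == Counter({out_place: 1})

# in -> t1 -> mid -> t2 -> out : one token in, one token out, no debris
net = [(["in"], ["mid"]), (["mid"], ["out"])]
print(run(net, ["in"], "out"))  # True
```
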
  • Natalia Kwasnikowska, Jan Van den Bussche
    ABSTRACT: The Open Provenance Model (OPM) has recently been proposed as an exchange framework for workflow provenance information. In this paper we show how the NRC data model for workflow repositories can be mapped to the OPM. Our mapping includes such features as complex data flow in an execution of a workflow; different workflows in the repository that call each other; and the tracking of subvalues of complex data structures in the provenance information. Because the NRC dataflow model has been formally specified, our mapping can also be formally specified; in particular, it can be automated. To facilitate this specification, we present an adapted set-theoretic formalization of the basic OPM.
    Provenance and Annotation of Data and Processes, Second International Provenance and Annotation Workshop, IPAW 2008, Salt Lake City, UT, USA, June 17-18, 2008. Revised Selected Papers; 01/2008
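
The direction of the mapping can be illustrated on a toy run: each recorded step (inputs, operation, output) becomes an OPM process with used and wasGeneratedBy edges. The one-artifact-per-value choice below is illustrative only; the paper's mapping is finer-grained and also covers workflows calling each other and subvalue tracking.

```python
def run_to_opm(steps):
    """steps: list of (input_values, op_name, output_value) from one run."""
    artifacts, processes, used, generated = {}, [], [], []

    def art(value):
        # one artifact id per distinct value (an illustrative choice)
        return artifacts.setdefault(repr(value), f"a{len(artifacts)}")

    for i, (inputs, op, output) in enumerate(steps):
        p = f"p{i}:{op}"
        processes.append(p)
        used += [(p, art(v)) for v in inputs]
        generated.append((art(output), p))
    return processes, used, generated

steps = [(("gene1.fa", "gene2.fa"), "align", "alignment.txt"),
         (("alignment.txt",), "tree", "tree.nwk")]
print(run_to_opm(steps))
# (['p0:align', 'p1:tree'],
#  [('p0:align', 'a0'), ('p0:align', 'a1'), ('p1:tree', 'a2')],
#  [('a2', 'p0:align'), ('a3', 'p1:tree')])
```
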
  • ABSTRACT: Dataflow repositories are databases containing dataflows and their different runs. We propose a formal conceptual data model for such repositories. Our model includes careful formalisations of such features as complex data manipulation, external service calls, subdataflows, and the provenance of output values.
    Data Integration in the Life Sciences, 4th International Workshop, DILS 2007, Philadelphia, PA, USA, June 27-29, 2007, Proceedings; 01/2007
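
A minimal sketch of the shape such a repository could take, built from the ingredients the abstract names: dataflow specifications, runs with resolved external service calls, and provenance of output values. All field names here are illustrative assumptions, not the paper's formal model.

```python
from dataclasses import dataclass, field

@dataclass
class Dataflow:
    name: str
    body: str                                          # specification text
    subdataflows: list = field(default_factory=list)   # names of dataflows it calls

@dataclass
class Run:
    dataflow: str        # name of the executed dataflow
    bindings: dict       # how external service calls were resolved in this run
    output: object
    provenance: list = field(default_factory=list)     # (value, produced_by)

repo = {"align": Dataflow("align", "for x in genes return blast(x)")}
r = Run("align", {"blast": "NCBI-BLAST 2.2"}, ["hit1"], [("hit1", "blast")])
print(repo["align"].name, r.bindings, r.provenance)
```
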
  • Natalia Kwasnikowska, Yi Chen, Zoé Lacroix
    ABSTRACT: We propose an abstract model for scientific protocols, where several atomic operators are proposed for protocol composition. We distinguish two different layers associated with scientific protocols: design and implementation, and discuss the mapping between them. We illustrate our approach with a representative example and describe ProtocolDB, a scientific protocol repository currently in development. Our approach benefits scientists by allowing the archiving of scientific protocols with the collected data sets to constitute a scientific portfolio for the laboratory to query, compare and revise protocols. Scientific discovery relies on the adequate expression, execution, and analysis of scientific protocols. Although data sets are properly stored, the protocols themselves are often recorded only on paper or remain in a digital form developed to implement them. Once the scientist who has implemented the scientific protocol leaves the laboratory, the record of the scientific protocol may be lost. Collected data sets without the description of the process that produced them may become meaningless. Moreover, to support scientific discovery, anyone should be able to reproduce the experiment. Therefore, a detailed description of the protocol is necessary, together with the collected data sets. A scientific protocol is the process that describes the experimental component of scientific reasoning. Scientific reasoning follows a hypothetico-deductive pattern and is composed of the succession of the expression of a causal question, a hypothesis, the predicted results, the design of an experiment, the actual results of the experiment, the comparison of the predicted and the experimental results, and the conclusion, supportive or not of the hypothesis (1). Scientific protocols (also called data-analysis pipelines, workflows or dataflows) are complex procedural processes composed of a succession of tasks expressing the way the experiment is conducted. They usually involve a data-gathering stage, that may be followed by an analysis stage. A scientific protocol thus describes how the experiment is conducted and records all necessary information to reproduce the experiment. In bioinformatics, the importance of identifying protocol tasks has been addressed by Stevens et al. (2) and Bartlett et al. (3), while Troger (4)
    On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops, OTM Confederated International Workshops and Posters, AWeSOMe, CAMS, COMINF, IS, KSinBIT, MIOS-CIAO, MONET, OnToContent, ORM, PerSys, OTM Academy Doctoral Consortium, RDDS, SWWS, and SeBGIS 2006, Montpellier, France, October 29 - November 3, 2006. Proceedings, Part I; 01/2006
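
The two-layer idea (design versus implementation) can be shown in a few lines. In this sketch the composition operator and the tool bindings are illustrative assumptions, not the operators defined in the paper.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str                # design-level description of a task

@dataclass
class Seq:
    first: object            # sequential composition of two (sub)protocols
    second: object

# Design layer: what the experiment does, independent of any tool.
design = Seq(Step("gather sequences"), Seq(Step("align"), Step("build tree")))

# Implementation layer: bind each design step to a concrete tool, so the
# same design can be re-implemented (e.g. swap in another aligner) while
# runs remain comparable across implementations.
impl = {"gather sequences": "fetch from GenBank",
        "align": "ClustalW 2.1",
        "build tree": "PHYLIP neighbor"}
```
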
  • ABSTRACT: To represent a bioinformatics workflow in NRC, the actual atomic processing steps can be defined as abstract external functions, or “black boxes”, embedded into NRC. These black boxes can be, e.g., the execution of software, a script, or a call to a web service. The system also allows the use of internal functions to hierarchically organize the construction of workflows. NRC provides a text-based notation for expressing workflows, which is not particularly easy to use. That is why we propose an equivalent graphical notation based on Petri nets. The same formalism could be used for representing wet-lab experiments, with the black boxes encapsulating, e.g., the processing of a sample. That could provide the ability to design a workflow incorporating both the data acquisition and data analysis stages of a process. In that case, of course, the wet-lab part of the workflow should not be automatically optimized. The importance of using similar ways of expressing both in silico and in vitro experiments has also been stressed in the ISXL project [5], although they have defined a new ad-hoc programming language for that purpose. The use of NRC in the field of bioinformatics is not new. It has already been used as the core of the BioKleisli system [3], which facilitates the design and execution of a kind of workflow, although their main concern is data integration and not workflow modeling. What are the advantages of this kind of formal model? NRC offers methods of workflow optimization, e.g. those
    01/2005;
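
The division of labour described here, with NRC supplying the collection plumbing and opaque external functions supplying the atomic steps, looks roughly as follows; blast_stub is a hypothetical placeholder for a real tool or web-service call.

```python
def blast_stub(seq: str) -> list:
    # In a real workflow this would invoke software, a script, or a web
    # service; NRC treats it as an uninterpreted function of a declared type.
    return [f"hit-for-{seq}"]

# NRC-style expression: the union over x in genes of blast(x), rendered
# here as a flattening comprehension standing in for NRC iteration.
genes = ["seqA", "seqB"]
hits = [h for g in genes for h in blast_stub(g)]
print(hits)  # ['hit-for-seqA', 'hit-for-seqB']
```
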
  • ABSTRACT: In this paper we propose a formal, graphical workflow language for dataflows, i.e., workflows where large amounts of complex data are manipulated and the structure of the manipulated data is reflected in the structure of the workflow. It is a common extension of (1) Petri nets, which are responsible for the organization of the processing tasks, and (2) nested relational calculus, which is a database query language over complex objects, and is responsible for handling collections of data items (in particular, for iteration) and for the typing system. We demonstrate that dataflows constructed in a hierarchical manner, according to a set of refinement rules we propose, are sound: initiated with a single token (which may represent a complex scientific data collection) in the input node, they terminate with a single token in the output node (which represents the output data collection). In particular, they always process all of the input data, leave no “debris data” behind, and the output is always eventually computed.
    On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE, OTM Confederated International Conferences CoopIS, DOA, and ODBASE 2005, Agia Napa, Cyprus, October 31 - November 4, 2005, Proceedings, Part I; 01/2005
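
Hierarchical construction by refinement can be illustrated with one plausible rule: replacing a single transition by a sequential subnet with a fresh internal place. This is a sketch of one such rule, not the paper's full rule set; feeding the refined net to a token game like the sketch after the Information Systems entry above still yields a single output token.

```python
def refine_sequential(transitions, t_index, mid_place):
    """Replace transition t = (pre, post) by pre -> t1 -> mid -> t2 -> post."""
    pre, post = transitions[t_index]
    return (transitions[:t_index]
            + [(pre, [mid_place]), ([mid_place], post)]
            + transitions[t_index + 1:])

net = [(["in"], ["out"])]                 # the trivial one-step dataflow
net = refine_sequential(net, 0, "mid")    # refine its only transition
print(net)  # [(['in'], ['mid']), (['mid'], ['out'])]
```
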
  • Natalia Kwasnikowska, Jan Van den Bussche
  • Luc Moreau, Natalia Kwasnikowska, Jan Van den Bussche
    ABSTRACT: The Open Provenance Model (OPM) is a community-driven data model for provenance that is designed to support inter-operability of provenance technology. Underpinning OPM is a notion of directed acyclic graph, used to represent data products and processes involved in past computations, and causal dependencies between these. The Open Provenance Model was derived following two "Provenance Challenges", international, multi-disciplinary activities trying to investigate how to exchange information between multiple systems supporting provenance and how to query it. The OPM design was mostly driven by practical and pragmatic considerations, and is being tested in a third Provenance Challenge, which has just started. The purpose of this paper is to investigate the theoretical foundations of this data model. The formalisation consists of a set-theoretic definition of the data model, a definition of the inferences by transitive closure that are permitted, a formal description of how the model can be used to express dependencies in past computations, and finally, a description of the kind of time-based inferences that are supported. A novel element that OPM introduces is the concept of an account, by which multiple descriptions of the same execution are allowed to co-exist in the same graph. Our formalisation gives a precise meaning to such accounts and associated notions of alternate and refinement.
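
The "inferences by transitive closure" mentioned here are easy to make concrete. A minimal sketch computing the closure of one dependency relation, deriving "report depends on rawdata" from the two recorded edges; the formalisation itself defines such closures over the full set of OPM edge kinds.

```python
def closure(edges):
    """Transitive closure of a binary dependency relation, by saturation."""
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(edges):
            for c, d in list(edges):
                if b == c and (a, d) not in edges:
                    edges.add((a, d))   # a -> b and b -> d entail a -> d
                    changed = True
    return edges

derived = {("report", "table"), ("table", "rawdata")}
print(closure(derived))  # adds ('report', 'rawdata')
```
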
  • ABSTRACT: We demonstrate and discuss the advantages of bioinformatical workflows which have no side effects over those which do have them. In particular, we describe a method to give a formal, mathematical semantics to the side-effect-free workflows defined in SCUFL, the workflow definition language of Taverna. This is achieved by translating them into a natural extension of the Nested Relational Calculus NRC.
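
The advantage claimed for side-effect-free workflows is that each processor denotes a pure function, so a workflow's meaning is plain function composition over collections, which is the NRC view. A minimal sketch with two hypothetical Taverna-style processors; the actual SCUFL-to-NRC translation is, of course, richer.

```python
def trim(seq: str) -> str:
    # a processor as a pure function: no state, output depends only on input
    return seq.strip("N")

def translate(seq: str) -> str:
    # another pure processor: DNA to RNA
    return seq.replace("T", "U")

def workflow(seqs):
    # SCUFL wiring "trim -> translate" becomes NRC-style iteration plus
    # function composition, so two workflows with the same composition
    # have the same semantics.
    return [translate(trim(s)) for s in seqs]

print(workflow(["NNATGN", "TTGGN"]))  # ['AUG', 'UUGG']
```
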
  • ABSTRACT: In this paper, we introduce the Open Provenance Model, a model for provenance that is designed to meet the following requirements: (1) To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) To allow developers to build and share tools that operate on such a provenance model. (3) To define the model in a precise, technology-agnostic manner. (4) To support a digital representation of provenance for any "thing", whether produced by computer systems or not. (5) To define a core set of rules that identify the valid inferences that can be made on provenance graphs.
  • ABSTRACT: The Open Provenance Model is a model of provenance that is designed to meet the following requirements: (1) To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) To allow developers to build and share tools that operate on such a provenance model. (3) To define provenance in a precise, technology-agnostic manner. (4) To support a digital representation of provenance for any "thing", whether produced by computer systems or not. (5) To allow multiple levels of description to coexist. (6) To define a core set of rules that identify the valid inferences that can be made on provenance representation. This document contains the specification of the Open Provenance Model (v1.1) resulting from a community effort to achieve inter-operability in the Third Provenance Challenge. Keywords: provenance, representation, inter-operability