Natalia Kwasnikowska

Hasselt University, Hasselt, Flanders, Belgium

Publications (16) · 9.74 Total impact

  • Source
    Natalia Kwasnikowska · Jan Van den Bussche
    ABSTRACT: We consider an integrated complex-object dataflow database in which multiple dataflow specifications can be stored, together with multiple executions of these dataflows, including the complex-object data that are involved, and annotations. We focus on dataflow applications frequently encountered in the scientific community, involving the manipulation of data with a complex-object structure combined with service calls, which can be either internal or external. Internal services are dataflows acting as subprograms of another dataflow, whereas external services are modeled as functions with possibly non-deterministic behavior. Dataflow specifications are expressed in a high-level programming language based on the nested relational calculus, whose operators provide the right “glue” needed to combine different service calls into a complex-object dataflow. All entities involved, whether complex objects, dataflow executions or dataflow specifications, are first-class citizens of the integrated database: they are all data. We discuss how such dataflow repositories can be queried in a variety of ways, including provenance queries. We show that a modern SQL platform with support for (external) routines and SQL/XML suffices to support all types of dataflow repository queries.
    Full-text · Article · Mar 2014
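    To make the idea above concrete, here is a minimal sketch, not taken from the paper, of how an NRC-style dataflow reads when rendered as Python comprehensions: NRC's core operators (singleton, union, flatten, map) line up closely with comprehension syntax. The service names `blast` and `annotate` are hypothetical placeholders.

    ```python
    # Sketch only: NRC-style "glue" as Python comprehensions.
    # `blast` and `annotate` are hypothetical service names.

    def blast(seq):
        # stand-in for an *external* service: in the model, a function whose
        # behavior may be non-deterministic (e.g. a remote tool)
        return [seq[i:i + 3] for i in range(len(seq) - 2)]  # toy "hits"

    def annotate(hit):
        # stand-in for an *internal* service: a subdataflow, itself
        # expressible in NRC
        return {"hit": hit, "length": len(hit)}

    def dataflow(sequences):
        # map `blast` over the input, flatten the nested collection of hits,
        # and map `annotate` over the result
        return [annotate(hit) for seq in sequences for hit in blast(seq)]

    print(dataflow(["ACGTG", "TTAGC"]))
    ```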
  • Source
    ABSTRACT: The Open Provenance Model is a model of provenance that is designed to meet the following requirements: (1) Allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) Allow developers to build and share tools that operate on such a provenance model. (3) Define provenance in a precise, technology-agnostic manner. (4) Support a digital representation of provenance for any “thing”, whether produced by computer systems or not. (5) Allow multiple levels of description to coexist. (6) Define a core set of rules that identify the valid inferences that can be made on provenance representation. This document contains the specification of the Open Provenance Model (v1.1) resulting from a community effort to achieve inter-operability in the Provenance Challenge series.
    Full-text · Article · Jun 2011 · Future Generation Computer Systems
  • Source
    Natalia Kwasnikowska · Luc Moreau · Jan Van den Bussche
    ABSTRACT: The Open Provenance Model (OPM) is a community data model for provenance that is designed to facilitate the meaningful interchange of provenance information between systems. Underpinning OPM is a notion of directed graph, used to represent data products and processes involved in past computations, and dependencies between them; it is complemented by inference rules allowing new dependencies to be derived. The Open Provenance Model was designed from requirements captured in two 'Provenance Challenges', and tested during the third: these challenges were international, multi-disciplinary activities aiming to exchange provenance information between multiple systems and query it. The design of OPM was mostly driven by practical and pragmatic considerations. The purpose of this paper is to formalize the theory underpinning this data model. Specifically, this paper proposes a temporal semantics for OPM graphs, defined in terms of a set of ordering constraints between time-points associated with OPM constructs. OPM inferences are characterized with respect to this temporal semantics, and a novel set of patterns is introduced to establish soundness and completeness properties. Building on this novel foundation, the paper proposes new definitions for graph algebraic operations, graph refinement and the notion of account, by which multiple descriptions of the same execution are allowed to co-exist in the same graph. Overall, this paper provides a strong theoretical underpinning to a data model being adopted by a community of users, helping its disambiguation and promoting inter-operability.
    Full-text · Article · Jan 2010 · ACM Transactions on the Web
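    As a rough illustration of what such ordering constraints look like (the notation below is chosen here and is not quoted from the paper), associate a creation time-point with each artifact and begin/end time-points with each process; an edge then induces constraints such as:

    ```latex
    % Illustrative shape of the constraints; notation is assumed, not the paper's.
    \begin{align*}
    \mathit{used}(p, a) &\Rightarrow
      \mathit{create}(a) \preceq \mathit{use}(p, a)
      \;\wedge\; \mathit{begin}(p) \preceq \mathit{use}(p, a) \preceq \mathit{end}(p) \\
    \mathit{wasGeneratedBy}(a, p) &\Rightarrow
      \mathit{begin}(p) \preceq \mathit{create}(a) \preceq \mathit{end}(p)
    \end{align*}
    ```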
  • Source
    ABSTRACT: Provenance has been studied extensively in both database and workflow management systems, so far with little convergence of definitions or models. Provenance in databases has generally been defined for relational or complex object data, by propagating fine-grained annotations or algebraic expressions from the input to the output. This kind of provenance has been found useful in other areas of computer science: annotation databases, probabilistic databases, schema and data integration, etc. In contrast, workflow provenance aims to capture a complete description of evaluation (or enactment) of a workflow, and this is crucial to verification in scientific computation. Workflows and their provenance are often presented using graphical notation, making them easy to visualize but complicating the formal semantics that relates their run-time behavior with their provenance records. We bridge this gap by extending a previously developed dataflow language, which supports both database-style querying and workflow-style batch processing steps, to produce a workflow-style provenance graph that can be explicitly queried. We define and describe the model through examples, present queries that extract other forms of provenance, and give an executable definition of the graph semantics of dataflow expressions.
    Full-text · Article · Jan 2010
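    The phrase "executable definition of the graph semantics" invites a small illustration: below is a hypothetical Python sketch (names and edge labels are mine, not the paper's) in which evaluating a map step emits provenance edges as a side effect, so the resulting graph can be queried afterwards.

    ```python
    # Hypothetical sketch: evaluation that records a queryable provenance graph.
    provenance = []  # (source, label, target) triples

    def record(src, label, dst):
        provenance.append((src, label, dst))

    def run_map(step_name, f, inputs):
        # evaluate `f` over `inputs`, emitting one process node per call
        outputs = []
        for i, x in enumerate(inputs):
            call = f"{step_name}[{i}]"
            record(call, "used", f"in:{x}")
            y = f(x)
            record(f"out:{y}", "wasGeneratedBy", call)
            outputs.append(y)
        return outputs

    run_map("double", lambda x: 2 * x, [1, 2, 3])

    def inputs_of(value):
        # a provenance query: which inputs does a given output depend on?
        calls = [dst for (src, lab, dst) in provenance
                 if src == f"out:{value}" and lab == "wasGeneratedBy"]
        return [dst for (src, lab, dst) in provenance
                if lab == "used" and src in calls]

    print(inputs_of(4))  # -> ['in:2']
    ```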
  • Source
    ABSTRACT: In this paper, we introduce the Open Provenance Model, a model for provenance that is designed to meet the following requirements: (1) To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) To allow developers to build and share tools that operate on such a provenance model. (3) To define the model in a precise, technology-agnostic manner. (4) To support a digital representation of provenance for any "thing", whether produced by computer systems or not. (5) To define a core set of rules that identify the valid inferences that can be made on provenance graphs.
    Full-text · Article · Aug 2009
  • Source
    Luc Moreau · Natalia Kwasnikowska · Jan Van den Bussche
    ABSTRACT: The Open Provenance Model (OPM) is a community-driven data model for provenance that is designed to support inter-operability of provenance technology. Underpinning OPM is a notion of directed acyclic graph, used to represent data products and processes involved in past computations, and causal dependencies between these. The Open Provenance Model was derived following two 'Provenance Challenges', international, multi-disciplinary activities investigating how to exchange provenance information between multiple systems and how to query it. The OPM design was mostly driven by practical and pragmatic considerations, and is being tested in a third Provenance Challenge, which has just started. The purpose of this paper is to investigate the theoretical foundations of this data model. The formalisation consists of a set-theoretic definition of the data model, a definition of the inferences by transitive closure that are permitted, a formal description of how the model can be used to express dependencies in past computations, and finally, a description of the kind of time-based inferences that are supported. A novel element that OPM introduces is the concept of an account, by which multiple descriptions of the same execution are allowed to co-exist in the same graph. Our formalisation gives a precise meaning to such accounts and the associated notions of alternate and refinement.
    Full-text · Article · Apr 2009
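    One schematic way to picture such a set-theoretic formalisation (the tuple presentation below is mine; the edge names are OPM's standard dependency types):

    ```latex
    % An OPM graph over artifacts A, processes P and agents Ag, schematically:
    \[
    G = (A,\, P,\, Ag,\, \mathit{used},\, \mathit{wasGeneratedBy},\,
         \mathit{wasControlledBy},\, \mathit{wasTriggeredBy},\, \mathit{wasDerivedFrom})
    \]
    % with, e.g., $\mathit{used} \subseteq P \times A$ and
    % $\mathit{wasDerivedFrom} \subseteq A \times A$. A typical inference by
    % transitive closure is multi-step derivation:
    \[
    \mathit{wasDerivedFrom}(a_3, a_2) \wedge \mathit{wasDerivedFrom}(a_2, a_1)
    \;\Rightarrow\; \mathit{wasDerivedFrom}^{*}(a_3, a_1)
    \]
    ```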
  • Source
    Natalia Kwasnikowska · Jan Van den Bussche
    ABSTRACT: The Open Provenance Model (OPM) has recently been proposed as an exchange framework for workflow provenance information. In this paper we show how the NRC data model for workflow repositories can be mapped to OPM. Our mapping includes such features as complex data flow in an execution of a workflow; different workflows in the repository that call each other; and the tracking of subvalues of complex data structures in the provenance information. Because the NRC dataflow model has been formally specified, our mapping can also be formally specified; in particular, it can be automated. To facilitate this specification, we present an adapted set-theoretic formalization of the basic OPM.
    Full-text · Conference Paper · Jun 2008
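    For the subvalue-tracking feature mentioned above, a hypothetical sketch (not the paper's formal mapping): each nested component of a complex value gets its own artifact identifier, so provenance edges can point at parts of a value rather than only at the whole.

    ```python
    # Hypothetical sketch: naming subvalues of a complex object so each can
    # serve as an artifact node in an OPM-style graph.

    def subvalue_artifacts(value, path="v"):
        """Yield (path, subvalue) pairs for a nested structure of dicts/lists."""
        yield (path, value)
        if isinstance(value, dict):
            for k, v in value.items():
                yield from subvalue_artifacts(v, f"{path}.{k}")
        elif isinstance(value, list):
            for i, v in enumerate(value):
                yield from subvalue_artifacts(v, f"{path}[{i}]")

    for artifact in subvalue_artifacts({"hits": [{"id": 7}]}):
        print(artifact)
    # ('v', ...), ('v.hits', ...), ('v.hits[0]', ...), ('v.hits[0].id', 7)
    ```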
  • Source
    ABSTRACT: In this paper we propose DFL—a formal, graphical workflow language for dataflows, i.e., workflows where large amounts of complex data are manipulated, and the structure of the manipulated data is reflected in the structure of the workflow. It is a common extension of (1) Petri nets, which are responsible for the organization of the processing tasks, and (2) nested relational calculus, which is a database query language over complex objects, and is responsible for handling collections of data items (in particular, for iteration) and for the typing system. We demonstrate that dataflows constructed in a hierarchical manner, according to a set of refinement rules we propose, are semi-sound, i.e., initiated with a single token (which may represent a complex scientific data collection) in the input node, terminate with a single token in the output node (which represents the output data collection). In particular they never leave any “debris data” behind and an output is always eventually computed regardless of how the computation proceeds.
    Full-text · Article · May 2008 · Information Systems
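    One way to state the termination property claimed in the abstract, in standard workflow-net notation (a paraphrase, not the paper's definition): with $[i]$ the marking placing a single token on the input node and $[o]$ the marking with a single token on the output node,

    ```latex
    \[
    \forall M:\; [i] \xrightarrow{*} M \;\Longrightarrow\; M \xrightarrow{*} [o]
    \]
    % i.e. every computation started from a single input token can always be
    % completed to a state with a single output token and nothing left behind.
    ```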
  • Source
    ABSTRACT: Dataflow repositories are databases containing dataflows and their different runs. We propose a formal conceptual data model for such repositories. Our model includes careful formalisations of such features as complex data manipulation, external service calls, subdataflows, and the provenance of output values.
    Full-text · Conference Paper · Jun 2007
  • Source
    Natalia Kwasnikowska · Yi Chen · Zoé Lacroix
    ABSTRACT: We propose an abstract model for scientific protocols, in which several atomic operators are provided for protocol composition. We distinguish two different layers associated with scientific protocols, design and implementation, and discuss the mapping between them. We illustrate our approach with a representative example and describe ProtocolDB, a scientific protocol repository currently in development. Our approach benefits scientists by allowing scientific protocols to be archived together with the collected data sets, constituting a scientific portfolio that the laboratory can use to query, compare and revise its protocols.
    Preview · Conference Paper · Oct 2006
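    A schematic sketch of the two-layer idea (the operator and field names below are assumptions for illustration, not taken from the paper): each protocol step carries both an abstract design and a concrete implementation, and steps are combined by composition operators.

    ```python
    # Illustrative sketch of a two-layer protocol model; names are assumed.
    from dataclasses import dataclass

    @dataclass
    class Step:
        design: str          # what the step does, conceptually
        implementation: str  # which concrete tool/resource realizes it

    def seq(*steps):         # sequential composition of protocol steps
        return list(steps)

    protocol = seq(
        Step("retrieve sequences", "GenBank query"),
        Step("align sequences",    "ClustalW run"),
    )

    for s in protocol:
        print(f"{s.design}  -->  {s.implementation}")
    ```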
  • Source
    ABSTRACT: Neurological diseases, including multiple sclerosis (M.S.), often provoke changes in the functioning of the endothelial and epithelial brain barriers and give rise to disease-associated alterations of the cerebrospinal fluid (CSF) proteome. In the present study, pooled and ultrafiltered CSF of M.S. and non-M.S. patients was digested with trypsin and analyzed by off-line strong cation-exchange chromatography (SCX) coupled to on-line reversed-phase LC-ESI-MS/MS. In an alternative approach, the trypsin-treated subproteomes were analyzed directly by LC-ESI-MS/MS and gas-phase fractionation in the mass spectrometer. Taken together, both proteomic approaches in combination with a three-step evaluation process including the search engines Sequest and Mascot, and the validation software Scaffold, resulted in the identification of 148 proteins. Sixty proteins were identified in CSF for the first time by mass spectrometry. For validation purposes, the concentration of cystatin A was determined in individual CSF and serum samples of M.S. and non-M.S. patients using ELISA.
    Full-text · Article · Aug 2006 · Journal of Proteome Research
  • Source
    ABSTRACT: In this paper we propose a formal, graphical workflow language for dataflows, i.e., workflows in which large amounts of complex data are manipulated and the structure of the manipulated data is reflected in the structure of the workflow. It is a common extension of Petri nets, which are responsible for the organization of the processing tasks, and the nested relational calculus, which is a database query language over complex objects, and is responsible for handling collections of data items (in particular, for iteration) and for the typing system. We demonstrate that dataflows constructed in a hierarchical manner, according to a set of refinement rules we propose, are sound: when initiated with a single token (which may represent a complex scientific data collection) in the input node, they terminate with a single token in the output node (which represents the output data collection). In particular, they always process all of the input data, leave no “debris data” behind, and the output is always eventually computed.
    Full-text · Conference Paper · Oct 2005
  • Source
    ABSTRACT: To represent a bioinformatics workflow in NRC, the actual atomic processing steps can be defined as abstract external functions, or “black boxes”, embedded into NRC. These black boxes can be, for example, the execution of software or a script, or a call to a web service. The system also allows the use of internal functions to organize the construction of workflows hierarchically. NRC provides a text-based notation for expressing workflows, which is not particularly easy to use; we therefore propose an equivalent graphical notation based on Petri nets. The same formalism could be used for representing wet-lab experiments, with the black boxes encapsulating, for example, the processing of a sample. This would make it possible to design a workflow incorporating both the data-acquisition and data-analysis stages of a process. In that case, of course, the wet-lab part of the workflow should not be automatically optimized. The importance of using similar ways of expressing both in silico and in vitro experiments has also been stressed in the ISXL project [5], although they have defined a new ad hoc programming language for that purpose. The use of NRC in the field of bioinformatics is not new: it has already been used as the core of the BioKleisli system [3], which facilitates the design and execution of a kind of workflow, although its main concern is data integration rather than workflow modeling. What are the advantages of this kind of formal model? NRC offers methods of workflow optimization, e.g. those
    Full-text · Article · Jan 2005
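    A minimal sketch of the hierarchical construction described above, assuming nothing beyond the abstract (`clean`, `measure` and `per_sample` are invented names): an internal function is itself a small workflow reused by the top-level one, while the leaf steps are black boxes.

    ```python
    # Sketch: black boxes plus an internal function for hierarchy.

    def clean(sample):        # black box: e.g. an external script
        return sample.strip().upper()

    def measure(sample):      # black box: e.g. a web-service call
        return len(sample)

    def per_sample(sample):   # internal function: a subworkflow
        return measure(clean(sample))

    def workflow(samples):    # top-level workflow, NRC-style iteration
        return [per_sample(s) for s in samples]

    print(workflow(["  acgt ", "ttagc"]))  # -> [4, 5]
    ```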
  • ABSTRACT: We demonstrate and discuss the advantages of bioinformatics workflows that have no side effects over those that do. In particular, we describe a method for giving a formal, mathematical semantics to the side-effect-free workflows defined in SCUFL, the workflow definition language of Taverna. This is achieved by translating them into a natural extension of the nested relational calculus (NRC).
    No preview · Article ·
  • Natalia Kwasnikowska · Jan Van den Bussche
    No preview · Article ·
  • Source
    ABSTRACT: The Open Provenance Model is a model of provenance that is designed to meet the following requirements: (1) To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) To allow developers to build and share tools that operate on such a provenance model. (3) To define provenance in a precise, technology-agnostic manner. (4) To support a digital representation of provenance for any “thing”, whether produced by computer systems or not. (5) To allow multiple levels of description to coexist. (6) To define a core set of rules that identify the valid inferences that can be made on provenance representation. This document contains the specification of the Open Provenance Model (v1.1) resulting from a community effort to achieve inter-operability in the Third Provenance Challenge. Keywords: provenance, representation, inter-operability
    Full-text · Article ·