Serge Abiteboul

National Institute for Research in Computer Science and Control | INRIA · DAHU - Verification in Databases Research Team

PhD.

409
Publications
38,316
Reads
23,993
Citations

Publications (409)
Article
Perspectives on the role and responsibility of the data-management research community in designing, developing, using, and overseeing automated decision systems.
Article
The data revolution continues to transform every sector of science, industry, and government. Due to the incredible impact of data-driven technology on society, we are becoming increasingly aware of the imperative to use data and algorithms responsibly—in accordance with laws and ethical norms. In this article, we discuss three recent regulatory fr...
Preprint
The data revolution continues to transform every sector of science, industry and government. Due to the incredible impact of data-driven technology on society, we are becoming increasingly aware of the imperative to use data and algorithms responsibly -- in accordance with laws and ethical norms. In this article we discuss three recent regulatory f...
Conference Paper
Full-text available
We pursue an investigation of data-driven collaborative workflows. In the model, peers can access and update local data, causing side-effects on other peers' data. In this paper, we study means of explaining to a peer her local view of a global run, both at runtime and statically. We consider the notion of "scenario for a given peer" that is a subr...
Conference Paper
Issues of responsible data analysis and use are coming to the forefront of the discourse in data science research and practice, with most significant efforts to date on the part of the data mining, machine learning, and security and privacy communities. In these fields, the research has been focused on analyzing the fairness, accountability and tra...
Article
Full-text available
In April 2016, a community of researchers working in the area of Principles of Data Management (PDM) joined in a workshop at the Dagstuhl Castle in Germany. The workshop was organized jointly by the Executive Committee of the ACM Symposium on Principles of Database Systems (PODS) and the Council of the International Conference on Database Theory (I...
Conference Paper
The typical Internet user has data spread over several devices and across several online systems. We demonstrate an open-source system for integrating user's data from different sources into a single Knowledge Base. Our system integrates data of different kinds into a coherent whole, starting with email messages, calendar, contacts, and location hi...
Conference Paper
We designed a system to infer multimodal itineraries traveled by a user from a combination of smartphone sensor data (e.g., GPS, Wi-Fi, accelerometer) and knowledge of the transport network infrastructure (e.g., road and rail maps, public transportation timetables). The system uses a Transportation network that captures the set of possible paths of...
Article
We study the problem of, given a corpus of XML documents and its schema, finding an optimal (generative) probabilistic model, where optimality here means maximizing the likelihood of the particular corpus to be generated. Focusing first on the structure of documents, we present an efficient algorithm for finding the best generative probabilistic mo...
Article
Full-text available
The management of Web users' personal information is increasingly distributed across a broad array of applications and systems, including online social networks and cloud-based services. Users wish to share data using these systems, but avoiding the risks of unintended disclosures or unauthorized access by applications has become a major challenge....
Article
Full-text available
With the participation of François Bancilhon (Data Publica), François Bourdoncle (Dassault Systèmes), Stephan Clemencon (Telecom ParisTech), Colin de la Higuera (U. Nantes, SIF), Gilbert Saporta (CNAM), and Francoise Soulie-Fogelman (Kxen). François Bourdoncle and Paul Hermelin have been appointed "leaders" of the French Big Data sector. Their mission...
Article
Full-text available
We study deduction in the presence of inconsistencies. Following previous works, we capture deduction via datalog programs and inconsistencies through violations of functional dependencies (FDs). We study and compare two semantics for datalog with FDs: the first, of a logical nature, is based on inferring facts one at a time, while never violating...
Article
Full-text available
The Web contains an impressive mass of data, more or less explicit and more or less accessible to machines. We discuss here the major trends in managing this data: extracting knowledge from the Web, enriching that knowledge through the community of Web users, representing it in logical form, and...
Article
Full-text available
How to manage knowledge on the Web.
Article
Full-text available
We survey recent work on the specification of an access control mechanism in a collaborative environment. The work is presented in the context of the WebdamLog language, an extension of datalog to a distributed context. We discuss a fine-grained access control mechanism for intentional data based on provenance as well as a control mechanism for del...
Article
Full-text available
We introduce and study a model of collaborative data-driven workflows. In a local-as-view style, each peer has a partial view of a global instance that remains purely virtual. Local updates have side effects on other peers' data, defined via the global instance. We also assume that the peers provide (an abstraction of) their specifications, so that...
Article
Full-text available
We present the WebdamLog system for managing distributed data on the Web in a peer-to-peer manner. We demonstrate the main features of the system through an application called Wepic for sharing pictures between attendees of the SIGMOD conference. Using Wepic, the attendees will be able to share, download, rate and annotate pictures in a highly dece...
Article
Full-text available
We study the use of WebdamLog, a declarative high-level language in the style of datalog, to support the distribution of both data and knowledge (i.e., programs) over a network of autonomous peers. The main novelty of WebdamLog compared to datalog is its use of delegation, that is, the ability for a peer to communicate a program to another peer...
Conference Paper
The analysis of datalog programs over relational structures has been studied in depth, most notably the problem of containment. The analysis problems that have been considered were shown to be undecidable with the exception of (i) containment of arbitrary programs in nonrecursive ones, (ii) containment of monadic programs, and (iii) emptiness. In t...
Article
The Internet and World Wide Web have revolutionized access to information. Users now store information across multiple platforms, from personal computers to smartphones to websites such as YouTube and Picasa. As a consequence, data management concepts, methods, and techniques are increasingly focused on distribution concerns. Now that information...
Article
Full-text available
Editing an XML document manually is a complicated task. While many XML editors exist in the market, we argue that some important functionalities are missing in all of them. Our goal is to make the editing task simpler and faster. We present ALEX (Auto-completion Learning Editor for XML), an editor that assists the users by providing intelligent au...
Article
Full-text available
We address the problem of comparing the expressiveness of workflow specification formalisms using a notion of view of a workflow. Views make it possible to compare widely different workflow systems by mapping them to a common representation capturing the observables relevant to the comparison. Using this framework, we compare the expressiveness of several wor...
Article
Full-text available
The Webdam ERC grant is a five-year project that started in December 2008. The goal is to develop a formal model for Web data management that would open new horizons for the development of the Web in a well-principled way, enhancing its functionality, performance, and reliability. Specifically, the goal is to develop a universally accepted formal f...
Article
Full-text available
This paper addresses the challenges faced by everyday Web users, who interact with inherently heterogeneous and distributed information. Managing such data is currently beyond the skills of casual users. We describe ongoing work that has as its goal the development of foundations for declarative distributed data management. In this approach, we see...
Article
Full-text available
We study highly expressive query languages for unordered data trees, using as formal vehicles Active XML and extensions of languages in the while family. All languages may be seen as adding some form of control on top of a set of basic pattern queries. The results highlight the impact and interplay of different factors: the expressive power of basi...
Article
Full-text available
Sources of data uncertainty and imprecision are numerous. A way to handle this uncertainty is to associate probabilistic annotations to data. Many such probabilistic database models have been proposed, both in the relational and in the semi-structured setting. The latter is particularly well adapted to the management of uncertain data coming from a...
Article
Full-text available
One of the main challenges that the Semantic Web faces is the integration of a growing number of independently designed ontologies. In this work, we present PARIS, an approach for the automatic alignment of ontologies. PARIS aligns not only instances, but also relations and classes. Alignments at the instance level cross-fertilize with alignments a...
Conference Paper
Full-text available
In this paper, we study watermarking methods to prove the ownership of an ontology. Different from existing approaches, we propose to watermark not by altering existing statements, but by removing them. Thereby, our approach does not introduce false statements into the ontology. We show how ownership of ontologies can be established with provably t...
Article
Full-text available
We study the problem of, given a corpus of XML documents and its schema, finding an optimal probabilistic model (optimality meaning maximizing the likelihood of the corpus to be generated). We present an efficient algorithm for finding the best probabilistic model, in absence of constraints. We further study the problem in presence of integrity con...
Article
While classic data management focuses on the data itself, research on Business Processes considers also the context in which this data is generated and manipulated, namely the processes, the users, and the goals that this data serves. This allows ...
Article
Full-text available
There is a new trend to use Datalog-style rule-based languages to specify modern distributed applications, notably on the Web. We introduce here such a language for a distributed data model where peers exchange messages (i.e. logical facts) as well as rules. The model is formally defined and its interest for distributed data management is illustrat...
Article
Full-text available
We present PARIS, an approach for the automatic alignment of ontologies. PARIS aligns not only instances, but also relations and classes. Alignments at the instance-level cross-fertilize with alignments at the schema-level. Thereby, our system provides a truly holistic solution to the problem of ontology alignment. The heart of the approach is prob...
Article
Preference queries incorporate the notion of binary preference relation into relational database querying. Instead of returning all the answers, such queries return only the best answers, according to a given preference ...
Conference Paper
Distributed data management systems consist of peers that store, exchange and process data in order to collaboratively achieve a common goal, such as evaluate some query. We study the equivalence of such systems. We model a distributed system by a collection of Active XML documents, i.e., trees augmented with function calls for performing tasks suc...
Conference Paper
Full-text available
We address the problem of comparing the expressiveness of workflow specification formalisms using a notion of view of a workflow. Views make it possible to compare widely different workflow systems by mapping them to a common representation capturing the observables relevant to the comparison. Using this framework, we compare the expressiveness of several wor...
Article
Full-text available
We study the problem of, given a corpus of XML documents and its schema, finding an optimal (generative) probabilistic model, where optimality here means maximizing the likelihood of the particular corpus to be generated. Focusing first on the structure of documents, we present an efficient algorithm for finding the best generative probabilistic mo...
Article
A distributed XML document is an XML document that spans several machines. We assume that a distribution design of the document tree is given, consisting of an XML kernel-document T[f1,…,fn] where some leaves are “docking points” for external resources providing XML subtrees (f1,…,fn, standing, e.g., for Web services or peers at remote locations). T...
Preprint
A distributed XML document is an XML document that spans several machines. We assume that a distribution design of the document tree is given, consisting of an XML kernel-document T[f1,...,fn] where some leaves are "docking points" for external resources providing XML subtrees (f1,...,fn, standing, e.g., for Web services or peers at remote location...
Article
Full-text available
The workflow models have been essentially operation-centric for many years, ignoring almost completely the data aspects. Recently, a new paradigm of data-centric workflows, called business artifacts, has been introduced by Nigam and Caswell. We follow this approach and propose a model where artifacts are XML documents that evolve in time due to int...
Article
Full-text available
Sources of data uncertainty and imprecision are numerous. A way to handle this uncertainty is to associate probabilistic annotations to data. Many such probabilistic database models have been proposed, both in the relational and in the semi-structured setting. The latter is particularly well adapted to the management of uncertain data coming from a...
Conference Paper
Full-text available
The emergence of Web 2.0 and social network applications has enabled more and more users to share sensitive information over the Web. The information they manipulate has many facets: personal data (e.g., pictures, movies, music, contacts, emails), social data (e.g., annotations, recommendations, contacts), localization information (e.g., bookmarks)...
Article
Full-text available
We consider a set of views stating possibly conflicting facts. Negative facts in the views may come, e.g., from functional dependencies in the underlying database schema. We want to predict the truth values of the facts. Beyond simple methods such as voting (typically rather accurate), we explore techniques based on "corroboration", i.e., taking...
Article
Full-text available
Active XML is a high-level specification language tailored to data-intensive, distributed, dynamic Web services. Active XML is based on XML documents with embedded function calls. The state of a document evolves depending on the result of internal function calls (local computations) or external ones (interactions with users or other services). Func...
Article
Full-text available
Various known models of probabilistic XML can be represented as instantiations of the abstract notion of p-documents. In addition to ordinary nodes, p-documents have distributional nodes that specify the possible worlds and their probabilistic distribution. Particular families of p-documents are determined by the types of distributional nodes that can b...
Article
We introduce in this paper a class of constraints for describing how an XML document can evolve, namely XML update constraints. For these constraints, we study the implication problem, giving algorithms and complexity results for constraints of varying expressive power. Besides classical constraint implication, we also consider an instance-based ap...
Article
Developer communities built around software products, like the SAP Community Network, provide a knowledge base for recurring problems and their solutions. Due to the large amount of content maintained in such communities, e.g., in forums, finding relevant solutions is a major challenge beyond the scope of common keyword-based search engines. In fa...
Article
Full-text available
Towards a data-centric workflow approach, we introduce an artifact model to capture data and workflow management activities in distributed settings. The model is built on Active XML, i.e., XML trees including Web service calls. We argue that the model captures the essential features of business artifacts as described informally in (1) or discussed...
Conference Paper
Full-text available
Many Web applications are based on dynamic interactions between Web components exchanging flows of information. Such a situation arises for instance in mashup systems (22) or when monitoring distributed autonomous systems (6). This is a challenging problem that has generated recently a lot of attention; see Web 2.0 (38). For capturing interacti...
Article
Full-text available
A distributed XML document is an XML document that spans several machines or Web repositories. We assume that a distribution design of the document tree is given, providing an XML tree some of whose leaves are "docking points", to which XML subtrees can be attached. These subtrees may be provided and controlled by peers at remote locations, or may...
Conference Paper
Full-text available
A mashup is a Web application that integrates data, computation and GUI provided by several systems into a unique tool. The concept originated from the understanding that the number of applications available on the Web and the need for combining them to meet user requirements, are growing very rapidly. This demo presents MatchUp, a system that supp...
Conference Paper
Full-text available
Many Web applications are based on dynamic interactions between Web components exchanging flows of information. Such a situation arises for instance in mashup systems or when monitoring distributed autonomous systems. Our work is in this challenging context that has generated recently a lot of attention; see Web 2.0. We introduce the axlog formal m...
Article
Full-text available
evolving data (1). The model captures both the flow of control (workflow) of the application and the evolution of the relevant data (data cycle); see (2) for a brief survey. In the same spirit, we propose a new artifact model building upon Active XML (AXML for short), an extension of XML with embedded service calls (3). The services are hosted by a...
Article
Full-text available
Information ubiquity has created a large crowd of users (most notably scientists), who could employ DBMS technology to share and search their data more effectively. Still, this user base prefers to keep its data in files that can be easily managed by applications such as spreadsheets, rather than deal with the complexity and rigidity of modern data...
Article
Full-text available
Sources of data uncertainty and imprecision are numerous. One way to handle this uncertainty is to associate probabilistic annotations with the data. Many probabilistic database models have thus been proposed, in both the relational and the semi-structured settings. The latter is particularly well suited to the management of...