About
110
Publications
11,718
Reads
875
Citations
Publications (110)
In this paper, we study the following problem: given a knowledge graph (KG) and a set of input vertices (representing concepts or entities) and edge labels, we aim to find the smallest connected subgraphs containing all of the inputs. This problem plays a key role in KG-based search engines and natural language question answering systems, and it is...
In recent years, the Resource Description Framework data model has seen an increasing adoption in Web applications and IT in general. This has contributed to the establishment of standards such as the SPARQL query language and the emergence of production-ready database management systems based on this data model. In this paper, we however argue tha...
Edge computing emerges as an innovative platform for services requiring low latency decision making. Its success partly depends on the existence of efficient data management systems. We consider that knowledge graph management systems have a key role to play in this context due to their data integration and reasoning features. In this paper, we pre...
As edge computing is becoming a new platform for rich applications and services, it becomes more and more important to design adapted data management systems for this environment. In this paper, we present a prototype corresponding to a compact, in-memory RDF store that can answer SPARQL queries requiring reasoning services without necessitating an...
Topic evolution networks are widely used to represent the evolution of research topics in scientific document archives. These networks might contain thousands of topics and alignment edges which are computed by comparing millions of topic pairs with some similarity function. In this work, we are addressing the problem of computing a very large numb...
In this paper, we extend LiteMat, an RDFS and owl:sameAs inference-enabled RDF encoding scheme, which is used in a distributed knowledge graph data management system. Our extensions make it possible to reach RDFS++ expressiveness by integrating owl:transitiveProperty and owl:inverseOf properties. Considering the latter, the owl:inverseOf property, we propose a s...
Shifting from Big Data to Big Knowledge requires systems that are able to cope with the large volume and high-velocity dimensions in a scalable and inference-enabled manner. In this work, we are focusing on stream processing and reasoning using the graph-based RDF data model. We are aiming to explore the ability of modern distributed computing fram...
This chapter focuses on the problem of evaluating SPARQL queries over large Resource Description Framework (RDF) datasets. RDF data graphs can be produced without a predefined schema and SPARQL allows querying both schema and instance information simultaneously. The chapter presents the challenges and solutions for efficiently processing SPARQL...
The trade-off between language expressiveness and system scalability (E&S) is a well-known problem in RDF stream reasoning. Higher expressiveness supports more complex reasoning logic; however, it may also hinder system scalability. Current research mainly focuses on logical frameworks suitable for stream reasoning as well as the implementation and...
Reasoning over semantically annotated data is an emerging trend in stream processing aiming to produce sound and complete answers to a set of continuous queries. It usually comes at the cost of finding a trade-off between data throughput, latency and the cost of expressive inferences. Strider R proposes such a trade-off and combines a scalable RDF...
Real-time processing of data streams emanating from sensors is becoming a common task in Internet of Things scenarios. The key implementation goal consists in efficiently handling massive incoming data streams and supporting advanced data analytics services like anomaly detection. In an ongoing industrial project, a 24/7 available stream proces...
Reasoning over semantically annotated data is an emerging trend in stream processing aiming to produce sound and complete answers to a set of continuous queries. It usually comes at the cost of finding a trade-off between data throughput and the cost of expressive inferences. Strider-lsa proposes such a trade-off and combines a scalable RDF stream...
Real-time processing of data streams emanating from sensors is becoming a common task in industrial scenarios. An increasing number of processing jobs executed over such platforms require reasoning mechanisms. The key implementation goal is thus to efficiently handle massive incoming data streams and support reasoning, data analytic services....
A common way to achieve scalability for processing SPARQL queries is to choose MapReduce frameworks like Hadoop or Spark. Processing basic graph pattern (BGP) expressions generating large join plans over distributed data partitions is a major challenge in these frameworks. In this article, we study the use of two distributed join algorithms, partit...
Real-time processing of data streams emanating from sensors is becoming a common task in Internet of Things scenarios. The key implementation goal consists in efficiently handling massive incoming data streams and supporting advanced data analytics services like anomaly detection. In an on-going, industrial project, we found out that a 24/7 availab...
After years of research and development, standards and technologies for semantic data are sufficiently mature to be used as the foundation of novel data science projects that employ semantic technologies in various application domains such as bio-informatics, materials science, criminal intelligence, and social science. Typically, such projects are...
To cope with the massive growth of semantic data streams, several RDF Stream Processing (RSP) engines have been implemented. The efficiency of their throughput, latency and memory consumption can be evaluated using available benchmarks such as LSBench and CityBench. Nevertheless, these benchmarks lack an in-depth performance evaluation as some me...
The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are more and more confronted with various "big data" problems. Query processing is one of them and needs to be efficiently addressed with executions over scalable, highly available and fault tolerant frameworks....
Semantic Web technologies are increasingly used for exploiting relations between data. In addition, new kinds of real-time sources, such as social networks, sensors, cameras or weather information, are continuously generating data. This implies that data and links between them are becoming extremely vast. Such a huge quantity of data n...
In this paper, we present a system to visualize RDF knowledge graphs. These
graphs are obtained from a knowledge extraction system designed by
GEOLSemantics. This extraction is performed using natural language processing
and trigger detection. The user can visualize subgraphs by selecting some
ontology features like concepts or individuals. The sys...
The number of linked data sources and the size of the linked open data graph
keep growing every day. As a consequence, semantic RDF services are more and
more confronted with various "big data" problems. Query processing in the
presence of inferences is one of them. For instance, to complete the answer set of
SPARQL queries, RDF database systems evalu...
Querying very large RDF data sets in an efficient manner requires a
sophisticated distribution strategy. Several innovative solutions have recently
been proposed for optimizing data distribution with predefined query workloads.
This paper presents an in-depth analysis and experimental comparison of five
representative and complementary distribution...
Electronic Health Records (EHRs) are frequently used by clinicians and researchers to search for, extract, and analyze groups of patients by defining Health Outcomes of Interest (HOI). The definition of an HOI is generally considered a complex and time-consuming task for health care professionals.
In our clinical note-based pharmacovigilance resear...
We start off this paper with a description of the evolution that led from the first version of the Web to the Web of data, sometimes referred to as the Semantic Web. Based on the “data > information > knowledge” hierarchy, we adopt a usage-based perspective. We then make explicit the structures of knowledge representation and the building blocks of...
In this paper we present WaterFowl, a novel approach for the storage of RDF triples that addresses scalability issues through compression. The architecture of our prototype, largely based on the use of succinct data structures, enables the representation of triples in a self-indexed, compact manner without requiring decompression at query answering...
Open Self-Medication is a Web application that better informs people when treating undiagnosed medical ailments with unprescribed, over-the-counter drugs, i.e., self-medicating. The application achieves this goal by providing a set of functionalities that ensure safety and efficiency of this practice. The system’s most critical operations are proc...
In this paper, we present a novel approach -- called WaterFowl -- for the
storage of RDF triples that addresses some key issues in the contexts of big
data and the Semantic Web. The architecture of our prototype, largely based on
the use of succinct data structures, enables the representation of triples in a
self-indexed, compact manner without req...
RDF Database Systems is a cutting-edge guide that distills everything you need to know to effectively use or design an RDF database. This book starts with the basics of linked open data and covers the most recent research, practice, and technologies to help you leverage semantic technology. With an approach that combines technical detail with theor...
The WaterFowl RDF Store is characterized by its high compression rate and a self-indexing approach. Both of these characteristics are due to its underlying architecture. Intuitively, it is based on a stack composed of two forms of Succinct Data Structures, namely bitmaps and wavelet trees. The ability to efficiently retrieve information from these...
Clinicians and researchers using Electronic Health Records (EHRs) often search for, extract, and analyze groups of patients by defining a health outcome of interest, which may include a set of diseases, conditions, signs, or symptoms. In our work on pharmacovigilance using clinical notes, for example, we use a method that operates over many (po-...
The Semantic Web makes extensive use of the OWL DL ontology language, which is underpinned by the \(\mathcal{SHOIQ}\) description logic, to formalize its resources. In this paper, we propose a decision procedure for this logic extended with the transitive closure of roles in concept axioms, a feature needed in several application domains. The most challeng...
DRAOn is a distributed reasoner which offers inference services for a network of OWL ontologies correlated by alignments. Reasoning with such networks of ontologies depends on the semantics we define for alignments with respect to ontologies. DRAOn supports two semantics for a network of ontologies: the standard Description Logics (DL) semantics fo...
The World Wide Web infrastructure together with its more than 2 billion users
makes it possible to store information at a rate that has never been achieved before.
This is mainly due to the desire to store almost all end-user interactions
performed on some web applications. In order to respond to scalability and
availability constraints, many web companies inv...
The vision of the Semantic Web is becoming a reality with billions of RDF triples being distributed over multiple queryable end-points (e.g. Linked Data). Although there has been a body of work on RDF triples persistent storage, it seems that, considering reasoning dependent queries, the problem of providing an efficient, in terms of performance, sca...
The Semantic Web makes extensive use of the OWL DL ontology language, which is underpinned by the SHOIQ description logic, to formalize its resources. In this paper, we propose a decision procedure for this logic extended with the transitive closure of roles in concept axioms, a feature needed in several application domains. To address the problem of consi...
The Anatomical Therapeutic Chemical (ATC) classification system is frequently used to classify drugs according to an encoding system that considers the organ or system on which they act and/or their therapeutic class and chemical characteristics. Motivated by the maintenance of a French drug database, we have transformed, using an inductive...
This paper presents a trace-based framework for assisting personalization and enrichment of end-user experience in an application. We propose a modular ontology-based architecture, to provide semantics for interaction traces, observed elements and their associated objects, and we extend existing inference services, with a declarative and generic ap...
Many health care systems and services exploit drug related information stored in databases. The poor data quality of these databases, e.g. inaccuracy of drug contraindications, can lead to catastrophic consequences for the health condition of patients. Hence it is important to ensure their quality in terms of data completeness and soundness.
In the...
NoSQL stores are emerging as an efficient alternative to relational database management systems in the context of big data. Many actors in this domain consider that to gain a wider adoption, several extensions have to be integrated. Some of them focus on the ways of proposing more schema, supporting adapted declarative query languages and providin...
In this work, we propose an ExpSpace tableau-based algorithm for deciding consistency of a knowledge base in the description logic SHOIQ. The construction of this algorithm is founded on the standard tableau-based method for SHOIQ and the technique used for designing a NExpTime algorithm for the two-variable fragment of first-order logic with count...
The Semantic Web extends the principles of the Web by allowing computers to understand and easily explore the Web. In recent years RDF has been a widespread data format for the Semantic Web. There is a real need to efficiently store and retrieve RDF data as the number and scale of Semantic Web in real-world applications in use...
This paper tackles issues encountered in storing and querying services dealing with information described with Semantic Web languages, e.g. OWL and RDF(S). Our work considers RDF triples stored in relational databases. We assume that depending on the applications and queries asked to RDF triple stores, different partitioning approaches can be consi...
Due to the large amount of data generated by user interactions on the Web, some companies are currently innovating in the
domain of data management by designing their own systems. Many of them are referred to as NoSQL databases, standing for ’Not
only SQL’. With their wide adoption will emerge new needs and data integration will certainly be one of...
In this paper, we investigate an extension of the description logic \(\mathcal{SHIQ}\), a knowledge representation formalism used for the Semantic Web, with transitive closure of roles occurring not only in concept
inclusion axioms but also in role inclusion axioms. It was proved that adding transitive closure of roles to \(\mathcal{SHIQ}\) without r...
Welcome to the 3rd International Workshop on Ambient Data Integration (ADI’10). The workshop was held in conjunction with
the On The Move Federated Conference and Workshops (OTM’10), October 25-29, 2010 in Hersonissou, Crete, Greece.
Ambient data integration places an emphasis on integrating data across embedded, context aware, personalized device...
The VENUS European project aims at providing scientific methodologies and technological tools for the virtual exploration of deep water archaeological sites. Due to the peculiarities of underwater archaeological surveys, the knowledge about the studied items is both provided by underwater archaeology and photogrammetry measures. The preservation an...
This dissertation presents my interest in developing methods and algorithms necessary for realizing advanced applications for the Semantic Web. This extension of the current Web aims to allow the integration and sharing of data across organizations and applications. A direct consequence of the success of this approach would be to make it possible to consider the W...
In this chapter, we present a solution to the problem of merging structures that represent the conceptual layer of some information
systems. The structures we study are expressive ontologies formalized in Description Logics. The
proposed approach creates a merged ontology which captures the knowledge of a set of source onto...
In this work, we consider that the modeling of complex domains can be performed using Domain Specific Languages (DSL). The main principle of this approach consists in developing DSL primitives and assembling them to model a certain domain. The ability to add new primitives into an existing model and to fine-tune it by replacing some of them provid...
The vision of the Semantic Web is becoming a reality with billions of RDF triples being distributed over multiple queryable end-points (e.g. Linked Data). Although there has been a body of work on RDF triples persistent storage, it seems that the problem of providing an efficient, in terms of query performance and data redundancy, inference enable...
The modelling of processes that occur in landscapes is often confronted with issues related to the representation of space and the difficulty of properly handling time and multiple scales. In order to investigate these issues, a flexible modelling environment is required. We propose to develop such a tool based on a Domain Specific Language (DSL) tha...
In this paper, we propose a solution to the problem of merging ontologies when instances associated with two source ontologies
are available. The solution we propose is based on Formal Concept Analysis (FCA) and considers that ontologies are formalized
in expressive Description Logics. Our approach creates a merged ontology which captures the knowled...
In Computer Supported Cooperative Work (CSCW), it is crucial for project leaders to detect conflicting situations as early
as possible. Generally, this task is performed manually by studying a set of documents exchanged between team members. In
this paper, we propose a full-fledged automatic solution that identifies documents, subjects and actors i...
Ontology-Based Data Access provides a conceptual view over data repositories and mediates the access to this information.
The cornerstone of this approach consists of a set of mappings which express relationships between repository entities and
ontology elements. In practice, these mappings may incorporate constant values. We propose a (semi) autom...
Data integration has gained great attention in current scientific applications due to the increasingly high volume of heterogeneous
data and proliferation of diverse data generating devices such as sensors. Recently evolved workflow systems contributed a
lot towards scientific data integration by exploiting ontologies. Even though they offer good...
In this paper, we present an ontology mediation solution based on the methods frequently used in Formal Concept Analysis. Our approach to mediation relies on the existence of instances associated with two source ontologies: we can generate concepts in a new ontology if and only if they share the same extent. Hence our approach creates a merged...
Welcome to the proceedings of the second International Workshop on Ambient Data Integration (ADI 2009). The workshop was held
in conjunction with the On The Move Federated Conference and Workshops (OTM 2009), November 1-6, 2009 in Vilamoura, Portugal.
This paper presents a tool that enables the integration of data stored in relational databases into Semantic Web compliant knowledge bases. The resulting knowledge bases are represented using Description Logics and can thus be translated into the Web Ontology Language (OWL). Our approach tackles the impedance mismatch problem which is due to the st...
In this paper, we present an ontology mediation solution based on the methods frequently used in Formal Concept Analysis. Our approach to mediation means that, if we can rely on the existence of instances associated with two source ontologies, then we can generate concepts in a new ontology if and only if they share the same extent. Hence our approac...
In this paper, we propose an ontology-based approach that makes it possible to detect the emergence of relational conflicts between people who cooperate on computer-supported projects. In order to detect these conflicts, we use this ontology to analyze the e-mails exchanged between these people. Our method aims to inform project team leaders of such...
It is a common characteristic of scientific applications to require the integration of information coming from multiple sources.
This aspect usually confronts end-users with data management issues which involve the transportation of data from one system
to another as well as the syntactic and semantic integration of data, i.e. data come in differen...
Data exchange between multiple sources in scientific applications poses significant data management issues which involve the transportation of data from one system to another as well as the syntactic and semantic integration of data, i.e. data come in different formats and have different meanings. In order to deal with these issues in a systematic...
Due to the large volume and high complexity of data, end-users are often confronted with data management issues such as syntactic and semantic integration of data (data comes in different formats and has different meanings) as well as the pure movement of data in between information systems. In order to cope with these issues in a systematic and we...
In this paper, we present a system which enables the integration of data stored in relational databases into a Semantic Web compliant knowledge base. In this context, our system, named DBOM, addresses the impedance mismatch problem through a mapping language that allows one to specify how to transform data retrieved from tuples of the...
Welcome to the First International Workshop on Ambient Data Integration (ADI 2008). The workshop was held in conjunction with
the On The Move Federated Conferences and Workshops (OTM 2008), November 9-14, 2008 in Monterrey, Mexico.
This workshop is concerned with the subjects that are relevant for the success of data integration systems, such as d...