
Efthymia Tsamoura
- PhD
- PostDoc Position at University of Oxford
About
54 Publications
2,895 Reads
515 Citations
Publications (54)
Query answering over data with dependencies plays a central role in most applications of dependencies. The problem is commonly solved by using a suitable variant of the chase algorithm to compute a universal model of the dependencies and the data and thus explicate all knowledge implicit in the dependencies. After this preprocessing step, an arbitr...
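The preprocessing step described above lends itself to a small illustration. Below is a minimal, hedged Python sketch of one round of a naive chase for tuple-generating dependencies with a single body and head atom; the predicate and variable names are invented for the example, and real chase engines add applicability and termination checks (e.g., only firing a dependency whose head is not already satisfied) that are omitted here.

```python
# Minimal sketch of one naive-chase round for simple TGDs
# (single-atom body and head). Illustrative only; names are hypothetical.
from itertools import count

fresh = count()  # supplies labelled nulls for existential variables

def chase_round(facts, tgds):
    """Fire every TGD on every matching fact, inventing labelled nulls
    for existentially quantified head variables."""
    derived = set()
    for (body_pred, body_vars), (head_pred, head_vars), exists in tgds:
        for fact in facts:
            if fact[0] != body_pred:
                continue
            binding = dict(zip(body_vars, fact[1:]))   # match the body atom
            nulls = {v: f"_n{next(fresh)}" for v in exists}
            derived.add((head_pred,) + tuple(
                binding.get(v, nulls.get(v, v)) for v in head_vars))
    return facts | derived

# Person(x) -> exists z. HasParent(x, z)
tgd = (("Person", ("x",)), ("HasParent", ("x", "z")), {"z"})
print(chase_round({("Person", "alice")}, [tgd]))
# {('Person', 'alice'), ('HasParent', 'alice', '_n0')}
```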
Integrating symbolic techniques with statistical ones is a long-standing problem in artificial intelligence. The motivation is that the strengths of either area match the weaknesses of the other, and – by combining the two – the weaknesses of either method can be limited. Neuro-symbolic AI focuses on this integration...
Probabilistic logical models are a core component of neurosymbolic AI and are important models in their own right for tasks that require high explainability. Unlike neural networks, logical models are often handcrafted using domain expertise, making their development costly and prone to errors. While there are algorithms that learn logical models f...
Multi-Instance Partial Label Learning (MI-PLL) is a weakly-supervised learning setting encompassing partial label learning, latent structural learning, and neurosymbolic learning. Unlike supervised learning, in MI-PLL the inputs to the classifiers at training time are tuples of instances $\textbf{x}$, while the supervision signal is gene...
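To make that supervision signal concrete, here is a tiny hedged Python illustration using the common "sum of hidden digit labels" example from the neurosymbolic literature (an assumption for illustration, not necessarily this paper's benchmark): the learner never sees per-instance labels, only a value of the hidden function, which induces a set of candidate label tuples.

```python
# Toy MI-PLL supervision: the observed signal sigma(y1, ..., yk) (here, the
# sum of hidden digit labels) induces a set of consistent label tuples.
from itertools import product

def candidate_labels(sigma_value, n_instances, classes=range(10)):
    """All hidden label tuples consistent with the observed signal."""
    return [ys for ys in product(classes, repeat=n_instances)
            if sum(ys) == sigma_value]

# Observing sigma(y1, y2) = 3 leaves four consistent label pairs:
print(candidate_labels(3, 2))  # [(0, 3), (1, 2), (2, 1), (3, 0)]
```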
Several techniques have recently aimed to improve the performance of deep learning models for Scene Graph Generation (SGG) by incorporating background knowledge. State-of-the-art techniques can be divided into two families: one where the background knowledge is incorporated into the model in a subsymbolic fashion, and another in which the backgroun...
Structure learning is a core problem in AI central to the fields of neuro-symbolic AI and statistical relational learning. It consists in automatically learning a logical theory from data. The basis for structure learning is mining repeating patterns in the data, known as structural motifs. Finding these patterns reduces the exponential search spac...
Parallel neurosymbolic architectures have been applied effectively in NLP by distilling knowledge from a logic theory into a deep model. However, prior art faces several limitations including supporting restricted forms of logic theories and relying on the assumption of independence between the logic and the deep network. We present Concordia, a fra...
The role of uncertainty in data management has become more prominent than ever before, especially because of the growing importance of machine learning-driven applications that produce large uncertain databases. A well-known approach to querying such databases is to blend rule-based reasoning with uncertainty. However, techniques proposed so far st...
Reasoning-based query planning has been explored in many contexts, including relational data integration, the Semantic Web, and query reformulation. But infrastructure to build reasoning-based optimization in the relational context has been slow to develop. We overview PDQ 2.0, a platform supporting a number of reasoning-enhanced querying tasks. We f...
We study the design of data publishing mechanisms that allow a collection of autonomous distributed data sources to collaborate to support queries. A common mechanism for data publishing is via views: functions that expose derived data to users, usually specified as declarative queries. Our autonomy assumption is that the views must be on individu...
Despite significant progress in the development of neural-symbolic frameworks, the question of how to integrate a neural and a symbolic system in a compositional manner remains open. Our work seeks to fill this gap by treating these two systems as black boxes to be integrated as modules into a single architecture, without making assumptions on thei...
The chase is a well-established family of algorithms used to materialize Knowledge Bases (KBs), like Knowledge Graphs (KGs), to tackle important tasks like query answering under dependencies or data cleaning. A general problem of chase algorithms is that they might perform redundant computations. To counter this problem, we introduce the notion of...
The chase is a well-established family of algorithms used to materialize Knowledge Bases (KBs) for tasks like query answering under dependencies or data cleaning. A general problem of chase algorithms is that they might perform redundant computations. To counter this problem, we introduce the notion of Trigger Graphs (TGs), which guide the executio...
Despite significant progress in the development of neural-symbolic frameworks, the question of how to integrate a neural and a symbolic system in a compositional manner remains open. Our work seeks to fill this gap by treating these two systems as black boxes to be integrated as modules into a single architecture, without making assumptions...
We study the design of data publishing mechanisms that allow a collection of autonomous distributed data sources to collaborate to support queries. A common mechanism for data publishing is via views: functions that expose derived data to users, usually specified as declarative queries. Our autonomy assumption is that the views must be on individual...
State-of-the-art inference approaches in probabilistic logic programming typically start by computing the relevant ground program with respect to the queries of interest, and then use this program for probabilistic inference using knowledge compilation and weighted model counting. We propose an alternative approach that uses efficient Datalog techn...
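For readers unfamiliar with the inference pipeline mentioned above, the following hedged Python sketch shows the semantics of weighted model counting on a toy program (rain with probability 0.3 and the rule wet :- rain, encoded via its Clark completion). Real systems avoid this brute-force enumeration by compiling the theory into a tractable circuit; the example only shows what is being computed.

```python
# Brute-force weighted model counting (WMC) on a toy propositional theory.
# Illustrative semantics only; real systems use knowledge compilation.
from itertools import product

def wmc(variables, weights, clauses):
    """Sum over satisfying assignments of the product of literal weights.
    weights[v] = (weight if v is True, weight if v is False);
    each clause is a set of (variable, polarity) literals."""
    total = 0.0
    for values in product([True, False], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(any(world[v] == sign for v, sign in cl) for cl in clauses):
            w = 1.0
            for v, val in world.items():
                w *= weights[v][0] if val else weights[v][1]
            total += w
    return total

# Probabilistic fact 0.3::rain, rule wet :- rain (completion: wet <-> rain).
variables = ["rain", "wet"]
weights = {"rain": (0.3, 0.7), "wet": (1.0, 1.0)}
theory = [{("rain", False), ("wet", True)},   # rain -> wet
          {("wet", False), ("rain", True)}]   # wet -> rain
query = theory + [{("wet", True)}]
print(wmc(variables, weights, query) / wmc(variables, weights, theory))  # 0.3
```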
Mapping design is overwhelming for end users, who have to verify both the correctness of the mappings and the possible information disclosure over the exported source instance. In this demonstration, we focus on the latter problem by proposing a novel practical solution to ensure that a mapping faithfully complies with a set of privacy restriction...
The problem of data exchange involves a source schema, a target schema and a set of mappings for transforming the data between the two schemas. We study the problem of data exchange in the presence of privacy restrictions on the source. The privacy restrictions are expressed as a set of policy views representing the information that is safe to exp...
Inspired by the magic sets for Datalog, we present a novel goal-driven approach for answering queries over terminating existential rules with equality (aka TGDs and EGDs). Our technique improves the performance of query answering by pruning the consequences that are not relevant for the query. This is challenging in our setting because equalities c...
We consider a setting where a user wants to pose a query against a dataset where background knowledge, expressed as logical sentences, is available, but only a subset of the information can be used to answer the query. We thus want to reformulate the user query against the subvocabulary, arriving at a query equivalent to the user’s query assuming t...
The chase is a family of algorithms used in a number of data management tasks, such as data exchange, answering queries under dependencies, query reformulation with constraints, and data cleaning. It is well established as a theoretical tool for understanding these tasks, and in addition a number of prototype systems have been developed. While indi...
Recent Big Data research typically emphasizes the need to address the challenges stemming from the volume, velocity, variety and veracity aspects. However, another cross-cutting property of Big Data is volatility. In database technology, volatility is addressed with the help of adaptive query processing (AQP), which has become the dominant parad...
Query reformulation refers to a process of translating a source query—a request for information in some high-level logic-based language—into a target plan that abides by certain interface restrictions. Many practical problems in data management can be seen as instances of the reformulation problem. For example, the problem of translating an SQL que...
We present algorithms for answering queries making use of information about source integrity constraints, access restrictions, and access costs. Our method can exploit the integrity constraints to find plans even when there is no direct access to relations appearing in the query. We look at different kinds of plans, depending on the kind of relatio...
In the previous chapters, we have seen that proofs of an entailment can lead us to a reformulation of our target query Q, respecting the restrictions (e.g., access methods), whenever such a reformulation exists. We now look at finding efficient reformulations. We focus on the setting where the interface is given by access methods, the goal is to...
Let us rephrase the meta-algorithm for reformulating queries that we have referred to throughout the book, adding a bit more generality. It says we should:
1. Isolate a semantic property that any input query Q must have with respect to the target T and constraints Σ in order to have a reformulation of the desired type.
2. Express this property as a...
In the previous chapter the target of reformulation was specified through vocabulary restrictions. We wanted a query that used a fixed set of target relations, perhaps restricted to be positive existential or existential. In this chapter we deal with a finer notion of reformulation, where the target has to satisfy access restrictions, as was illust...
In this chapter we look at the problem of query reformulation in the presence of integrity constraints, where a query is defined over a source vocabulary, the goal is to translate it into a query over a target vocabulary, and the constraints relate tables in the source vocabulary to the target vocabulary. This relates to a broad range of problems i...
The main goal of this work is to study a general recipe for translating queries in a source language into a target language, in the presence of integrity constraints:
1. formulate a semantic property the query needs to have in order to have a translation,
2. capture the property as a logical entailment,
3. come up with a proof system that is complete for th...
Traditional query processing involves a search for plans formed by applying algebraic operators on top of primitives representing access to relations in the input query. But many querying scenarios involve two interacting issues that complicate the search. On the one hand, the search space may be limited by access restrictions associated with the i...
The data needed to answer queries is often available through Web-based APIs. Indeed, for a given query there may be many Web-based sources which can be used to answer it, with the sources overlapping in their vocabularies, and differing in their access restrictions (required arguments) and cost. We introduce PDQ (Proof-Driven Query Answering), a sy...
We look at generating plans that answer queries over restricted interfaces, making use of information about source integrity constraints, access restrictions, and access costs. Our method can exploit the integrity constraints to find low-cost access plans even when there is no direct access to relations appearing in the query. The key idea of our m...
The performance of large scale applications, such as those enabled by service-oriented, grid and cloud technologies, heavily relies on aspects related to the network topology and latency. As such, predicting the actual communication latencies is of high interest. The current state-of-the-art solutions to the problem of estimating the latency among...
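One classical family of solutions in this space embeds hosts into a low-dimensional coordinate space and predicts latency as geometric distance (Vivaldi is a well-known example). The Python sketch below of a spring-relaxation update is an illustration of that general idea under invented data, not the method of the paper above.

```python
# Network-coordinates sketch: predict RTT between hosts as the distance
# between learned coordinates, refined by spring-like updates.
import numpy as np

def update(coords, i, j, rtt, step=0.05):
    """Nudge node i toward/away from node j to better match measured RTT."""
    d = coords[i] - coords[j]
    dist = np.linalg.norm(d) + 1e-9
    error = dist - rtt                    # positive: embedded too far apart
    coords[i] -= step * error * d / dist  # move along the unit direction

rng = np.random.default_rng(1)
coords = rng.normal(size=(3, 2))          # three hosts in a 2-D space
rtts = {(0, 1): 10.0, (1, 2): 10.0, (0, 2): 20.0}
for _ in range(2000):
    for (i, j), rtt in rtts.items():
        update(coords, i, j, rtt)
        update(coords, j, i, rtt)
print(round(np.linalg.norm(coords[0] - coords[2]), 1))  # close to 20.0
```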
As cloud-based solutions have become one of the main choices for intensive data analysis both for business decision making and scientific purposes, users face the problem of choosing among different cloud providers. In this work, we deal with data analysis flows that can be split into stages, and each stage can run on multiple cloud infrastructures....
In this survey chapter, we discuss adaptive query processing (AdQP) techniques for distributed environments. We also investigate the issues involved in extending AdQP techniques originally proposed for single-node processing so that they become applicable to multi-node environments as well. In order to make it easier for the reader to understand...
The problem of ordering expensive predicates (or filter ordering) has recently received renewed attention due to emerging computing paradigms such as processing engines for queries over remote Web Services, and cloud and grid computing. The optimization of pipelined plans over services differs from traditional optimization significantly, since exec...
The capability to optimize and execute complex queries over multiple remote services (e.g., web services) is of high significance for efficient data management in large scale distributed computing infrastructures, such as those enabled by grid and cloud computing technology. In this work, we investigate the optimization of queries that involve mult...
Ordering of commutative and correlated pipelined stream filters in a dynamic environment is a problem of high interest due to its application in data stream scenarios and its relevance to many query optimization problems. Current state-of-the-art adaptive techniques continuously reoptimize filter orderings utilizing statistics that are collected du...
The problem of ordering expensive predicates (or filter ordering) has recently received renewed attention due also to emerging computing paradigms such as processing engines for queries over remote Web Services, and cloud and grid computing. The optimization of pipelined plans over services differs from traditional optimization significantly, s...
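The traditional baseline these abstracts contrast with is worth stating concretely: for independent, commutative filters with per-tuple cost c and selectivity s (the fraction of tuples that pass), expected pipeline cost is minimized by ordering filters in ascending rank c / (1 - s). Below is a small hedged Python sketch of that classical rule, not of the decentralized, service-oriented algorithms contributed by these papers.

```python
# Classical rank-based ordering of independent commutative filters.
# Expected cost of an ordering: c1 + s1*c2 + s1*s2*c3 + ...
def order_filters(filters):
    """filters: list of (name, cost, selectivity), selectivity < 1."""
    return sorted(filters, key=lambda f: f[1] / (1.0 - f[2]))

def expected_cost(filters):
    cost, pass_prob = 0.0, 1.0
    for _, c, s in filters:
        cost += pass_prob * c   # only tuples surviving earlier filters arrive
        pass_prob *= s
    return cost

fs = [("f1", 5.0, 0.9), ("f2", 1.0, 0.5), ("f3", 2.0, 0.2)]
best = order_filters(fs)
print([name for name, _, _ in best], expected_cost(best))
# ['f2', 'f3', 'f1'] 2.5
```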
The development of workflow management systems (WfMSs) for the effective and efficient management of workflows in wide-area infrastructures has received a lot of attention in recent years. Existing WfMSs provide tools that simplify the workflow composition and enactment actions, while they support the execution of complex tasks on remote computatio...
Nowadays, technologies such as grid and cloud computing infrastructures and service-oriented architectures have become adequately mature and have been adopted by a large number of enterprises and organizations [2,19,36]. A Web Service (WS) is a software system designed to support interoperable machine-to-machine interaction over a network and is im...
This paper deals with pipelined queries over services. The execution plan of such queries defines an order in which the services are called. We present the theoretical underpinnings of a newly proposed algorithm that produces the optimal linear ordering corresponding to a query being executed in a decentralized manner, i.e., when the services commu...
The problem of reassembling image fragments arises in many scientific fields, such as forensics and archaeology. In the field of archaeology, the pictorial excavation findings are almost always in the form of painting fragments. The manual execution of this task is very difficult, as it requires a great amount of time, skill and effort. Thus, the aut...
Efficient management of massive data sets is a key aspect in typical grid and e-science applications. To this end, the benefits of employing database technologies in such applications have been identified since the early days of grid computing, which aims at enabling coordinated resource sharing, knowledge generation and problem solving i...
Shot segmentation provides the basis for almost all high-level video content analysis approaches, making it one of the major prerequisites for efficient video semantic analysis, indexing and retrieval. The successful detection of both gradual and abrupt transitions is necessary to this end. In this paper a new gradual transition detection al...
Shot boundaries provide the basis for almost all high-level video content analysis approaches, making their detection one of the major prerequisites for efficient video indexing and retrieval in large video databases. The successful detection of both gradual and abrupt transitions is necessary to this end. In this paper a new gradual transition detection...
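As a concrete entry point to this line of work, the following hedged Python sketch shows the common histogram-difference baseline for abrupt cuts on synthetic frames; the papers above go further and target gradual transitions, which this simple thresholding does not handle.

```python
# Baseline abrupt-cut detector: threshold the L1 distance between
# consecutive normalized intensity histograms. Synthetic frames only.
import numpy as np

def histogram(frame, bins=16):
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def abrupt_cuts(frames, threshold=0.5):
    """Return frame indices i where frame i-1 -> i looks like a hard cut."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > threshold]

rng = np.random.default_rng(0)
dark = [rng.integers(0, 100, (64, 64)) for _ in range(5)]      # scene A
bright = [rng.integers(150, 256, (64, 64)) for _ in range(5)]  # scene B
print(abrupt_cuts(dark + bright))  # [5]: the cut between the two scenes
```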