
Vasilis Efthymiou
- PhD
- Assistant Professor at Harokopio University of Athens
About
70
Publications
11,936
Reads
1,220
Citations
Introduction
entity resolution, semantic annotation of tables, ontology-based query answering
Education
March 2013 - November 2017
Publications
We develop a flexible, open-source framework for query answering on relational databases by adopting methods and techniques from the Semantic Web community and the data exchange community, and we apply this framework to a medical use case. We first deploy module-extraction techniques to derive a concise and relevant sub-ontology from an external re...
In recent years, several knowledge bases have been built to enable large-scale knowledge sharing, but also an entity-centric Web search, mixing both structured data and text querying. These knowledge bases offer machine-readable descriptions of real-world entities, e.g., persons, places, published on the Web as Linked Data. However, due to the diff...
Natural Language Interfaces to Databases (NLIDB) systems eliminate the requirement for an end user to use complex query languages like SQL, by translating the input natural language (NL) queries to SQL automatically. Although a significant volume of research has focused on this space, most state-of-the-art systems can at best handle simple select-p...
Web tables constitute valuable sources of information for various applications, ranging from Web search to Knowledge Base (KB) augmentation. An underlying common requirement is to annotate the rows of Web tables with semantically rich descriptions of entities published in Web KBs. In this paper, we evaluate three unsupervised annotation methods: (a...
The same real-world entity (e.g., a movie, a restaurant, a person) may be described in various ways on different datasets. Entity Resolution (ER) aims to find such different descriptions of the same entity, this way improving data quality and, therefore, data value. However, an ER pipeline typically involves several steps (e.g., blocking, similarit...
Entity Resolution (ER) plays a crucial role, facilitating the integration of knowledge bases and identifying similarities among entities from different sources. In this work, we address the following challenges: streaming data, incremental processing, and fairness. There is a lack of studies involving fairness and ER, which is related to the absenc...
The rise of financial crime that has been observed in recent years has created an increasing concern around the topic and many people, organizations and governments are more and more frequently trying to combat it. Despite the increase of interest in this area, there is a lack of specialized datasets that can be used to train and evaluate works tha...
The proliferation of Knowledge Graphs (KGs) that support a wide variety of applications, like entity search, question answering and recommender systems, has led to the need for identifying overlapping information among different KGs. Entity Alignment (EA) is the problem of detecting such overlapping information among KGs that refer to the same real...
Entity Resolution has been an active research topic for the last three decades, with numerous algorithms proposed in the literature. However, putting them into practice is often a complex task that requires implementing, combining and configuring complementary individual algorithms into comprehensive end-to-end workflows. To facilitate this process...
Named Entity Recognition (NER) and Linking (NEL) have seen great advances lately, especially with the development of language models pre-trained on large document corpora, typically written in the most popular languages (e.g., English). This makes NER and NEL tools for other languages, with fewer resources available, fall behind the latest advances...
In recent years, we have witnessed the proliferation of knowledge graphs (KGs) in various domains, aiming to support applications like question answering, recommendations, etc. A frequent task when integrating knowledge from different KGs is to find which subgraphs refer to the same real-world entity, a task widely known as Entity Alignment. Re...
Knowledge Graphs (KGs) have recently gained attention for representing knowledge about a particular domain and play a central role in a multitude of AI tasks like recommendations and query answering. Recent works have revealed that KG embedding methods used to implement these tasks often exhibit direct forms of bias (e.g., related to gender, nation...
Entity resolution (ER) is the task of finding records that refer to the same real-world entities. A common scenario, which we refer to as Clean-Clean ER, is to resolve records across two clean sources (i.e., they are duplicate-free and contain one record per entity). Matching algorithms for Clean-Clean ER yield bipartite graphs, which are further p...
Recent advances in NLU and NLP have resulted in renewed interest in natural language interfaces to data, which provide an easy mechanism for non-technical users to access and query the data. While early systems evolved from keyword search and focused on simple factual queries, the complexity of both the input sentences as well as the generated SQL...
Today, we are gathering more and more new astrophysics knowledge, but we are using obsolete ways of representing and processing it. This position paper discusses potential ways of constructing an astronomical KG, semantically annotating and reasoning on such data, using neuro-symbolic methods.
Enterprises create domain-specific knowledge bases (KBs) by curating and integrating their business data from multiple sources. To support a variety of query types over domain-specific KBs, we propose Hermes, an ontology-based system that allows storing KB data in multiple backends, and querying them with different query languages. In this paper, w...
Recent advances in natural language understanding and processing have resulted in renewed interest in natural language interfaces to data, which provide an easy mechanism for non-technical users to access and query the data. While early systems evolved from keyword search and focused on simple factual queries, the complexity of both the input sente...
In recent years, we have witnessed the proliferation of knowledge graphs (KG) in various domains, aiming to support applications like question answering, recommendations, etc. A frequent task when integrating knowledge from different KGs is to find which subgraphs refer to the same real-world entity. Recently, embedding methods have been used for e...
SemTab 2021 was the third edition of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, successfully collocated with the 20th International Semantic Web Conference (ISWC) and the 16th Ontology Matching (OM) Workshop. SemTab provides a common framework to conduct a systematic evaluation of state-of-the-art systems.
Natural language interfaces provide an easy way to query and interact with data and enable non-technical users to investigate data sets without the need to know a query language. Recent advances in natural language understanding and processing have resulted in a renewed interest in natural language interfaces to data. The main challenges in natural...
Entity Resolution (ER) is the task of finding records that refer to the same real-world entities. A common scenario is when entities across two clean sources need to be resolved, which we refer to as Clean-Clean ER. In this paper, we perform an extensive empirical evaluation of 8 bipartite graph matching algorithms that take in as input a bipartite...
Readers, as well as journalists, are overwhelmed with the information available in online news articles, making it very difficult to verify and validate their content. An important tool to support readers in this task is that of named entity linking (NEL), i.e., semantically annotating entities mentioned in text with entities described in knowledge...
There is an urgent call to detect and prevent "biased data" at the earliest possible stage of the data pipelines used to build automated decision-making systems. In this paper, we are focusing on controlling the data bias in entity resolution (ER) tasks aiming to discover and unify records/descriptions from different data sources that refer to the...
Infusing autonomous artificial systems with knowledge about the physical world they inhabit is of utmost importance and a long-lasting goal in Artificial Intelligence (AI) research. Training systems with relevant data is a common approach; yet, it is not always feasible to find the data needed, especially since a big portion of this knowledge is co...
Medical knowledge bases (KBs), distilled from biomedical literature and regulatory actions, are expected to provide high-quality information to facilitate clinical decision making. Entity disambiguation (also referred to as entity linking) is considered as an essential task in unlocking the wealth of such medical KBs. However, existing medical enti...
Conversational interfaces to Business Intelligence (BI) applications enable data analysis using a natural language dialog in small incremental steps. To truly unleash the power of conversational BI to democratize access to data, a system needs to provide effective and continuous support for data analysis. In this paper, we propose BI-REC, a convers...
One of the most critical tasks for improving data quality and increasing the reliability of data analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to the same real-world entity. Despite several decades of research, ER remains a challenging problem. In this survey, we highlight the novel aspects of resolvi...
Tabular data to Knowledge Graph matching is the process of assigning semantic tags from knowledge graphs (e.g., Wikidata or DBpedia) to the elements of a table. This task is a challenging problem for various reasons, including the lack of metadata (e.g., table and column names), the noisiness, heterogeneity, incompleteness and ambiguity in the dat...
Tabular data to Knowledge Graph matching is the process of assigning semantic tags from knowledge graphs (e.g., Wikidata or DBpedia) to the elements of a table. This task is a challenging problem for various reasons, including the lack of metadata (e.g., table and column names), the noisiness, heterogeneity, incompleteness and ambiguity in the data...
An increasing number of entities are described by interlinked data rather than documents on the Web. Entity Resolution (ER) aims to identify descriptions of the same real-world entity within one or across knowledge bases in the Web of data. To reduce the required number of pairwise comparisons among descriptions, ER methods typically perform a pre-...
Enterprises are creating domain-specific knowledge graphs by curating and integrating their business data from multiple sources. The data in these knowledge graphs can be described using ontologies, which provide a semantic abstraction to define the content in terms of the entities and the relationships of the domain. The rich semantic relationship...
Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address them, we propose the MinoanER framework that simultaneously fulfills full automation, support of highly he...
One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in...
Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of descriptions published in the Web of Data. To address them, we propose the MinoanER framework that fulfills full automation and support of highly heterogeneous entitie...
Entity resolution (ER) is the problem of identifying descriptions of the same real-world entities among or within knowledge bases (KBs). In this PhD thesis, we study the problem of ER in the Web of data, in which entities are described using graph-structured RDF data, following the principles of the Linked Data paradigm. The two core ER problems ar...
Recommender systems have received significant attention, with most of the proposed methods focusing on recommendations for single users. However, there are contexts in which the items to be suggested are not intended for a user but for a group of people. For example, assume a group of friends or a family that is planning to watch a movie or visit a...
Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. In order to enable entity resolution to scale to large volumes of data, blocking is typically employed: it clusters similar entities into (overlapping) blocks so that it suffices to perform comparisons only within each block. To further i...
Today, privacy is a key concept. It is also one which is rapidly evolving with technological advances, and there is no consensus on a single definition for it. In fact, the concept of privacy has been defined in many different ways, ranging from the “right to be left alone” to being a “commodity” that can be bought and sold. At the same time, power...
Web tables have been proven to constitute valuable sources of information for applications, ranging from Web search, to data discovery in spreadsheet software and KB augmentation. A requirement for those applications is to understand the semantics of Web tables and potentially match their contents with existing URIs in the Web of Data, a process kn...
Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. Typically, it scales to large volumes of data through blocking: similar entities are clustered into blocks so that it suffices to perform comparisons only within each block. Meta-blocking further increases efficiency by cleaning the overl...
In the Web of data, entities are described by inter-linked data rather than documents on the Web. In this work, we focus on entity resolution in the Web of data, i.e., identifying descriptions that refer to the same real-world entity. To reduce the required number of pairwise comparisons, methods for entity resolution perform blocking as a pre-proc...
Top-k is a well-studied problem in the literature, due to its wide spectrum of applications, like information retrieval, database querying, Web search and data mining. In the big data era, the volume and velocity of the data call for efficient parallel solutions that overcome the restricted resources of a single machine. Our motivating appli...
This tutorial provides an overview of the key research results in the area of entity resolution that are relevant to addressing the new challenges in entity resolution posed by the Web of data, in which real world entities are described by interlinked data rather than documents. Since such descriptions are usually partial, overlapping and sometimes...
In this paper we present our work in a real-time, context-aware system, applied in a smart classroom domain, which aims to assist its users after recognizing any occurring activity. We exploit the advantages of ontologies in order to model the context and introduce as well a method for extracting information from an ontology and using it in a mach...
Multi-Context Systems is a rule-based representation model for distributed, heterogeneous knowledge agents, which cooperate by sharing parts of their local knowledge through a set of bridge rules also known as mappings. The concept of conviviality was recently proposed for modeling and measuring cooperation among agents in multiagent systems. In th...
Multi-Context Systems (MCS) are rule-based representation models for distributed, heterogeneous knowledge sources, called contexts, such as ambient intelligence devices and agents. Contexts interact with each other through the sharing of their local knowledge, or parts thereof, using so-called bridge rules to enable the cooperation among different...
The Internet of Things allows people and objects to seamlessly interact, crossing the bridge between real and virtual worlds. Newly created spaces are heterogeneous; social relations naturally extend to smart objects. Conviviality has recently been introduced as a social science concept for ambient intelligent systems to highlight soft qualitative...
With the pervasive development of sociotechnical systems, such as Facebook, Twitter and digital cities, modelling and reasoning on social settings has acquired great significance. Hence, an independent soft objective of system design is to facilitate interactions. Conviviality has been introduced as a social science concept for multiagent systems t...