Yannis Tzitzikas

University of Crete | UOC · Department of Computer Science

PhD

About

247
Publications
22,892
Reads
1,842
Citations
Introduction
* Web Searching (Interactive Searching, Results Clustering, Instant Search)
* Multifaceted exploratory search
* Knowledge comparison operators, knowledge evolution and versioning, knowledge visualization
* Applications of semantic technologies for Digital Preservation, Provenance Modelling, Information Integration
Additional affiliations
February 2006 - present
University of Crete
Position
  • Professor (Assistant)
February 2006 - present
Foundation for Research and Technology - Hellas
Position
  • Affiliated Researcher
July 2003 - April 2004
VTT Technical Research Centre of Finland
Position
  • PostDoc Position

Publications (247)
Article
The CIDOC Conceptual Reference Model (CIDOC-CRM) is an ISO Standard ontology for the cultural domain that is used for enabling semantic interoperability between museums, libraries, archives and other cultural institutions. For leveraging CIDOC-CRM, several processes and tasks have to be carried out. It is therefore important to investigate to what...
Article
Full-text available
Modern information systems have to support the user in managing, understanding and interacting with more and more data. Visualization could help users comprehend information more easily and reach conclusions in relatively shorter time. However, the bigger the data is, the harder the problem of visualizing it becomes. In this paper we focus on the pr...
Article
Text usually suffers from typos which can negatively affect various Information Retrieval and Natural Language Processing tasks. Although there is a wide variety of choices for tackling this issue in the English language, this is not the case for other languages. For the Greek language, most of the existing phonetic algorithms provide rather insuff...
Article
Full-text available
All public bodies in Greece, including Universities, are obliged to comply with the national legal framework and policy on open data. An emerging concern is how such a big and diverse organization could develop supporting procedures from an administrative, legal and technical stand point, that will enhance and expand the level of the provided open...
Article
Full-text available
There is a high increase in approaches that receive as input a text and perform named entity recognition (or extraction) for linking the recognized entities of the given text to RDF Knowledge Bases (or datasets). In this way, it is feasible to retrieve more information for these entities, which can be of primary importance for several tasks, e.g.,...
Article
Full-text available
Browsing has been the core access method for the Web from its beginning. Analogously, one good practice for publishing data on the Web is to support dereferenceable URIs, to also enable plain web browsing by users. The information about one URI is usually presented through HTML tables (such as DBpedia and Wikidata pages) and graph representations (...
Chapter
A vast area of research in historical science concerns the documentation and study of artefacts and related evidence. Current practice mostly uses spreadsheets or simple relational databases to organise the information as rows with multiple columns of related attributes. This form offers itself for data analysis and scholarly interpretation, howeve...
Chapter
For vague or complex open-domain information needs, it is hard for Question Answering (QA) to be adequate, satisfying and pleasing for end users. In this paper we investigate an approach where QA complements a general purpose interactive keyword search system over RDF. We describe the role of QA in that context, and we detail and evaluate a pipeline for QA...
Preprint
Full-text available
A vast area of research in historical science concerns the documentation and study of artefacts and related evidence. Current practice mostly uses spreadsheets or simple relational databases to organise the information as rows with multiple columns of related attributes. This form offers itself for data analysis and scholarly interpretation, howeve...
Article
Descriptive and empirical sciences, such as History, are the sciences that collect, observe and describe phenomena to explain them and draw interpretative conclusions about influences, driving forces and impacts under given circumstances. Spreadsheet software and relational database management systems are still the dominant tools for quantitative a...
Preprint
Full-text available
Descriptive and empirical sciences, such as History, are the sciences that collect, observe and describe phenomena in order to explain them and draw interpretative conclusions about influences, driving forces and impacts under given circumstances. Spreadsheet software and relational database management systems are still the dominant tools for quant...
Chapter
Every university in Greece is obliged to comply with the national legal framework on open data. The rising question is how such a big and diverse organization could support open data from an administrative, legal and technical point of view, in a way that enables gradual improvement of the open data-related services. In this paper, we describe our...
Chapter
Semantic Warehouses integrate data from various sources for offering a unified view of the data and enabling the answering of queries which cannot be answered by the individual sources. However, such semantic warehouses have to be refreshed periodically as the underlying datasets change. This is a challenging requirement, not only because the mappi...
Article
Full-text available
A vast area of research in historical science concerns the analysis of historical archival sources. This involves activities such as digitizing the historical sources, usually using spreadsheets or simple relational databases, and then analyzing the transcribed data using a range of methods depending on the kind of data and the type of research que...
Chapter
With the explosion of social networks, the web has been transformed into an arena of inappropriate interactions and content, such as fake news and misinformation, deception, hate speech, inauthentic online behaviour, proselytism, slander, and mobbing. In this demo we present Chattack, a first step towards our aim of providing publicly available dat...
Article
Full-text available
The continuous accumulation of multi-dimensional data and the development of Semantic Web and Linked Data published in the Resource Description Framework (RDF) bring new requirements for data analytics tools. Such tools should take into account the special features of RDF graphs, exploit the semantics of RDF and support flexible aggregate queries....
Preprint
There is a proliferation of approaches that exploit RDF datasets for creating URI embeddings, i.e., embeddings that are produced by taking as input URI sequences (instead of simple words or phrases), since they can be of primary importance for several tasks (e.g., machine learning tasks). However, existing techniques exploit either a single or a fe...
Conference Paper
Full-text available
Every university in Greece is obliged to comply with the national legal framework on open data. The rising question is how such a big and diverse organization could support open data from an administrative, legal and technical point of view, in a way that enables gradual improvement of the open data-related services. In this paper, we describe our...
Article
Faceted Search is a widely used interaction scheme in digital libraries, e-commerce, and recently also in Linked Data. Surprisingly, object ranking in the context of Faceted Search is not well studied in the literature. In this article, we propose an extension of the model with two parameters that enable specifying the desired answer size and the g...
Chapter
We shall demonstrate \(\mathtt{LODsyndesis}_{IE}\), which is a research prototype that offers Entity Extraction from text and Entity Enrichment for the extracted entities, using several Linked Datasets. \(\mathtt{LODsyndesis}_{IE}\) exploits widely used Named Entity Extraction and Disambiguation tools (i.e., DBpedia Spotlight, WAT and Stanford Core...
Chapter
The task of accessing knowledge graphs through structured query languages like SPARQL is rather demanding for ordinary users. Consequently, there are various approaches that attempt to exploit the simpler and widely used keyword-based search paradigm, either by translating keyword queries to structured queries, or by adopting classical information...
Article
Full-text available
Question Answering (QA) systems aim at supplying precise answers to questions, posed by users in a natural language form. They are used in a wide range of application areas, from bio-medicine to tourism. Their underlying knowledge source can be structured data (e.g. RDF graphs and SQL databases), unstructured data in the form of plain text (e.g. te...
Article
Full-text available
Since the task of accessing RDF datasets through structured query languages like SPARQL is rather demanding for ordinary users, there are various approaches that attempt to exploit the simpler and widely used keyword-based search paradigm. However this task is challenging since there is no clear unit of retrieval and presentation, the user informat...
Conference Paper
The task of accessing knowledge graphs through structured query languages like SPARQL is rather demanding for ordinary users. Consequently, there are various approaches that attempt to exploit the simpler and widely used keyword-based search paradigm, either by translating keyword queries to structured queries, or by adopting classical information...
Chapter
For ordinary users, the task of accessing knowledge graphs through structured query languages like SPARQL is rather demanding. As a result, various approaches exploit the simpler and widely used keyword-based search paradigm, either by translating keyword queries to structured queries, or by adopting classical information retrieval (IR) techniques....
Conference Paper
For ordinary users, the task of accessing knowledge graphs through structured query languages like SPARQL is rather demanding. As a result, various approaches exploit the simpler and widely used keyword-based search paradigm, either by translating keyword queries to structured queries, or by adopting classical information retrieval (IR) techniques....
Article
RDF Knowledge Graphs (or Datasets) contain valuable information that can be exploited for a variety of real-world tasks. However, due to the enormous size of the available RDF datasets, it is difficult to discover the most valuable datasets for a given task. For improving dataset Discoverability, Interlinking and Reusability, there is a trend for D...
Chapter
In this paper we introduce an approach, called LODQA, for open domain Question Answering over Linked Open Data. We confine ourselves to three kinds of questions: factoid, confirmation, and definition questions. By using LODQA it is feasible to answer questions over 400 million entities of any domain without using any training data, since we exp...
Chapter
The continuous accumulation of multi-dimensional data and the development of Semantic Web and Linked Data published in RDF bring new requirements for data analytics tools. Such tools should take into account the special features of RDF graphs, exploit the semantics of RDF and support flexible aggregate queries. In this paper, we present an approach...
Chapter
There is an increasing trend of using Linked Datasets for creating embeddings from URI sequences, since such embeddings can be exploited for several tasks, i.e., for machine learning problems, tasks related to content-based similarity, and others. Existing techniques exploit either a single or a few datasets (or RDF graphs) for creating URI sequenc...
Chapter
Faceted Search is a widely used interaction scheme in digital libraries, e-commerce, and recently also in Linked Data. Nevertheless, object ranking in the context of Faceted Search is not well studied. In this paper we propose an extended version of the model enriched with parameters that enable specifying the characteristics of the sought object r...
Article
Link traversal has emerged as a SPARQL query processing method that exploits the Linked Data principles to dynamically discover data relevant for answering a query by dereferencing online Web resources (URIs) at query execution time. While several approaches for such a lookup-based query evaluation method have been proposed, there exists no analysi...
Article
A large number of published datasets (or sources) that follow Linked Data principles is currently available and this number grows rapidly. However, the major target of Linked Data, i.e., linking and integration, is not easy to achieve. In general, information integration is difficult, because (a) datasets are produced, kept, or managed by different...
Chapter
There is a lack of high-quality corpora for the purposes of training task-oriented, end-to-end dialogue systems. This paper describes a dialogue collection process which used crowd-sourcing and a Wizard-of-Oz set-up to collect written human-human dialogues for a task-oriented, multi-domain scenario. The context is a tourism agency, where users try...
Preprint
Question Answering (QA) is a challenging topic since it requires tackling the various difficulties of natural language understanding. Since evaluation is important not only for identifying the strong and weak points of the various techniques for QA, but also for facilitating the inception of new methods and techniques, in this paper we present a co...
Conference Paper
The current de-facto way to query the Web of Data is through the SPARQL protocol, where a client sends queries to a server through a SPARQL endpoint. Contrary to an HTTP server, providing and maintaining a robust and reliable endpoint requires a significant effort that not all publishers are willing or able to make. An alternative query evaluation...
Preprint
Large amounts of RDF/S data are produced and published lately, and several modern applications require the provision of versioning and archiving services over such datasets. In this paper we propose a novel storage index for archiving versions of such datasets, called CPOI (compact partial order index), that exploits the fact that an RDF Knowledge...
Chapter
The collation of information for the monitoring of fish stocks and fisheries is a difficult and time-consuming task, as the information is scattered across different databases and is modelled using different formats and semantics. Our purpose is to offer a unified view of the existing stocks and fisheries information harvested from three different...
Preprint
There is a plethora of datasets in various formats which are usually stored in files, hosted in catalogs, or accessed through SPARQL endpoints. In most cases, these datasets cannot be straightforwardly explored by end users, for satisfying recall-oriented information needs. To fill this gap, in this paper we present the design and implementation of...
Preprint
The current de-facto way to query the Web of Data is through the SPARQL protocol, where a client sends queries to a server through a SPARQL endpoint. Contrary to an HTTP server, providing and maintaining a robust and reliable endpoint requires a significant effort that not all publishers are willing or able to make. An alternative query evaluation...
Book
This book aims to explain the main problems related to what is called digital preservation through examples in the context of a fairy tale. Digital preservation is the endeavour of preserving digital material against loss, corruption, hardware/software technology changes, and changes in the knowledge of the community. Digital preservation is import...
Book
This book explains the main problems related to digital preservation using examples based on a modern version of the well-known Cinderella fairy tale. Digital preservation is the endeavor to protect digital material against loss, corruption, hardware/software technology changes, and changes in the knowledge of the community. The structure of the b...
Article
Full-text available
In this paper, we present LODsyndesis, a suite of services over the datasets of the entire Linked Open Data Cloud, which offers fast, content-based dataset discovery and object co-reference. Emphasis is given on supporting scalable cross-dataset reasoning for finding all information about any entity and its provenance. Other tasks that can be benef...
Preprint
The federated query extension of SPARQL 1.1 allows executing queries distributed over different SPARQL endpoints. SPARQL-LD is a recent extension of SPARQL 1.1 which enables directly querying any HTTP web source containing RDF data, like web pages embedded with RDFa, JSON-LD or Microformats, without requiring the declaration of named graphs. This m...
Article
Full-text available
The main objective of Linked Data is linking and integration, and a major step for evaluating whether this target has been reached, is to find all the connections among the Linked Open Data (LOD) Cloud datasets. Connectivity among two or more datasets can be achieved through common Entities, Triples, Literals, and Schema Elements, while more connec...
Conference Paper
Full-text available
The federated query extension of SPARQL 1.1 allows executing queries distributed over different SPARQL endpoints. SPARQL-LD is a recent extension of SPARQL 1.1 which enables directly querying any HTTP web source containing RDF data, like web pages embedded with RDFa, JSON-LD or Microformats, without requiring the declaration of named graphs. This m...
Preprint
Full-text available
The LOD (Linked Open Data) cloud currently contains thousands of published datasets. Existing visualizations, like the Linking Open Data cloud diagram, are useful for getting an overview of its size, the datasets and their connectivity. An interesting question is whether we could come up with more informative and more interactive visualizations tha...
Chapter
In many applications one has to fetch and assemble pieces of information coming from more than one source for building a semantic warehouse offering more advanced query capabilities. In this paper the authors describe the corresponding requirements and challenges, and they focus on the aspects of quality and value of the warehouse. For this reason...
Article
Although the ultimate objective of Linked Data is linking and integration, it is not currently evident how connected the current LOD (Linked Open Data) cloud is. In this paper, we focus on methods, supported by special indexes and algorithms, for performing measurements related to the connectivity of more than two datasets which are useful in var...
Chapter
In many applications, one has to fetch and assemble pieces of information coming from more than one source for building a semantic warehouse offering more advanced query capabilities. This chapter describes the corresponding requirements and challenges, and focuses on the aspects of quality, value and evolution of the warehouse. It details various...
Chapter
This chapter describes the pattern “Provenance and context of digital photographs”. The episode describes the activities of Robert for getting the provenance of a particular digital photo. The technical sections describe metadata for image files, formats for storing and exchanging these metadata (i.e. Exif metadata), and the modeling of provenance...
Chapter
In this chapter we explore the pattern “Preservation Planning”. The episode describes the adventures of Robert when he decided to preserve his entire digital heritage. The technical sections discuss compression-related risks, preservation planning, data management planning, data replication, and blog preservation.
Chapter
This chapter describes the pattern “External behavioral dependencies”. In this episode we consider Robert’s attempt to compile and run source code written in Java. The technical sections discuss the dependencies that are required for executing software components, software documentation, as well as IP addresses and DNS, since software components mi...
Chapter
This chapter describes the pattern “Storage Media: Durability and Access.” The episode describes the adventures of Robert in accessing the contents of the USB stick. The technical sections discuss storage media durability and usage, as well as cloud storage and particular cases that demonstrate how complex bit preservation can be.
Chapter
This chapter describes the pattern “Reproducibility of scientific results.” The episode describes the adventures of Robert to verify the claims of a research paper. The technical sections describe HTML resources, web archiving, web citation, scientific publishing, trust in digital repositories, the data–information–knowledge–wisdom hierarchy, lab n...
Chapter
This chapter describes the pattern “Software Decompiling”. The episode describes the efforts of Robert to decompile a particular piece of software. The technical sections discuss compilers, interpreters and decompilers, and provide examples of programming language code (Java) and software build automation tools (Maven).
Chapter
This chapter describes the pattern “Text and symbol encoding”. The episode describes the activities of Robert for properly rendering a particular HTML file. The technical sections discuss the encoding of characters, the markup language HTML, the semantics of characters, as well as the task of parsing digital files.
Chapter
This chapter attempts to generalize and describe a more universal “Meta-Pattern”. It can be considered as an agile approach that is based on the notion of task performability and is powered by knowledge-based reasoning services. This is the most compact, technically-wise, chapter of the book, suitable for all those who would embrace a deeper and mo...
Chapter
This chapter describes the pattern “Understand and run software written in an obsolete programming language”. The episode describes the activities of Robert for executing a software application written in an obsolete programming language. The technical sections provide information about two legendary computers (Amstrad 464 and Commodore 64), and di...
Chapter
This chapter describes the pattern “Web Application Execution”. The episode describes the activities of Robert to deploy and run a web application. The technical sections discuss web application archive files (WAR), cloud computing, as well as visual web programming languages (MIT Scratch).
Chapter
This chapter introduces the reader to the basic concepts of Digital Preservation and justifies the significance of the topic. Subsequently, it decomposes the problem, and introduces the notion of Digital Preservation Pattern that is used in the next chapters of the book. Finally, it describes the structure of the book and the relationships amongst...
Chapter
This chapter describes the pattern “Interpretation of Data Values.” The episode follows the activities of Robert for interpreting the contents of a particular data file. The technical sections discuss technologies and data formats that aim to be as self-describing as possible for assisting the interpretation of data, including NetCDF, Semantic Web...
Chapter
This chapter briefly describes the history of the popular fairy tale of Cinderella, its variations over time, and provides an overview of the version by Charles Perrault in 1697.
Chapter
This chapter describes the pattern “Proprietary Format Recognition”. The episode describes the adventures of Robert for understanding a file with an unknown format. The technical sections discuss approaches for recognizing file formats, digital preservation-friendly formats, and objects serialization.
Chapter
This chapter introduces a modern version of the Cinderella fairy tale: It narrates the story of Daphne, a young undergraduate student of Computer Science. The story is continued in the episodes of the subsequent chapters.
Chapter
This chapter concludes the fairy tale of the book. The dream of Daphne actually sketches a more “digital preservation-friendly” information society, on the basis of the following five pillars: (a) production and storage of information, (b) information and formats, (c) service providers, (d) software and (e) trust.
Chapter
This chapter describes the pattern “Executables: Safety and Dependencies”. This episode observes Robert’s attempt to check whether it is safe to execute a particular executable file. The technical sections discuss the fundamental concepts of program termination, decidability and tractability, as well as software engineering concepts including code...
Chapter
In this chapter we study the pattern “Authenticity Assessment”. The episode is based on the efforts of Robert to understand and assess the authenticity of a particular txt file. The technical sections discuss related technologies, i.e., Checksums, Digital Signatures, Web authentication, cryptography, processes for assessing authenticity, and finall...
Chapter
This chapter describes those activities of Robert that enabled him to eventually find Daphne, the owner of the USB stick. This chapter does not include any technical sections.
Chapter
This chapter describes the pattern “Metadata for digital files and file systems”. The episode describes the first exposure of Robert to the contents of the USB stick. The technical sections discuss file systems, file signatures and filename extensions, metadata in general, and processes for extracting (and possibly transforming and/or enriching) th...
Conference Paper
In this paper we show how an exploratory search process, specifically the Preference-enriched Faceted Search (PFS) process, can be enriched for exploring datasets that also contain geographic information. In the introduced extension, that we call PFSgeo, the objects can have geographical coordinates, the interaction model is extended, and the web-i...
Article
Full-text available
In this work we discuss the related challenges and describe an approach towards the fusion of state-of-the-art technologies from the Spoken Dialogue Systems (SDS) and the Semantic Web and Information Retrieval domains. We envision a dialogue system named LD-SDS that will support advanced, expressive, and engaging user requests, over multiple, compl...
Conference Paper
The discovery of useful data for a given problem is of primary importance since data scientists usually spend a lot of time for discovering, collecting and preparing data before using them for various reasons, e.g., for applying or testing machine learning algorithms. In this paper we propose a general method for discovering, creating and selecting...
Article
Health-related information is nowadays accessible from many sources and is one of the most searched-for topics on the Internet. However, existing search systems often fail to provide users with a good list of medical search results, especially for classic (keyword-based) queries. In this article we elaborate on whether and how we can exploit biomed...
Article
Full-text available
The amount of available Semantic Web (SW) data (including Linked Open Data) constantly increases. Users would like to effectively browse and explore such information spaces without having to be acquainted with the various vocabularies and query language syntaxes. This paper discusses the work that has been done in the area for the case of RDF/S da...
Article
Full-text available
In the linked open data cloud, the biggest open data graph that currently exists, a remarkable percentage of data are unnamed resources, also called blank nodes. Several fundamental tasks, such as graph isomorphism checking and RDF data versioning, require computing a map between the sets of blank nodes of two graphs. This map aims at minimizing th...
Article
Most Voting Advice Applications (VAAs) are questionnaire-based systems. In this paper we introduce, analyze and evaluate an alternative approach; we show how Preference-enriched Faceted Search (PFS) can be used as a VAA. The introduced approach is more expressive, since it allows users to prioritize their preferences, it is more transparent since u...