
Jürgen Umbrich
- PhD Student
- Ollscoil na Gaillimhe – University of Galway
About
- 64 Publications
- 12,620 Reads
- 2,988 Citations
Publications (64)
As part of the digital transformation of large companies, such as Distribution System Operators (DSO), the communication between companies and their customers is moving more and more in the direction of Conversational AI. Two big challenges need to be faced in this context, namely: 1) how to efficiently transfer the partially fragmented company kno...
For over 50 years researchers and practitioners have searched for ways to elicit and formalize expert knowledge to support AI applications. Expert systems and knowledge bases were all results of these efforts. The initial efforts on knowledge bases were focused on defining a domain and task intensionally with rather complex ontologies. The increasi...
Intelligent Personal Assistants are changing the way we access information on the web, just as search engines changed it years ago. Undoubtedly, an important factor that enables this way of consuming the web is the schema.org annotations on websites. Those annotations are extracted and then consumed by search engines and Intelligent Personal Assista...
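As a rough illustration of the kind of schema.org annotation such systems extract (a minimal sketch; the HTML page, the Hotel example, and all property values are invented for illustration, not taken from the paper):

# Minimal sketch: extracting a schema.org JSON-LD annotation from an HTML page.
# The page content and property values here are made up for illustration.
import json
import re

html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": "Hotel",
 "name": "Example Hotel",
 "address": {"@type": "PostalAddress", "addressLocality": "Galway"}}
</script>
</head><body>...</body></html>
"""

# Grab every JSON-LD block and parse it; real extractors are far more robust.
blocks = re.findall(
    r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
annotations = [json.loads(b) for b in blocks]
print(annotations[0]["@type"], "-", annotations[0]["name"])
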
No matter how well curated and high in quality the Knowledge Graphs we build are, they are only as powerful as their applications. In this section we introduce concrete real-world use cases where Knowledge Graphs power dialog-based access to information and services. We do that by giving an overview of the existing chatbot and voice assistant market first...
Since its introduction by Google, Knowledge Graph has become a term that is now ubiquitously used yet does not have a well-established definition. This section attempts to derive a definition for Knowledge Graphs by compiling existing definitions made in the literature and considering the distinctive characteristics of previous efforts for tackli...
This chapter outlines the state of the art of Knowledge Graph technologies by introducing the process of building a Knowledge Graph. We define the following major steps of an overall process model: (1) knowledge creation, (2) knowledge hosting, (3) knowledge curation, and (4) knowledge deployment. We demonstrate the methodology for the knowledge cr...
This book describes methods and tools that empower information providers to build and maintain knowledge graphs, including those for manual, semi-automatic, and automatic construction; implementation; and validation and verification of semantic annotations and their integration into knowledge graphs. It also presents lifecycle-based approaches for...
There is an emerging demand for efficient archiving and (temporal) querying of different versions of evolving semantic Web data. As novel archiving systems start to address this challenge, foundations/standards for benchmarking RDF archives are needed to evaluate their storage space efficiency and the performance of different retrieval operation...
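One simple way to make such versions queryable, sketched here under the assumption that each version is kept in its own named graph and that rdflib is available (an illustration of the retrieval operations in general, not the benchmark's own design):

# Sketch (assumes rdflib): keep each dataset version in its own named graph,
# so a "version materialisation" query is just a GRAPH pattern, and a delta
# can be computed as a set difference between two versions.
from rdflib import Dataset, Literal, URIRef

EX = "http://example.org/"
ds = Dataset()

v1 = ds.graph(URIRef("urn:x-version:1"))
v1.add((URIRef(EX + "alice"), URIRef(EX + "role"), Literal("student")))

v2 = ds.graph(URIRef("urn:x-version:2"))
v2.add((URIRef(EX + "alice"), URIRef(EX + "role"), Literal("postdoc")))

# Materialise version 2.
for s, p, o in ds.query(
        "SELECT ?s ?p ?o WHERE { GRAPH <urn:x-version:2> { ?s ?p ?o } }"):
    print("v2:", s, p, o)

# Delta between the versions, computed client-side.
added = set(v2) - set(v1)
removed = set(v1) - set(v2)
print("added:", added, "removed:", removed)
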
There is a growing body of literature recognizing the benefits of Open Data. However, many potential data providers are unwilling to publish their data, and at the same time, data users are often faced with difficulties when attempting to use Open Data in practice. Despite various barriers in using and publishing Open Data still being present, stud...
The quality of metadata in open data portals plays a crucial role in the success of open data. E-government, for example, has to manage accurate and complete metadata to guarantee reliability and foster the reputation of e-government with the public. Measuring and comparing the quality of open data is not a straightforward process b...
This editorial summarizes the content of the Special Issue on Quality Management of Semantic Web Assets (Data, Services and Systems) part of the Semantic Web Journal.
While graph data on the Web and represented in RDF is growing, SPARQL, as the standard query language for RDF, still remains largely unusable for the most typical graph query task: finding paths between selected nodes through the graph. Property Paths, as introduced in SPARQL 1.1, turn out to be unfit for this task, as they can only be used for testing p...
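To make the limitation concrete, here is a small sketch (assuming rdflib and an invented example.org graph): the property-path ASK only confirms that some path exists, while returning the actual path needs client-side search, shown below as a breadth-first traversal, which is one possible workaround rather than the approach proposed in the paper:

# A SPARQL 1.1 property path can only *test* that a path exists; it yields
# no intermediate nodes. Recovering the path itself needs extra work, e.g.
# this simple client-side breadth-first search (assumes rdflib).
from collections import deque
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.a, EX.knows, EX.b))
g.add((EX.b, EX.knows, EX.c))

# Property path: succeeds or fails, but returns no path.
reachable = g.query(
    "ASK { <http://example.org/a> <http://example.org/knows>+ <http://example.org/c> }"
).askAnswer

def find_path(graph, start, end, predicate):
    """Breadth-first search that reconstructs one shortest path."""
    queue, previous = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == end:
            path, cur = [], node
            while cur is not None:
                path.append(cur)
                cur = previous[cur]
            return list(reversed(path))
        for nxt in graph.objects(node, predicate):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

print(reachable, find_path(g, EX.a, EX.c, EX.knows))
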
In this lecture we will discuss and introduce challenges of integrating openly available Web data and how to solve them. Firstly, while we will address this topic from the viewpoint of Semantic Web research, not all data is readily available as RDF or Linked Data, so we will give an introduction to different data formats prevalent on the Web, namel...
Research on preserving evolving linked datasets is gaining increasing attention in the Semantic Web community. The 2nd workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW 2016) aimed at addressing the numerous and diverse emerging challenges, from change discovery and scalable archive representations/indexes/infrastructures...
We describe SPARQLES: an online system that monitors the health of public SPARQL endpoints on the Web by probing them with custom-designed queries at regular intervals. We present the architecture of SPARQLES and the variety of analytics that it runs over public SPARQL endpoints, categorised by availability, discoverability, performance and interop...
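In the same spirit, a minimal availability probe might look as follows (a sketch using only the Python standard library; the ASK query, the timeout, and the DBpedia endpoint URL are illustrative choices, not details of SPARQLES itself):

# Availability-style probe in the spirit of the abstract (a sketch, not the
# actual SPARQLES implementation): issue a trivial ASK query and record
# whether the endpoint answers within a timeout.
import json
import time
import urllib.parse
import urllib.request

def probe(endpoint, timeout=10):
    params = urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
    req = urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"})
    start = time.time()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.load(resp)
        return {"up": bool(body.get("boolean")), "latency": time.time() - start}
    except Exception as exc:
        return {"up": False, "error": str(exc)}

# Example (the endpoint URL is just a placeholder):
print(probe("https://dbpedia.org/sparql"))
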
With the success of Open Data a huge amount of tabular data sources became available that could potentially be mapped and linked into the Web of (Linked) Data. Most existing approaches to “semantically label” such tabular data rely on mappings of textual information to classes, properties, or instances in RDF knowledge bases in order to link – and...
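The mapping idea can be sketched as a majority vote over candidate classes for each cell value (the toy lookup table below stands in for a real knowledge-base lookup service; the approach in the paper differs in its details):

# Sketch of column labelling by majority vote: map each cell value to candidate
# classes in a knowledge base and pick the most frequent class for the column.
from collections import Counter

TOY_KB = {
    "Galway": ["City"],
    "Dublin": ["City"],
    "Shannon": ["City", "River"],
}

def label_column(cells):
    votes = Counter()
    for cell in cells:
        for cls in TOY_KB.get(cell, []):
            votes[cls] += 1
    return votes.most_common(1)[0][0] if votes else None

print(label_column(["Galway", "Dublin", "Shannon"]))  # -> "City"
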
To integrate various Linked Datasets, data warehousing and live query processing provide two extremes, optimized for response time and quality respectively. The first approach provides very fast responses but with low quality, because changes to the original data are not immediately reflected in the materialized data. The second approach provides accurate re...
Traditional approaches for querying the Web of Data often involve centralised warehouses that replicate remote data. Conversely, Linked Data principles allow for answering queries live over the Web by dereferencing URIs to traverse remote data sources at runtime. A number of authors have looked at answering SPARQL queries in such a manner; these li...
A common way for exposing RDF data on the Web is by means of SPARQL endpoints, which allow end users and applications to query just the RDF data they want. However, servers hosting SPARQL endpoints often restrict access to the data by limiting the amount of results returned per query or the number of queries a client may issue per time window. As this...
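A common client-side workaround for such result limits is to page through answers with LIMIT/OFFSET, sketched below with the Python standard library (the endpoint URL, page size, and query pattern are placeholders, and this is not necessarily the strategy the paper evaluates):

# Page through results with LIMIT/OFFSET until a page comes back smaller
# than the page size. Note: without ORDER BY, paging is not guaranteed to be
# stable across pages, which is part of the problem the abstract points at.
import json
import urllib.parse
import urllib.request

def fetch_all(endpoint, pattern, page_size=1000):
    offset, rows = 0, []
    while True:
        query = f"SELECT * WHERE {{ {pattern} }} LIMIT {page_size} OFFSET {offset}"
        params = urllib.parse.urlencode({"query": query})
        req = urllib.request.Request(
            endpoint + "?" + params,
            headers={"Accept": "application/sparql-results+json"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            page = json.load(resp)["results"]["bindings"]
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size

# Example call (placeholder endpoint and pattern):
# rows = fetch_all("https://dbpedia.org/sparql", "?s a ?type")
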
To increase performance, data sharing platforms often make use of clusters of nodes where certain tasks can be executed in parallel. Resource planning and especially deciding how many processors should be chosen to exploit parallel processing is complex in such a setup as increasing the number of processors does not always improve runtime due to co...
Hundreds of public SPARQL endpoints have been deployed on the Web, forming a novel decentralised infrastructure for querying billions of structured facts from a variety of sources on a plethora of topics. But is this infrastructure mature enough to support applications? For 427 public SPARQL endpoints registered on the DataHub, we conduct various e...
Linked Data promises that a large portion of Web Data will be usable as one big interlinked RDF database against which structured queries can be answered. In this lecture we will show how reasoning --- using RDF Schema (RDFS) and the Web Ontology Language (OWL) --- can help to obtain more complete answers for such queries over Linked Data. We first...
The increasing amount of Linked Data and its inherently distributed nature have attracted significant attention throughout the research community and amongst practitioners searching over this data in the past years. Inspired by research results from traditional distributed databases, different approaches for managing federation over SPARQL Endpoints have been...
In this paper, we present the design and first results of the Dynamic Linked Data Observatory: a long-term experiment to monitor the two-hop neighbourhood of a core set of eighty thousand diverse Linked Data documents on a weekly basis. We present the methodology used for sampling the URIs to monitor, retrieving the documents, and further crawling...
Little is known about the dynamics of Linked Data, primarily because there have been few, if any, suitable collections of data made available for analysis of how Linked Data documents evolve over time. We aim to address this issue. We propose the Dynamic Linked Data Observatory, which provides the community with such a collection, monitoring a fixe...
Inspired by the CAP theorem, we identify three desirable properties when querying the Web of Data: Alignment (results up-to-date with sources), Coverage (results covering available remote sources), and Efficiency (bounded resources). In this short paper, we show that no system querying the Web can meet all three ACE properties, but instead must mak...
For Linked Data query engines, there are inherent trade-offs between centralised approaches that can efficiently answer queries over data cached from parts of the Web, and live decentralised approaches that can provide fresher results over the entire Web at the cost of slower response times. Herein, we propose a hybrid query execution approach that...
Querying over cached indexes of Linked Data often suffers from stale or missing results due to infrequent updates and partial coverage of sources. Conversely, live decentralised approaches offer fresh results directly from the Web, but exhibit slow response times due to accessing numerous remote sources at runtime. We thus propose a hybrid query ap...
Linked Data principles allow for processing SPARQL queries on-the-fly by dereferencing URIs. Link-traversal query approaches for Linked Data have the benefit of up-to-date results and decentralised execution, but operate only on explicit data from dereferenced documents, affecting recall. In this paper, we show how inferable knowledge--specifically...
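The basic dereferencing step that such link-traversal approaches build on can be sketched as follows (assumes rdflib; the seed URI and the one-hop traversal policy are arbitrary placeholders, and reasoning, politeness, and frontier management are all omitted):

# Bare-bones illustration of the dereferencing step behind link traversal.
# This performs live HTTP requests; real engines add frontier management,
# politeness and, as in this paper, reasoning over the fetched triples.
from rdflib import Graph

def dereference(uri):
    """Fetch and parse the RDF served at a URI into a graph."""
    g = Graph()
    g.parse(uri)  # rdflib handles content negotiation for the common formats
    return g

merged = Graph()
seed = "http://dbpedia.org/resource/Galway"  # placeholder seed URI
merged += dereference(seed)
# Follow a handful of object URIs one hop out (toy traversal policy).
for obj in list(merged.objects())[:5]:
    if str(obj).startswith("http"):
        try:
            merged += dereference(str(obj))
        except Exception:
            pass  # unreachable or non-RDF documents are simply skipped
print(len(merged), "triples after one hop")
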
There has been a recent, tangible growth in RDF published on the Web in accordance with the Linked Data principles and best practices, the result of which has been dubbed the “Web of Data”. Linked Data guidelines are designed to facilitate ad hoc re-use and integration of conformant structured data–across the Web–by consumer applications; however,...
We describe work-in-progress on the design and methodology of the Dynamic Linked Data Observatory: a framework to monitor Linked Data over an extended period of time. The core goal of our work is to collect frequent, continuous snapshots of a subset of the Web of Data that is interesting for further study and experimentation, with an aim to capture...
With respect to large-scale, static, Linked Data corpora, in this paper we discuss scalable and distributed methods for entity consolidation (aka. smushing, entity resolution, object consolidation, etc.) to locate and process names that signify the same entity. We investigate (i) a baseline approach, which uses explicit owl:sameAs relations to per...
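The owl:sameAs baseline amounts to grouping identifiers into equivalence classes, for example with a small union-find structure (a sketch with invented identifiers; the distributed implementation discussed in the paper is considerably more involved):

# Sketch of the baseline consolidation step: group identifiers connected by
# owl:sameAs with a union-find structure, then map every member of a group
# to one canonical identifier.
def consolidate(sameas_pairs):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in sameas_pairs:
        union(a, b)
    # Map every identifier to one canonical representative per group.
    return {x: find(x) for x in parent}

pairs = [("ex:alice", "dbpedia:Alice"), ("dbpedia:Alice", "foaf:me#alice")]
print(consolidate(pairs))
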
Web search engines such as Google, Yahoo!, MSN/Bing, and Ask are far from the consummate Web search solution: they do not typically produce direct answers to queries but instead recommend a selection of related documents from the Web. We note that in more recent years, search engines have begun to provide direct answers to prose queries ma...
Traditionally, Linked Data query engines execute SPARQL queries over a materialised repository which, on the one hand, guarantees fast query answering but, on the other hand, requires time- and resource-consuming preprocessing steps. In addition, the materialised repositories have to deal with the ongoing challenge of maintaining the index which is ---...
In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data – loo...
A growing amount of Linked Data—graph-structured data accessible at sources distributed across the Web—enables advanced data integration and decision-making applications. Typical systems operating on Linked Data collect (crawl) and pre-process (index) large amounts of data, and evaluate queries against a centralised repository. Given that crawling...
Aside from crawling, indexing, and querying RDF data centrally, Linked Data principles allow for processing SPARQL queries on-the-fly by dereferencing URIs. Proposed link-traversal query approaches for Linked Data have the benefits of up-to-date results and decentralised (i.e., client-side) execution, but operate on incomplete knowledge available i...
In this paper we introduce RaUL, the RDFa User Interface Language, a user interface markup ontology that is used to describe the structure of a web form as RDF statements. RaUL separates the markup of the control elements on a web form, the form model, from the data model that the form controls operate on. Form controls and the data model are conne...
Datasets in the LOD cloud are far from being static in their nature and how they are exposed. As resources are added and new links are set, applications consuming the data should be able to deal with these changes. In this paper we investigate how LOD datasets change and what sensible measures there are to accommodate dataset dynamics. We compare o...
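A minimal triple-level change measure between two snapshots can be sketched as a set difference over N-Triples files (the file names are placeholders, and the ratio below is one simple formulation, not necessarily the measure compared in the paper):

# Minimal sketch of triple-level change detection between two snapshots of a
# dataset: set difference over N-Triples lines plus a simple change ratio.
def load_ntriples(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip() and not line.startswith("#")}

def change_ratio(old_path, new_path):
    old, new = load_ntriples(old_path), load_ntriples(new_path)
    added, removed = new - old, old - new
    return {
        "added": len(added),
        "removed": len(removed),
        "change_ratio": (len(added) + len(removed)) / max(len(old | new), 1),
    }

# Example (placeholder snapshot files):
# print(change_ratio("snapshot_2012-05.nt", "snapshot_2012-06.nt"))
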
We propose a method for consolidating entities in RDF data on the Web. Our approach is based on a statistical analysis of the use of predicates and their associated values to identify "quasi-key" properties. Compared to a purely symbolic approach, we obtain promising results, retrieving more identical entities with high precision. We also...
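The statistical intuition can be sketched by scoring each predicate by how rarely its values repeat across subjects (a toy formulation with invented triples; the paper's actual statistic may differ):

# Sketch of scoring predicates as "quasi-keys": a predicate whose values
# rarely repeat across subjects behaves like a key.
from collections import defaultdict

def quasi_key_scores(triples):
    values_per_pred = defaultdict(list)
    for s, p, o in triples:
        values_per_pred[p].append(o)
    return {
        p: len(set(vals)) / len(vals)  # 1.0 = every value unique
        for p, vals in values_per_pred.items()
    }

triples = [
    ("ex:a", "foaf:mbox", "mailto:a@example.org"),
    ("ex:b", "foaf:mbox", "mailto:b@example.org"),
    ("ex:a", "foaf:gender", "female"),
    ("ex:b", "foaf:gender", "female"),
]
print(quasi_key_scores(triples))  # mbox scores 1.0, gender 0.5
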
Typical approaches for querying structured Web Data collect (crawl) and pre-process (index) large amounts of data in a central data repository before allowing for query answering. However, this time-consuming pre-processing phase forgoes the benefits of Linked Data -- where structured data is accessible live and up-to-date at distributed...
The Web of Linked Data is growing and currently consists of several hundred interconnected data sources altogether serving over 25 billion RDF triples to the Web. What has hampered the exploitation of this global dataspace up till now is the lack of an open-source Linked Data crawler which can be employed by Linked Data applications to localize (pa...
With respect to large-scale, static, Linked Data corpora, in this paper we discuss scalable and distributed methods for: (i) entity consolidation—identifying entities which signify the same referent, aka. smushing, entity resolution, object consolidation, etc.—using explicit owl:sameAs relations...
At the time of writing there exists no consensus about the approaches to detect, propagate and describe changes in resources and datasets of the Linked Open Data Web. This survey gives a comprehensive overview of the current technical solutions and a comparison of them based on requirements we derived from use cases the community came up with. We giv...
Publishing and consuming content on the Web of Data often requires considerable expertise in the underlying technologies, as the expected services to achieve this are either not packaged in a simple and accessible manner, or are simply lacking. In this poster, we address selected issues by briefly introducing the following essential Web of Data s...
Search engines focusing on particular media types face difficulties in discovering suitable URIs on the Web. Since the engines are only interested in a small fraction of the Web, a crawler should use heuristics to concentrate on that fraction. To devise such a heuristic, we postulate four hypotheses based on RFCs and W3C recommendations to find cue...
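Such a cue-based heuristic might, for instance, combine URI path extensions with an HTTP HEAD fallback (a sketch with an illustrative extension list; the four hypotheses examined in the paper are not reproduced here):

# Sketch of a cue-based heuristic of the kind the abstract describes: guess a
# URI's media type from its path extension first, and only fall back to an
# HTTP HEAD request when the URI itself gives no cue.
import urllib.parse
import urllib.request

AUDIO_VIDEO_EXTENSIONS = {".mp3", ".ogg", ".mp4", ".avi", ".webm"}

def looks_like_media(uri):
    path = urllib.parse.urlparse(uri).path.lower()
    if any(path.endswith(ext) for ext in AUDIO_VIDEO_EXTENSIONS):
        return True
    # No cue in the URI itself: ask the server for headers only.
    try:
        req = urllib.request.Request(uri, method="HEAD")
        with urllib.request.urlopen(req, timeout=10) as resp:
            ctype = resp.headers.get("Content-Type", "")
        return ctype.startswith(("audio/", "video/"))
    except Exception:
        return False

print(looks_like_media("http://example.org/talks/keynote.mp4"))  # True from the extension alone
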
Semantics can be integrated in to search processing during both document analysis and querying stages. We describe a system that incorporates both, semantic annotations of Wikipedia articles into the search process and allows for rich annotation search, enabling users to formulate queries based on their knowledge about how entities relate to one an...
Current search engines do not fully leverage semantically rich datasets, or specialise in indexing just one domain-specific dataset. We present a search engine that uses the RDF data model to enable interactive query answering over richly structured and interlinked data collected from many disparate sources on the Web.
We present the architecture of an end-to-end semantic search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web. In particular, we study distributed indexing methods for graph-structured data and parallel query evaluation methods on a cl...
We present a system that improves on current document-centric Web search engine technology; adopting an entity-centric perspective, we are able to integrate data from both static and live sources into a coherent, interlinked information space. Users can then search and navigate the integrated information space through relationships, both existin...
The goal of the work presented in this paper is to obtain large amounts of semistructured data from the web. Harvesting semistructured data is a prerequisite to enabling large-scale query answering over web sources. We contrast our approach to conventional web crawlers, and describe and evaluate a five-step pipelined architecture to crawl and ind...
Current web search engines return links to documents for user-specified keyword queries. Users then have to manually trawl through lists of links and glean the required information from documents. In contrast, semantic search engines allow more expressive queries over information integrated from multiple sources, and return specific information ab...
Web documents are increasingly augmented with structured data to allow machines to process and integrate information. We present a complete end-to-end system that enables interactive query answering over large amounts of richly structured and interlinked data collected from over five million sources on the Web. The system utilises an adaptation of...