
Jose Emilio Labra Gayo, PhD
Professor (Associate) at University of Oviedo
About
220 Publications
76,926 Reads
2,867 Citations
Introduction
Full Professor at University of Oviedo. Founder of WESO research group. My main interests are Semantic Web and Language Technologies.
Current institution
University of Oviedo
Additional affiliations
January 1992 - April 2020
Publications (220)
We formally introduce an inheritance mechanism for the Shape Expressions language (ShEx). It is inspired by inheritance in object-oriented programming languages, and provides similar advantages such as reuse, modularity, and more flexible data modelling. Using an example, we explain the main features of the inheritance mechanism. We present its syn...
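To give a flavour of the mechanism, here is a minimal sketch of what a ShEx schema using inheritance could look like; the vocabulary (:Person, :Student, :name, :enrolledIn) is hypothetical and not taken from the paper, and the EXTENDS/inheritance syntax follows the proposal the abstract describes:

    # A minimal sketch of ShEx inheritance, shown as a Python string.
    # All identifiers below are illustrative, not from the paper.
    shex_schema = """
    PREFIX :    <http://example.org/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    :Person {
      :name xsd:string
    }

    # :Student inherits the :name constraint from :Person and adds its own,
    # giving the reuse and modularity the paper attributes to inheritance.
    :Student EXTENDS @:Person {
      :enrolledIn IRI
    }
    """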
The integration of artificial intelligence in libraries can have a wide impact on the evolution of information access and management. It allows both the streamlining of internal processes and the transformation of the way users interact with information resources, thereby enhancing effectiveness and operational efficiency while enriching the user e...
RDF shapes have proven to be effective mechanisms for describing and validating RDF content. Typically, shapes are written by domain experts. However, writing and maintaining these shapes can be challenging when dealing with large and complex schemas. To address this issue, automatic shape extractors have been proposed. These tools are designed to ana...
RDF shapes are formal expressions of schema structures in RDF data. Their primary purpose is twofold: describing and validating RDF data. However, as machine-readable representations of the expected structures in a given data source, RDF shapes can be applied to various tasks that require automatic comprehension of data schemas. In this paper, we pres...
Between 1990 and 2023, Chile’s Congress processed and approved 2738 laws, with an average processing time of 667.8 days from proposal to official publication. Recent political circumstances have underscored the need to identify legislative proposals that can be expedited for approval and which ones are unlikely to be approved at all. This article d...
Wikidata is a massive Knowledge Graph (KG), including more than 100 million data items and nearly 1.5 billion statements, covering a wide range of topics such as geography, history, scholarly articles, and life science data. The large volume of Wikidata is difficult to handle for research purposes; many researchers cannot afford the costs of hostin...
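As a rough illustration of why subsetting matters, even a small topical slice can be pulled from the live SPARQL endpoint. The sketch below assumes the SPARQLWrapper package and is a toy query, not the subsetting pipeline discussed in the paper:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Toy query against the public Wikidata endpoint; wd:Q8054 is "protein".
    endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
    endpoint.setQuery("""
    SELECT ?protein ?proteinLabel WHERE {
      ?protein wdt:P31 wd:Q8054 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    } LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)
    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["proteinLabel"]["value"])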
Knowledge Graphs have been successfully adopted in recent years, with general-purpose ones, like Wikidata, as well as domain-specific ones, like UniProt. Their increasing size poses new challenges to their practical usage. As an example, Wikidata has been growing in content and data size since its inception, making it difficult to...
Since the return of democracy in 1990 until the end of 2020, Chile’s Congress has processed and approved 2404 laws, with an average processing time of 695 days from proposal to official publication. Recent political circumstances have given urgency to identifying those law propositions that might be shepherded to faster approval and those that will...
LinkML is a data modeling language that can be used to describe the structure and semantics of data from a specific domain. But as with any modeling language, there is a need for tools that support validation of data. LinkML provides a set of validation tools, but there is a growing need to adapt them for a broader audience. The work highli...
The paper contains a report of the activities carried out during the BioHackathon 2023 in Shodoshima, Japan, in a project about RDF data integration using Shape Expressions. The paper describes several approaches that were discussed to create RDF data subsets and some preliminary results from applying some of those technologies. It also desc...
Serverless technologies, also known as FaaS (Function as a Service), are promoted as solutions that provide dynamic scalability, speed of development, cost-per-consumption model, and the ability to focus on the code while taking attention away from the infrastructure that is managed by the vendor. A microservices architecture is defined by the inte...
Shape Expressions (ShEx) are used in various fields of knowledge to define RDF graph structures. ShEx visualizations enable all kinds of users to better comprehend the underlying schemas and perceive their properties. Nevertheless, the only prior tool (RDFShape) suffers from limited scalability, which impairs comprehension in large cases. In this work...
Several problems arise due to the differences between dentistry and general medicine. The storage of dental data in information silos and the incompatibility of data between different dental clinics or institutions from other medical areas are the most significant ones. The authors propose a decentralized architecture that combines FHIR archetypes, sh...
Knowledge Graphs (KGs) such as Wikidata act as a hub of information from multiple domains and disciplines, and is crowdsourced by multiple stakeholders. The vast amount of available information makes it difficult for researchers to manage the entire KG, which is also continually being edited. It is necessary to develop tools that extract subsets fo...
Urgent global research demands real-time dissemination of precise data. Wikidata, a collaborative and openly licensed knowledge graph available in RDF format, provides an ideal forum for exchanging structured data that can be verified and consolidated using validation schemas and bot edits. In this research article, we catalog an automatable task s...
Wikidata is one of the most successful Semantic Web projects. Its underlying Wikibase data model departs from RDF with the inclusion of several features like qualifiers and references, built-in datatypes, etc. Those features are serialized to RDF for content negotiation, RDF dumps and in the SPARQL endpoint. Wikidata adopted the entity schemas name...
The notion of Knowledge Graph stems from scientific advancements in diverse research areas such as Semantic Web, databases, knowledge representation and reasoning, NLP, and machine learning, among others. The integration of ideas and techniques from such disparate disciplines presents a challenge to practitioners and researchers to know how current...
As humans, we can deduce more from the data graph of Figure 2.1 than what the edges explicitly indicate. We may deduce, for example, that the Ñam festival (EID15) will be located in Santiago, even though the graph does not contain an edge (EID15)-location->(Santiago). We may further deduce that the cities connected by flights must have some airpo...
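A minimal sketch of such deduction, assuming the rdflib and owlrl packages; the :location property and its rdfs:domain axiom are made up to echo the running example:

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF
    import owlrl  # RDFS/OWL-RL reasoner for rdflib

    EX = Namespace("http://example.org/")
    g = Graph()
    g.parse(data="""
    @prefix :     <http://example.org/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    :location rdfs:domain :Event .
    :EID15 :location :Santiago .
    """, format="turtle")

    # Materialise the RDFS entailments: :EID15 is now typed as an :Event,
    # even though no such triple was stated explicitly.
    owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
    print((EX.EID15, RDF.type, EX.Event) in g)  # True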
While deductive knowledge is characterized by precise logical consequences, inductively acquiring knowledge involves generalizing patterns from a given set of input observations, which can then be used to generate novel but potentially imprecise predictions. For example, from a large data graph with geographical and flight information, we may obser...
In this chapter, we discuss some of the most prominent knowledge graphs that have emerged in the past years. We begin by discussing open knowledge graphs, most of which have been published on the Web per the guidelines and protocols described in Chapter 9. We later discuss enterprise knowledge graphs that have been created by companies from diverse...
In this chapter we describe extensions of the data graph (relating to schema, identity, and context) that provide additional structures for accumulating knowledge. Henceforth, we refer to a data graph as a collection of data represented as nodes and edges using one of the models discussed in Chapter 2. We refer to a knowledge graph as a data graph po...
Independent of the (kinds of) source(s) from which a knowledge graph is created, the resulting initial knowledge graph will usually be incomplete, and will often contain duplicate, contradictory or even incorrect statements, especially when taken from multiple sources. After the initial creation and enrichment of a knowledge graph from external sou...
At the foundation of any knowledge graph is the principle of first applying a graph abstraction to data, resulting in an initial data graph. We now discuss a selection of graph-structured data models that are commonly used in practice to represent data graphs. We then discuss the primitives that form the basis of graph query languages used to inter...
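A small sketch of those query primitives in practice, assuming rdflib and a two-city toy graph modelled on the flight example used elsewhere in the book:

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix : <http://example.org/> .
    :Arica    :flight :Santiago .
    :Santiago :flight :Arica .
    """, format="turtle")

    # A basic graph pattern: pairs of cities connected by flights both ways.
    q = """
    PREFIX : <http://example.org/>
    SELECT ?a ?b WHERE { ?a :flight ?b . ?b :flight ?a . }
    """
    for a, b in g.query(q):
        print(a, b)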
In this chapter, we discuss the principal techniques by which knowledge graphs can be created and subsequently enriched from diverse sources of legacy data that range from plain text to structured formats (and anything in between). The appropriate methodology to follow when creating a knowledge graph depends on the actors involved, the domain, the...
Beyond assessing the quality of a knowledge graph, there exist techniques to refine the knowledge graph, in particular to (semi-)automatically complete and correct the knowledge graph [Paulheim, 2017], aka knowledge graph completion and knowledge graph correction, respectively. As distinguished from the creation and enrichment tasks outlined in Ch...
While it may not always be desirable to publish knowledge graphs (for example, those that offer a competitive advantage to a company [Noy et al., 2019]), it may be desirable or even required to publish other knowledge graphs, such as those produced by volunteers [Lehmann et al., 2015, Mahdisoltani et al., 2015, Vrandečić and Krötzsch, 2014], by publ...
There is an increasing number of projects based on Knowledge Graphs and SPARQL endpoints. These SPARQL endpoints are later queried by final users or used to feed many different kinds of applications. Shape languages, such as ShEx and SHACL, have emerged to guide the evolution of these graphs and to validate their expected topology. However, authori...
The initial adoption of knowledge graphs by Google and later by big companies has increased their adoption and popularity. In this paper we present a formal model for three different types of knowledge graphs which we call RDF-based graphs, property graphs and wikibase graphs. In order to increase the quality of Knowledge Graphs, several approaches...
Information related to the COVID-19 pandemic ranges from biological to bibliographic, from geographical to genetic and beyond. The structure of the raw data is highly complex, so converting it to meaningful insight requires data curation, integration, extraction and visualization, the global crowdsourcing of which provides both additional challenge...
Moths form a diverse group of species that are predominantly active at night. They are colourful and play an ecological role, but are less well described than their closest relatives, the butterflies. Much remains to be understood about moths, as shown by the many issues within their taxonomy, including being a paraphyletic group and the...
In this article, we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models, as well as lang...
The amount, size, complexity, and importance of Knowledge Graphs (KGs) have increased during the last decade. Many different communities have chosen to publish their datasets using Linked Data principles, which favors the integration of this information with many other sources published using the same principles and technologies. Such a scenario re...
Knowledge graphs have successfully been adopted by academia, government and industry to represent large-scale knowledge bases. Open and collaborative knowledge graphs such as Wikidata capture knowledge from different domains and harmonize them under a common format, making it easier for researchers to access the data while also supporting Open Sci...
Diet is one of the main sources of exposure to toxic chemicals with carcinogenic potential, some of which are generated during food processing, depending on the type of food (primarily meat, fish, bread and potatoes), cooking methods and temperature. Although demonstrated in animal models at high doses, an unequivocal link between dietary exposure...
Background: Pandemics, even more than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help find solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a publi...
Integration of heterogeneous data sources in a single representation is an active field with many different tools and techniques. In the case of text-based approaches (those that base the definition of the mappings and the integration on a DSL) there is a lack of usability studies. In this work we have conducted a usability experiment (n = 17) on thr...
The use of social network theory and methods of analysis has been applied to different domains in recent years, including public health. The complete procedure for carrying out a social network analysis (SNA) is a time-consuming task that entails a series of steps in which the expert in social network analysis could make mistakes. This research pr...
This paper presents the visualization of the national budget, a tool based on Semantic Web technologies that shows, through graphic representations, the Chilean budget law published annually and its execution by each state agency. We describe the processes for consuming open data from the National Budget Agency, and how this data is transformed and publish...
Comprehension of past events and their reconstruction is one of the tasks performed by historians. With the introduction of computer-aided methods, the way in which historians perform their work has been transformed. One of these inclusions is the Semantic Web, which can act as an alternative for publication, conciliation, standardisation and integratio...
Validating RDF data becomes necessary in order to ensure data compliance with the conceptualization model it follows, e.g., the schema or ontology behind the data, and to improve data consistency and completeness. There are different approaches to validate RDF data, for instance, JSON Schema, particularly for data in JSON-LD format, as well as Shape Exp...
Pandemics, even more than other scientific questions, require swift integration of knowledge and identifiers. In a setting where there is a large number of loosely connected projects and initiatives, we need a common ground, also known as a "commons". Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons, but Wikidata may not...
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languag...
We present a method for the construction of SHACL or ShEx constraints for an existing RDF dataset. It has two components that are used conjointly: an algorithm for automatic schema construction, and an interactive workflow for editing the schema. The schema construction algorithm takes as input sets of sample nodes and constructs a shape constraint...
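The core idea of constructing a shape from sample nodes can be sketched in a few lines. This toy version assumes rdflib and is not the paper's algorithm; it merely marks a predicate as mandatory when every sample node uses it:

    from collections import defaultdict
    from rdflib import Graph

    def infer_shape(g: Graph, samples):
        """Classify each predicate used by the sample nodes as mandatory
        (present on every sample) or optional (present on only some)."""
        counts = defaultdict(int)
        for node in samples:
            for p in set(g.predicates(subject=node)):
                counts[p] += 1
        return {p: "mandatory" if c == len(samples) else "optional"
                for p, c in counts.items()}

A real extractor would also infer datatypes and cardinalities and emit ShEx or SHACL syntax, but the sampling-and-counting skeleton is the same.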
By mistake this chapter was originally published non open access. This has been corrected.
We discuss Shape Expressions (ShEx), a concise, formal, modeling and validation language for RDF structures. For instance, a Shape Expression could prescribe that subjects in a given RDF graph that fall into the shape “Paper” are expected to have a section called “Abstract”, and any ShEx implementation can confirm whether that is indeed the case fo...
This paper describes the system architecture for generating the History of the Law developed for the Chilean National Library of Congress (BCN). The production system uses Semantic Web technologies, Akoma Ntoso, and tools that automate the markup of plain text to XML, enriching and linking documents. These semantically annotated documents allow to...
In this paper, we propose an architecture that combines Big Data and Stream Processing and can be applied to the Real Estate domain. Our approach is a specialisation of the Lambda architecture, inspired by some aspects of the Kappa architecture. As a proof of concept, we show a prototype developed following this approach and a comparison of...
The RDF data model forms a cornerstone of the Semantic Web technology stack. Although there have been different proposals for RDF serialization syntaxes, the underlying simple data model enables great flexibility which allows it to be successfully employed in many different scenarios and to form the basis on which other technologies are developed....
The proliferation of large databases with potentially repeated entities across the World Wide Web has driven a generalized interest in methods to detect duplicated entries. The heterogeneity of the data causes generalist approaches to perform poorly in scenarios with distinguishing features. In this paper, we analyze the part...
Data interoperability is currently a problem that we are facing more intensely due to the appearance of fields like Big Data or IoT. Much data is persisted in information silos with neither interconnection nor format homogenisation. Our proposal to alleviate this problem is ShExML, a language based on ShEx that can map and merge heterogeneous data...
In order to perform any operation on an RDF graph, it is advisable to know the expected topology of the targeted information. Some technologies and syntaxes have been developed in recent years to describe the expected shapes in an RDF graph, such as ShEx and SHACL. In general, a domain expert can use these syntaxes to define shapes in a graph...
RDF validation is a field where the Semantic Web community is currently focusing its attention. In other communities, like XML or databases, data validation and quality are considered a key part of the ecosystem. Besides, there is a recent trend to migrate data from different sources to Semantic Web formats. These transformations and mappings between...
Over the past few years, Public Administrations have been providing systems for the electronic processing of procedures and files to ensure compliance with regulations and provide public services to citizens. Although each administration provides similar services to its citizens, these systems usually differ from the internal information management poin...
People have been using computers to record and reason about data for many decades. Typically, this reasoning is less esoteric than artificial intelligence tasks like classification.
This chapter includes a short overview of the RDF data model and the Turtle notation, as well as some technologies like SPARQL, RDF Schema, and OWL that form part of the RDF ecosystem.
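For readers who prefer code to prose, that ecosystem can be exercised with rdflib. The snippet below (illustrative names only) builds a tiny graph with an RDF Schema axiom and prints it in Turtle:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/")
    g = Graph()
    g.bind("ex", EX)
    g.add((EX.Student, RDFS.subClassOf, EX.Person))  # RDF Schema axiom
    g.add((EX.alice, RDF.type, EX.Student))
    g.add((EX.alice, EX.name, Literal("Alice")))
    print(g.serialize(format="turtle"))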
Shapes Constraint Language (SHACL) has been developed by the W3C RDF Data Shapes Working Group, which was chartered in 2014 with the goal to "produce a language for defining structural constraints on RDF graphs" [6].
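A minimal SHACL validation sketch, assuming the pyshacl package and a made-up shapes graph rather than an example from the chapter:

    from rdflib import Graph
    from pyshacl import validate

    data = Graph().parse(data="""
    @prefix : <http://example.org/> .
    :alice :name "Alice" .
    """, format="turtle")

    shapes = Graph().parse(data="""
    @prefix :    <http://example.org/> .
    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    :PersonShape a sh:NodeShape ;
        sh:targetSubjectsOf :name ;
        sh:property [ sh:path :name ; sh:datatype xsd:string ; sh:minCount 1 ] .
    """, format="turtle")

    # validate returns (conforms, results_graph, results_text).
    conforms, _, report = validate(data, shacl_graph=shapes)
    print(conforms)  # True for this toy data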
Shape Expressions (ShEx) is a schema language for describing RDF graph structures. ShEx was originally developed in late 2013 to provide a human-readable syntax for OSLC Resource Shapes. It added disjunctions, making it more expressive than Resource Shapes. Tokens in the language were adopted from Turtle [80] and SPARQL [44], with tokens for groupi...
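A comparable ShEx validation sketch, assuming the PyShEx package's ShExEvaluator interface and a hypothetical one-shape schema:

    from pyshex import ShExEvaluator

    rdf = """
    @prefix : <http://example.org/> .
    :alice :name "Alice" .
    """

    schema = """
    PREFIX :    <http://example.org/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    :PersonShape { :name xsd:string }
    """

    # Validate node :alice against :PersonShape.
    for r in ShExEvaluator(rdf=rdf, schema=schema,
                           focus="http://example.org/alice",
                           start="http://example.org/PersonShape").evaluate():
        print(r.focus, r.result, r.reason)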
In this chapter we describe several applications of RDF validation. We start with the WebIndex, a medium-size linked data portal that was one of the earliest applications of ShEx. We describe it using ShEx and SHACL so the reader can see how both formalisms can be applied to describe RDF data.
In this chapter we present a comparison between ShEx and SHACL. The technologies have similar goals and similar features. In fact, at the start of the Data Shapes Working Group in 2014, convergence on a unified approach was considered possible. However, this did not happen, and as of July 2017 both technologies are maintained as separate solutions.
We present a formal semantics and proof of soundness for shapes schemas, an expressive schema language for RDF graphs that is the foundation of Shape Expressions Language 2.0. It can be used to describe the vocabulary and the structure of an RDF graph, and to constrain the admissible properties and values for nodes in that graph. The language defin...
In this paper, the authors describe Musical Entities Reconciliation Architecture (MERA), an architecture designed to link music-related databases adapting the reconciliation techniques to each particular case. MERA includes mechanisms to manage third party sources to improve the results and it makes use of semantic technologies, storing and organiz...
RDF and Linked Data have broad applicability across many fields, from aircraft manufacturing to zoology. Requirements for detecting bad data differ across communities, fields, and tasks, but nearly all involve some form of data validation. This book introduces data validation and describes its practical use in day-to-day data exchange. The Semantic...
There is great concern nowadays regarding alcohol consumption and drug abuse, especially in young people. Analyzing the social environment where these adolescents are immersed, as well as a series of measures determining the alcohol abuse risk or personal situation and perception using a number of questionnaires like AUDIT, FAS, KIDSCREEN, and ot...
Linked data portals need to be able to advertise and describe the structure of their content. A sufficiently expressive and intuitive schema language will allow portals to communicate these structures. Validation tools will aid in the publication and maintenance of linked data and increase their quality. Two schema language proposals have recently...
We describe a new educational tool that relies on Semantic Web technologies to enhance lesson content. We conducted an experiment with 32 students whose results demonstrate better performance when exposed to our tool in comparison with a plain native tool. Consequently, this prototype opens new possibilities in lesson content enhancement.
In this work we present a formal description of a learning environment framework that gives support to learning analytics. The framework is based on techniques that educational data mining and social network analysis provide. The purpose is to study or discover collaborative relationships that students generate during their learning process and mak...
Recommender systems appear, among other reasons, with the purpose of reducing web information overload and easing information recovery. These kinds of systems help users find content easily and with minimal effort. However, many of these systems require content to be explicitly rated in order to determine use...
We present Shape Expressions (ShEx), an expressive schema language for RDF designed to provide a high-level, user-friendly syntax with intuitive semantics. ShEx allows describing the vocabulary and the structure of an RDF graph, and constraining the allowed values for the properties of a node. It includes an algebraic grouping operator, a choice o...
Editor's Summary: Defined in 1999 and paired with XML, the Resource Description Framework (RDF) has been cast as an RDF Schema, producing data that is well-structured but not validated, permitting certain illogical relationships. When stakeholders convened in 2014 to consider solutions to the data validation challenge, a W3C working group proposed R...
We study the expressiveness and complexity of Shape Expression Schema (ShEx), a novel schema formalism for RDF currently under development by W3C. ShEx assigns types to the nodes of an RDF graph and allows constraining the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alterna...
There is a growing interest in the validation of RDF based solutions where one can express the topology of an RDF graph using some schema language that can check if RDF documents comply with it. Shape Expressions have been proposed as a simple, intuitive language that can be used to describe expected graph patterns and to validate RDF graphs agains...
This chapter introduces the promotion of statistical data to the Linked Open Data initiative in the context of the Web Index project. A framework for the publication of raw statistics and a method to convert them to Linked Data are also presented following the W3C standards RDF, SKOS, and OWL. This case study is focused on the Web Index project; la...
RDF forms the basis of the semantic web technology stack. It is based on a directed graph model where nodes and edges are identified by URIs. Occasionally, such graphs contain literals or blank nodes. The existential nature of blank nodes complicates the graph representation. In this paper we propose a purely functional representation of RDF graphs...
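One way to picture a purely functional treatment of graphs is the decompose-into-context pattern of inductive graphs; this toy Python sketch uses frozensets of triples as a stand-in for a real persistent structure, and is not the paper's representation:

    def decompose(triples, node):
        """Split a graph (a frozenset of (s, p, o) triples) into the context
        of one node and the remaining graph, without mutating the input."""
        context = frozenset(t for t in triples if node in (t[0], t[2]))
        rest = frozenset(triples) - context
        return context, rest

    g = frozenset({("a", "knows", "b"), ("b", "knows", "c")})
    ctx, rest = decompose(g, "a")  # ctx == {("a", "knows", "b")}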
In this paper we describe the development of the Web Index linked data portal, which represents statistical index data and the computations from which it has been obtained. The Web Index is a multi-dimensional measure of the World Wide Web's contribution to development and human rights globally. It covers 81 countries and incorporates indicators that...
Public administrations pursue the efficiency and quality of administrative services they offer as well as the reduction of time and operational costs in executing service transactions. However, some issues arise when trying to achieve these goals: (a) the lack of procedure formalization to describe public services, (b) a mechanism to guarantee serv...
This paper deals with the problem of information exchange between universities, which could also influence other sectors like banking, health, etc., where there are no standards defined for information exchange. We propose a semi-automatic learning and assessment system that is capable of unifying the way in which each of the universities works by empl...
RDF is a graph-based data model which is widely used for semantic web and linked data applications. In this paper we describe a Shape Expression definition language which enables RDF validation through the declaration of constraints on the RDF model. Shape Expressions can be used to validate RDF data, communicate expected graph patterns for interfa...
The present editorial note introduces the concept of e-Procurement and the use of semantic technologies to improve some of the processes involved in electronic purchasing processes. Currently there is a growing interest to boost the use of electronic communications to deal with Business-to-Business (B2B), Business-to-Consumer (B2C) or Administratio...
The present paper introduces and reviews existing technology and research works in the field of e-Procurement. More specifically, this survey aims to collect those relevant approaches that have tackled the challenge of delivering more advanced and intelligent e-Procurement management systems, given their relevance in the industry to afford more timely...
We propose shape expression schema (ShEx), a novel schema formalism for describing the topology of an RDF graph that uses regular bag expressions (RBEs) to define constraints on the admissible neighborhood for the nodes of a given type. We provide two alternative semantics, multi- and single-type, depending on whether or not a node may have more th...
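To convey the flavour of RBEs, a toy interval check (hypothetical, and far weaker than full RBEs) can test whether the bag of a node's outgoing predicates fits declared cardinality bounds:

    from collections import Counter

    def matches_intervals(bag, intervals):
        """Check a bag of predicates against {predicate: (min, max)} bounds;
        predicates outside `intervals` are disallowed (closed shape)."""
        counts = Counter(bag)
        for p in set(counts) | set(intervals):
            lo, hi = intervals.get(p, (0, 0))
            if not (lo <= counts.get(p, 0) <= hi):
                return False
        return True

    # A node with one name and two emails fits {name: (1,1), email: (0,2)}.
    print(matches_intervals(["name", "email", "email"],
                            {"name": (1, 1), "email": (0, 2)}))  # True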
This paper introduces a multilingual hybrid methodology to automatically deploy and combine collaborative tagging techniques based on user-behavior and recommendation algorithms. A reference web architecture called ACOTA (Automatic Collaborative Tagging) is also described in order to show the recommendation capabilities of this approach with the ai...
RDF is one of the cornerstones of the Semantic Web. It can be considered a common knowledge representation language based on a graph model. In the functional programming community, inductive graphs have been proposed as a purely functional representation of graphs, which makes reasoning and concurrent programming simpler. In this paper, we propo...
In order to improve the quality of linked data portals, it is necessary to have a tool that can automatically describe and validate the RDF triples exposed. RDF Shape Expressions have been proposed as a language based on Regular Expressions that can describe and validate the structure of RDF graphs. In this paper we describe the WebIndex, a medium...
The present paper introduces a context-aware recommendation system for journalists to enable the identification of similar topics across different sources. More specifically a journalist-based recommendation system that can be automatically configured is presented to exploit news according to expert preferences. News contextual features are also ta...
The present paper introduces a technique to deal with corporate name heterogeneities in the context of public procurement metadata. Public bodies are currently facing a big challenge trying to improve both the performance and the transparency of administrative processes. The e-Government and Open Linked Data initiatives have emerged as efforts to t...
The compilation of key performance indicators (KPIs) into a single value is becoming a challenging task in certain domains to summarize data and information. In this context, policymakers are continuously gathering and analyzing statistical data with the aim of providing objective measures about a specific policy, activity, product or service and mak...