Oscar Corcho

Universidad Politécnica de Madrid | UPM · Departamento de Inteligencia Artificial

PhD

Publications (439)
Book
Full-text available
The ACTION toolkit is the ultimate resource collection for everyone interested in doing citizen science the ACTION way. The toolkit draws on expertise in citizen science, participatory design, social innovation, socio-economic studies, pollution, open science, social computing, open data and software development in the ACTION team, to ensure it sui...
Article
Full-text available
Governments need to be accountable and transparent for their public spending decisions in order to prevent losses through fraud and corruption as well as to build healthy and sustainable economies. Open data act as a major instrument in this respect by enabling public administrations, service providers, data journalists, transparency activists, and...
Article
Many data are published on the Web using tabular data formats (e.g., spreadsheets). One of the main challenges for their effective (re)use is their generalised lack of semantics (e.g., column names are not usually standardised, and their meaning and content are not always clear). There is a common understanding that the reuse of tabular data may be...
Article
The Spatial Data Infrastructure initiatives are now broadly developed and deployed. However, while plenty of tools use them, some tasks are still complex to perform by non‐expert users, such as finding, accessing, and using some of their related OGC Web Services (OWS). One of the main reasons for these challenges is associated with semantic heterog...
Article
Full-text available
We present an ontology that describes the domain of Public Transport by bus, which is common in cities around the world. This ontology is aligned to Transmodel, a reference model which is available as a UML specification and which was developed to foster interoperability of data about transport systems across Europe. The alignment with this non-ont...
Article
Full-text available
With the increase of data volume in heterogeneous datasets that are being published following Open Data initiatives, new operators are necessary to help users to find the subset of data that best satisfies their preference criteria. Quantitative approaches such as top-k queries may not be the most appropriate approaches as they require the user to...
Article
OneClass SVM is a popular method for unsupervised anomaly detection. Like many other methods, it suffers from the black-box problem: it is difficult to justify, in an intuitive and simple manner, why the decision frontier is identifying data points as anomalous or non-anomalous. This problem is being widely addressed for supervised models. However, i...
Chapter
The ICT infrastructures of medium and large organisations that offer ICT services (infrastructure, platforms, software, applications, etc.) are becoming increasingly complex. Nowadays, these environments combine all sorts of hardware (e.g., CPUs, GPUs, storage elements, network equipment) and software (e.g., virtual machines, servers, microservices...
Poster
Full-text available
RDF-star was recently proposed as a convenient representation to annotate statements in RDF with metadata by introducing the so-called RDF-star triples, bridging the gap between RDF and property graphs. However, even though there are many solutions to generate RDF graphs, there is no systematic approach so far to generate RDF-star graphs from heter...
Article
Full-text available
Existing SPARQL query engines and triple stores are continuously improved to handle more massive datasets. Several approaches have been developed in this context proposing the storage and querying of RDF data in a distributed fashion, mainly using the MapReduce Programming Model and Hadoop-based ecosystems. New trends in Big Data technologies have...
Preprint
Full-text available
A significant economic cost for many companies that operate with fleets of vehicles is related to their fuel consumption. This consumption can be reduced by acting over some aspects, such as the driving behaviour style of vehicle drivers. Improving driving behaviour (and other features) can save fuel on a fleet of vehicles without needing to change...
Article
Full-text available
Public procurement is a large market affecting almost every organisation and individual; therefore, governments need to ensure its efficiency, transparency, and accountability, while creating healthy, competitive, and vibrant economies. In this context, open data initiatives and integration of data from multiple sources across national borders coul...
Conference Paper
Full-text available
Knowledge graphs have proven to be a powerful technology to integrate and structure the myriad of data available nowadays. The semantic web community has actively worked on data integration systems, providing an important set of engines and mapping languages to facilitate the construction of knowledge graphs. Despite these important efforts, there...
Article
The adoption of Knowledge Graphs (KGs) by public and private organizations to integrate and publish data has increased in recent years. Ontologies play a crucial role in providing the structure for KGs, but are usually disregarded when designing Application Programming Interfaces (APIs) to enable browsing KGs in a developer-friendly manner. In this...
Article
Full-text available
Purpose: Citizen Science – public participation in scientific projects – is becoming a global practice engaging volunteer participants, often non-scientists, with scientific research. Citizen Science is facing major challenges, such as quality and consistency, to reap the full potential of its outputs and outcomes, including data,...
Article
Full-text available
Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational databases, CSV and JSON files), either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the specific case of tabular datasets represented as several CSV or...
Book
This book constitutes the refereed proceedings of the 18th International Semantic Web Conference, ESWC 2021, held virtually in June 2021. The 41 full papers and 2 short papers presented were carefully reviewed and selected from 167 submissions. The papers were submitted to three tracks: the research track, the resource track and the in-use track. T...
Chapter
Many organizations maintain knowledge graphs that are organized according to ontologies. However, these ontologies are implemented in languages (e.g. OWL) that are difficult to understand by users who are not familiar with knowledge representation techniques. In particular, this affects web developers who want to develop ontology-based applications...
Preprint
Full-text available
There are many scenarios where we may want to find pairs of textually similar documents in a large corpus (e.g. a researcher doing literature review, or an R&D project manager analyzing project proposals). To programmatically discover those connections can help experts to achieve those goals, but brute-force pairwise comparisons are not computation...
Preprint
Full-text available
With the ongoing growth in number of digital articles in a wider set of languages and the expanding use of different languages, we need annotation methods that enable browsing multi-lingual corpora. Multilingual probabilistic topic models have recently emerged as a group of semi-supervised machine learning models that can be used to perform themati...
Chapter
Full-text available
The release of a growing amount of open procurement data means that we are increasingly able, and even have the obligation, to scrutinize and analyse public spending for delivering better quality of public services, optimizing costs, preventing fraud and corruption, and building healthy and sustainable economies. The TheyBuyForYou project addresses...
Preprint
Full-text available
In the absence of sufficient medication for COVID patients due to the increased demand, disused drugs have been employed or the doses of those available were modified by hospital pharmacists. Some evidence for the use of alternative drugs can be found in the existing scientific literature that could assist in such decisions. However, exploiting la...
Conference Paper
Full-text available
Skyline queries are being used in decision-making applications to help stakeholders find the set of data that satisfies certain criteria, whose weight may not be assigned beforehand. Given the wide availability of heterogeneous datasets that are being published following Open Data initiatives, combining skyline queries with query processing approac...
Conference Paper
Full-text available
Data has exponentially grown in the last years, and knowledge graphs constitute powerful formalisms to integrate a myriad of existing data sources. Transformation functions – specified with function-based mapping languages like FunUL and RML+FnO – can be applied to overcome interoperability issues across heterogeneous data sources. However, the abs...
Article
Full-text available
Local administrations generate large quantities of data through the processes they follow to address administrative governance issues and the needs of their citizens. Sadly, in most cases these data are not fully exploited and remain within the institutions, making their reuse very difficult. Currently, open data initiatives have gained ground wor...
Chapter
Full-text available
Public procurement is a large market affecting almost every organisation and individual. Governments need to ensure efficiency, transparency, and accountability, while creating healthy, competitive, and vibrant economies. In this context, we built a platform, consisting of a set of modular APIs and ontologies to publish, curate, integrate, analyse,...
Conference Paper
Full-text available
Virtual knowledge graph access has traditionally focused on providing ontology-based access to relational databases (RDB) proposing SPARQL-to-SQL query translation techniques and optimizations. With the advent of mapping languages or annotations such as RML or CSVW, these techniques have been applied over tabular data by considering each source as...
Conference Paper
Full-text available
A skyline set corresponds to the points that are non-dominated by any other points in terms of a multi-criteria function, i.e., there is no other point with values better than them in the criteria defined in a multi-criteria function. Particularly, skyline queries can be used to filter the points that best meet a multi-criteria function in decision...
Chapter
Ontologies are widely used nowadays for many different purposes and in many different contexts, like industry and research, and in domains ranging from geosciences, biology, chemistry or medicine. When used for research, ontologies should be treated as other research artefacts, such as data, software, methods, etc.; following the same principles us...
Chapter
Full-text available
We carried out a literature survey on ontologies dealing with the scholarly and research domains, with a focus on modeling the knowledge graphs that would support information foraging by researchers within the different roles they fulfill during their career. We identified 43 relevant ontologies, of which 35 were found sufficiently documented to be...
Article
A lot of tabular data are being published on the Web. Semantic labeling of such data may help in their understanding and exploitation. However, many challenges need to be addressed to do this automatically. With numbers, it can be even harder due to the possible difference in measurement accuracy, rounding errors, and even the frequency of their ap...
Preprint
Full-text available
Citizen Science – public participation in scientific projects – is becoming a global practice engaging volunteer participants, often non-scientists, with scientific research. Citizen Science has already considerable support from various professional networks and policy makers, since it is effective in enabling a wider participation and access to increa...
Conference Paper
Full-text available
We carried out a literature survey on ontologies dealing with the scholarly and research domains, with focus on modeling the knowledge graphs that would support information foraging by researchers within the different roles they fulfill during their career. In the state of the art we identified 43 relevant ontologies, of which 34 were found suffici...
Preprint
Full-text available
Data has exponentially grown in the last years, and knowledge graphs constitute powerful formalisms to integrate a myriad of existing data sources. Transformation functions -- specified with function-based mapping languages like FunUL and RML+FnO -- can be applied to overcome interoperability issues across heterogeneous data sources. However, the a...
Conference Paper
Full-text available
Public procurement is a large market affecting almost every organisation and individual. Governments need to ensure efficiency, transparency, and accountability, while creating healthy, competitive, and vibrant economies. In this context, we built a platform, consisting of a set of modular APIs and ontologies to publish, curate, integrate, analyse,...
Article
Full-text available
A large number of datasets are being made available on the Web using a variety of formats and according to diverse data models. Ontology Based Data Integration (OBDI) has been traditionally proposed as a mechanism to facilitate access to such heterogeneous datasets, providing a unified view over their data by means of ontologies. Recently, the term...
Article
Full-text available
In the last decade, REST has become the most common approach to provide web services, yet it was not originally designed to handle typical modern applications (e.g. mobile apps). GraphQL was proposed to reduce the number of queries and data exchanged in comparison with REST. Since its release in 2015, it has gained momentum as an alternative approa...
Article
Full-text available
Public administrations handle large amounts of data in relation to their internal processes as well as to the services that they offer. Following public-sector information reuse regulations and worldwide open data publication trends, these administrations are increasingly publishing their data as open data. However, open data are often released wit...
Article
Full-text available
Ontology Based Data Access (OBDA) refers to a range of techniques, algorithms and systems that can be used to deal with the heterogeneity of data that is common inside many organisations as well as in inter-organisational settings and more openly on the Web. In OBDA, ontologies are used to provide a global view over multiple local datasets; and map...
Preprint
Full-text available
Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational database, CSV, JSON), either by materializing integrated data into RDF or by performing on-the-fly integration via SPARQL-to-SQL query translation. In the specific case of tabular datasets comprised of several CSV or E...
Article
Full-text available
Searching for similar documents and exploring major themes covered across groups of documents are common activities when browsing collections of scientific papers. This manual knowledge-intensive task can become less tedious and even lead to unexpected relevant findings if unsupervised algorithms are applied to help researchers. Most text mining al...
Preprint
Full-text available
Ontologies are widely used nowadays for many different purposes and in many different contexts, like industry and research, and in domains ranging from geosciences, biology, chemistry or medicine. When used for research, ontologies should be treated as other research artefacts, such as data, software, methods, etc.; following the same principles...
Conference Paper
Full-text available
Bio2RDF is one of the most popular projects that integrates and publishes biomedical datasets as Linked Data. The community has actively contributed to the generation of these datasets using ad-hoc programmed scripts. In the context of the Semantic Web, Ontology-Based Data Access (OBDA) approaches have been proposed to provide data access and trans...
Preprint
Full-text available
Cross-lingual annotations of legislative texts enable us to explore major themes covered in multilingual legal data and are a key facilitator of semantic similarity when searching for similar documents. Multilingual probabilistic topic models have recently emerged as a group of semi-supervised machine learning models that can be used to perform the...
Preprint
Full-text available
OneClass SVM is a popular method for unsupervised anomaly detection. Like many other methods, it suffers from the black-box problem: it is difficult to justify, in an intuitive and simple manner, why the decision frontier is identifying data points as anomalous or non-anomalous. This type of problem is being widely addressed for supervised models. Ho...
Conference Paper
Full-text available
The use of knowledge graphs is spreading in the scientific community across different domains, from social sciences to biomedicine. The creation of knowledge graphs usually needs the integration of multiple heterogeneous data sources in different formats and schemas. One common way to achieve this process is using declarative mappings, which establ...
Preprint
Full-text available
A large number of datasets are made publicly available in a wide range of formats. Due to interoperability problems, the construction of RDF-based knowledge graphs (KG) using declarative mapping languages has emerged with the aim of integrating heterogeneous sources in a uniform way. Although the scientific community has actively contributed with s...
Chapter
A large number of datasets are made publicly available in a wide range of formats. Due to interoperability problems, the construction of RDF-based knowledge graphs (KG) using declarative mapping languages has emerged with the aim of integrating heterogeneous sources in a uniform way. Although the scientific community has actively contributed with s...
Conference Paper
Full-text available
With the ongoing growth in number of digital articles in a wider set of languages and the expanding use of different languages, we need annotation methods that enable browsing multi-lingual corpora. Multilingual probabilistic topic models have recently emerged as a group of semi-supervised machine learning models that can be used to perform themati...
Conference Paper
Full-text available
The release of a growing amount of open procurement data led to various initiatives for harmonising the data being provided. Among others, the Open Contracting Data Standard (OCDS) is highly relevant due to its high practical value and increasing traction. OCDS defines a common data model for publishing structured data throughout most of the stages...
Conference Paper
Full-text available
In this paper, we describe our implemented approach for the usage and exploitation of declarative mappings for the publication of open transport data from transport authorities and operators into an ontology based on Transmodel. This allows a homogeneous representation of transport data across EU transport-related organisations and minimises the ne...
Chapter
Full-text available
The release of a growing amount of open procurement data led to various initiatives for harmonising the data being provided. Among others, the Open Contracting Data Standard (OCDS) is highly relevant due to its high practical value and increasing traction. OCDS defines a common data model for publishing structured data throughout most of the stages...
Conference Paper
Full-text available
In the last decade, REST has become the most common way to provide web services, yet it was not originally designed to handle typical modern applications (e.g., mobile apps). GraphQL was released publicly in 2015 and since then has gained momentum as an alternative approach to REST. However, generating and maintaining GraphQL resolvers is not eas...
Conference Paper
Full-text available
The adoption of GraphQL is on the rise; many companies and institutes are adopting it due to its ease of use, its ease of maintenance, and the way it hides complexity from the user. Such advantages come from using a unified global schema and mapping it to the underlying data sources. The semantic web community has already adopted a similar way to map different...
Chapter
Full-text available
Knowledge graphs are often generated using rules that apply semantic annotations to data sources. Software tools then execute these rules and generate or virtualize the corresponding RDF-based knowledge graph. RML is an extension of the W3C-recommended R2RML language, extending support from relational databases to other data sources, such as data i...
Chapter
Full-text available
Research is commonly used to measure the prestige of universities. Currently, many universities register some of their scientific production (such as theses and articles) in open access repositories using technologies like DSpace or ePrints. Likewise, scientific production is available in different and overlapping databases such as Scopus, Web of Scie...
Article
Full-text available
We describe the process and tools that we have used to generate and publish the BTN100 Linked Dataset, based on the original data from the Spanish Topographic Base (1:100.000 scale) from the Spanish Instituto Geográfico Nacional. We have taken into account the limitations and lessons learned from our initial experience on the generation an...