
Albert Meroño-Peñuela – Vrije Universiteit Amsterdam
About
69 Publications · 6,657 Reads
520 Citations
Current institution: Vrije Universiteit Amsterdam
Publications (69)
Generative AI is radically changing the creative arts, by fundamentally transforming the way we create and interact with cultural artefacts. While offering unprecedented opportunities for artistic expression and commercialisation, this technology also raises ethical, societal, and legal concerns. Key among these are the potential displacement of hu...
Knowledge graphs (KGs) are essential in human-centered AI by reducing the need for extensive labeled machine-learning datasets, enhancing retrieval-augmented generation, and facilitating explanations. However, modern KG construction has evolved into a complex, semi-automated process, increasingly reliant on opaque deep-learning models and a multitu...
Knowledge Graphs (KGs) store human knowledge in the form of entities (nodes) and relations, and are used extensively in various applications. KG embeddings are an effective approach to addressing tasks like knowledge discovery, link prediction, and reasoning. This is often done by allocating and learning embedding tables for all or a subset of the...
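To make the embedding-table idea concrete, here is a minimal, hypothetical Python sketch using the well-known TransE scoring function; TransE is a familiar stand-in, not necessarily the method used in this publication, and the entity and relation names are placeholders.

```python
import numpy as np

# Minimal TransE-style sketch of a KG embedding table. Entity and relation
# names are hypothetical placeholders.
rng = np.random.default_rng(0)
dim = 50
entities = {"Amsterdam": 0, "Netherlands": 1}
relations = {"capitalOf": 0}

# One row per entity/relation: these are the "embedding tables" being learned.
entity_emb = rng.normal(size=(len(entities), dim))
relation_emb = rng.normal(size=(len(relations), dim))

def transe_score(head, rel, tail):
    """Lower is better: TransE models a true triple as head + relation ≈ tail."""
    h = entity_emb[entities[head]]
    r = relation_emb[relations[rel]]
    t = entity_emb[entities[tail]]
    return float(np.linalg.norm(h + r - t))

# For link prediction, candidate tails are ranked by this score.
print(transe_score("Amsterdam", "capitalOf", "Netherlands"))
```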
In this paper, we provide a technical vision for key enabling elements for the architecture of the UK National Data Library (NDL) with a strong focus on building it as an AI-ready data infrastructure through standardised vocabularies, automated analysis tools, and interoperability services. We follow the ODI Multilayer Interoperability Framework (M...
Past ontology requirements engineering (ORE) has primarily relied on manual methods, such as interviews and collaborative forums, to gather user requirements from domain experts, especially in large projects. The current OntoChat offers a framework for ORE that utilises large language models (LLMs) to streamline the process through four key functions:...
Despite many advances in knowledge engineering (KE), challenges remain in areas such as engineering knowledge graphs (KGs) at scale, keeping up with evolving domain knowledge, multilingualism, and multimodality. Recently, KE has used LLMs to support semi-automatic tasks, but the most effective use of LLMs to support knowledge engineers across the K...
We study collaboration patterns in Wikidata, one of the world's largest collaborative knowledge graph communities. Wikidata depends on the long-term engagement of a small group of invaluable members: just 0.8% of contributors are responsible for 80% of contributions. Therefore, it is essential to investigate their behavioural patterns and find ways to enhance their contribu...
In the music domain, several ontologies have been proposed to annotate musical data, in both symbolic and audio form, and generate semantically rich Music Knowledge Graphs. However, current models lack interoperability and are insufficient for representing music history and the cultural heritage context in which it was generated, risking the propag...
Various disconnected chord datasets are currently available for music analysis and information retrieval, but they are often limited by their size, non-openness, lack of timed information, or lack of interoperability. Together with the lack of overlapping repertoire coverage, this limits cross-corpus studies on harmony over time and across genres,...
Computationally creative systems for music have recently achieved impressive results, fuelled by progress in generative machine learning. However, black-box approaches have raised fundamental concerns for ethics, accountability, explainability, and musical plausibility. To enable trustworthy machine creativity, we introduce the Harmonic Memory, a K...
The annotation of music content is a complex process to represent due to its inherently multifaceted, subjective, and interdisciplinary nature. Numerous systems and conventions for annotating music have been developed as independent standards over the past decades. Little has been done to make them interoperable, which jeopardises cross-corpora stu...
Current work on multi-agent systems at King’s College London is extensive, though largely based in two research groups within the Department of Informatics: the Distributed Artificial Intelligence (DAI) thematic group and the Reasoning & Planning (RAP) thematic group. DAI combines AI expertise with political and economic theories and data, to explo...
The success of Semantic Web technology has boosted the publication of Knowledge Graphs in the Web of Data, and several technologies to access them have become available covering different spots in the spectrum of expressivity: from the highly expressive SPARQL to the controlled access of Linked Data APIs, with GraphQL in between. Many of these tech...
Sequences are among the most important data structures in computer science. In the Semantic Web, however, little attention has been given to Sequential Linked Data. In previous work, we have discussed the data models that Knowledge Graphs commonly use for representing sequences and showed how these models have an impact on query performance and tha...
An important problem in large symbolic music collections is the low availability of high-quality metadata, which is essential for various information retrieval tasks. Traditionally, systems have addressed this by relying either on costly human annotations or on rule-based systems at a limited scale. Recently, embedding strategies have been exploite...
A century ago, the 1918–19 influenza pandemic swept across the globe, taking the lives of over 50 million people. We use data from the Dutch civil registry to show which regions in the Netherlands were most affected by the 1918–19 pandemic. We do so for the entire 1918 year as well as the first, second, and third wave that hit the Netherlands in su...
In Section 3.3, we discussed some of the limitations typically encountered when building Knowledge Graph APIs. In summary, we discussed the following issues:
repetitive and unsystematic use of queries;
lack of separation of concerns (query management);
lack of transparency (query management); and
difficulty of versioning (query management).
When interacting with Knowledge Graphs, applications deal with two complementary items: data requests and data responses. So far, we have discussed the first item, but this chapter focuses instead on the data response. We point out the limitations of standard formats and identify development needs, proposing a technological solution.
This first chapter introduces the concepts of Knowledge Graphs and Linked Data, and the technical specifications of RDF and SPARQL as paradigmatic implementations of these concepts. These technologies have gained a lot of traction in recent years in both academia and industry, as shown by large Knowledge Graphs like Wikidata and DBpedia, and those...
What have we learned in this book? In Chapter 1, we briefly presented Knowledge Graphs as a paradigm for publishing structured and semantically rich data on the Web, and query languages (in particular SPARQL) as a means to access the wealth of information in those Knowledge Graphs. We also pointed out some of the challenges that developers face whe...
In Chapter 2 we saw how to query Knowledge Graphs with SPARQL from within application code, either directly using HTTP, or by resorting to one of the various SPARQL libraries available for programming languages. This has provided us with some additional tools, on top of just manually writing and executing queries, toward their generalization and pa...
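As an illustration of querying a SPARQL endpoint from application code, the following sketch uses the SPARQLWrapper Python library against DBpedia's public endpoint; the specific query and endpoint are illustrative choices, not taken from the chapter.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Query a public SPARQL endpoint over HTTP from application code.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/Amsterdam> rdfs:label ?label .
        FILTER (lang(?label) = "en")
    }
""")
sparql.setReturnFormat(JSON)

# The response follows the standard SPARQL JSON results format.
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["label"]["value"])
```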
In Chapter 1 we mentioned some technologies behind Knowledge Graphs (RDF), and the query languages available for accessing their data (SPARQL). These languages use graph-based patterns to specify the criteria under which we want to query the data. Usually, these queries are manually written, tested through trial and error, and incrementally refined...
The tools and principles that we have explained in this book, in particular those that facilitated the creation of the tools grlc (described in Chapter 4) and SPARQL Transformer (described in Chapter 5), originated in research programs around Knowledge Graphs and the Semantic Web, in particular the Dutch national program CLARIAH [Merono-Penuela et...
One of the most important goals of digital humanities is to provide researchers with data and tools for new research questions, either by increasing the scale of scholarly studies, linking existing databases, or improving the accessibility of data. Here, the FAIR principles provide a useful framework. Integrating data from diverse humanities domain...
This open access book constitutes the refereed proceedings of the 16th International Conference on Semantic Systems, SEMANTiCS 2020, held in Amsterdam, The Netherlands, in September 2020. The conference was held virtually due to the COVID-19 pandemic.
The proceedings are available as an open access volume under a CC BY license. You can download it he...
The Semantic Web community has produced a large body of literature that is becoming increasingly difficult to manage, browse, and use. Recent work on attention-based, sequence-to-sequence Transformer neural architecture has produced language models that generate surprisingly convincing synthetic conditional text samples. In this demonstration, we r...
Finding and linking different appearances of the same entity in an open Web setting is one of the primary challenges of the Semantic Web. In social and economic history, record linkage has dealt with this problem for a long time, linking historical individual records at a local database level. With the advent of semantic technologies, Knowledge Gra...
One of the most important goals of digital humanities is to provide researchers with data and tools for new research questions, either by increasing the scale of scholarly studies, linking existing databases, or improving the accessibility of data. Here, the FAIR principles provide a useful framework as these state that data needs to be: Findable,...
In a document-based world such as that of Web APIs, the triple-based output of SPARQL endpoints can be a barrier for developers who want to integrate Linked Data in their applications. A different JSON output can be obtained with SPARQL Transformer, which relies on a single JSON object for defining which data should be extracted from the endpoint and...
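As a rough illustration of the single-JSON-object idea, a query prototype might look like the Python dict below. The "proto"/"$where" key names are approximated from the SPARQL Transformer paper's examples; treat the exact syntax as an assumption to check against the library's documentation.

```python
import json

# Illustrative SPARQL Transformer query object (key names approximated from
# the paper; verify the exact syntax against the library's documentation).
query = {
    "proto": {                            # the shape of each result object
        "id": "?city",                    # a SPARQL variable to bind
        "name": "$rdfs:label$required",   # a property path, marked required
    },
    "$where": ["?city a dbo:City"],       # plain SPARQL triple patterns
    "$lang": "en",
    "$limit": 5,
}
print(json.dumps(query, indent=2))
```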
Linked lists represent a countable number of ordered values, and are among the most important abstract data types in computer science. With the advent of RDF as a highly expressive knowledge representation language for the Web, various implementations for RDF lists have been proposed. Yet, there is no benchmark so far dedicated to evaluate the perf...
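For reference, this is what one of the competing representations, the classic rdf:List (an rdf:first/rdf:rest/rdf:nil chain), looks like when built with the rdflib Python library; the example list, subject, and property names are hypothetical.

```python
from rdflib import Graph, BNode, Literal, Namespace
from rdflib.collection import Collection

EX = Namespace("http://example.org/")

# Build a classic rdf:List, one of the RDF list representations such a
# benchmark would compare.
g = Graph()
head = BNode()
Collection(g, head, [Literal(1), Literal(2), Literal(3)])
g.add((EX.mySequence, EX.hasValues, head))  # hypothetical subject/property

print(g.serialize(format="turtle"))
```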
The Linked Data paradigm has been used to publish a large number of musical datasets and ontologies on the Semantic Web, such as MusicBrainz, AcousticBrainz, and the Music Ontology. Recently, the MIDI Linked Data Cloud has been added to these datasets, representing more than 300,000 pieces in MIDI format as Linked Data, opening up the possibility f...
Nanopublications are a Linked Data format for scholarly data publishing that has received considerable uptake in the last few years. In contrast to the common Linked Data publishing practice, nanopublications work at the granular level of atomic information snippets and provide a consistent container format to attach provenance and metadata at this...
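A minimal sketch of the nanopublication container format with the rdflib Python library, assuming the standard nanopub schema terms (np:hasAssertion, np:hasProvenance, np:hasPublicationInfo); the example URIs and triples are hypothetical.

```python
from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import RDF

NP = Namespace("http://www.nanopub.org/nschema#")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/np1#")  # hypothetical nanopub URI

# The head graph points to the three other named graphs of the container.
ds = Dataset()
head = ds.graph(EX.Head)
head.add((EX.pub, RDF.type, NP.Nanopublication))
head.add((EX.pub, NP.hasAssertion, EX.assertion))
head.add((EX.pub, NP.hasProvenance, EX.provenance))
head.add((EX.pub, NP.hasPublicationInfo, EX.pubinfo))

# The assertion graph holds the atomic information snippet itself.
assertion = ds.graph(EX.assertion)
assertion.add((EX.Amsterdam, EX.capitalOf, EX.Netherlands))

# Provenance and publication info attach metadata at that granular level.
provenance = ds.graph(EX.provenance)
provenance.add((EX.assertion, PROV.wasDerivedFrom,
                URIRef("http://example.org/source")))
pubinfo = ds.graph(EX.pubinfo)
pubinfo.add((EX.pub, PROV.generatedAtTime, Literal("2020-01-01")))

print(ds.serialize(format="trig"))
```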
The Dutch Historical Censuses (1795–1971) contain statistics that describe almost two centuries of history in the Netherlands. These censuses were conducted once every 10 years (with some exceptions) from 1795 to 1971. Researchers have used their wealth of demographic, occupational, and housing information to answer fundamental questions in social ec...
The main promise of the digital humanities is the ability to perform scholarly studies at a much broader scale, and in a much more reusable fashion. The key enabler for such studies is the availability of sufficiently well described data. For the field of socio-economic history, data usually comes in a tabular form. Existing efforts to curate and p...
In this demo, we show how an effective and application agnostic way of curating SPARQL queries can be achieved by leveraging Git-based architectures. Often, SPARQL queries are hard-coded into Linked Data consuming applications. This tight coupling poses issues in code maintainability, since these queries are prone to change to adapt to new situatio...
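To give a flavour of the resulting decoupling, an application might call a grlc-generated API over plain HTTP as in the sketch below; the repository, query name, parameter, and URL pattern are hypothetical assumptions, not taken from the demo.

```python
import requests

# Hypothetical call to a grlc-generated Web API. grlc turns each SPARQL query
# stored in a Git repository into an HTTP operation, so the application never
# embeds query text. Repository, query name, parameter, and URL pattern are
# illustrative assumptions.
url = "https://grlc.io/api-git/someuser/somerepo/cities_by_country"
response = requests.get(
    url,
    params={"country": "Netherlands"},       # maps to a query parameter
    headers={"Accept": "application/json"},  # content negotiation
)
response.raise_for_status()
print(response.json())
```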
Despite the advantages of Linked Data as a data integration paradigm, accessing and consuming Linked Data is still a cumbersome task. Linked Data applications need to use technologies such as RDF and SPARQL that, despite their expressive power, belong to the data integration stack. As a result, applications and data cannot be cleanly separated: SPAR...
The study of music is highly interdisciplinary, and thus requires the combination of datasets from multiple musical domains, such as catalog metadata (authors, song titles, dates), industrial records (labels, producers, sales), and music notation (scores). While today an abundance of music metadata exists on the Linked Open Data cloud, linked datas...
Key fields in the humanities, such as history, art and language, are central to a major transformation that is changing scholarly practice in these fields: the so-called Digital Humanities (DH). A fundamental question in DH is how humanities datasets can be represented digitally, in such a way that machines can process them, understand their meanin...
Concept drift refers to the phenomenon that concepts change their intensional composition, and therefore meaning, over time. It is a manifestation of content dynamics, and an important problem with regard to access and scalability in the Web of Data. Such drifts go back to contextual influences due to social embedding as suggested by e.g. topic ana...
Here, we describe the CEDAR dataset, a five-star Linked Open Data representation of the Dutch historical censuses. These were conducted in the Netherlands once every 10 years from 1795 to 1971. We produce a linked dataset from a digitized sample of 2,288 tables. It contains more than 6.8 million statistical observations about the demography, labour...
The main promise of the digital humanities is the ability to perform scholarly studies at a much broader scale, and in a much more reusable fashion. The key enabler for such studies is the availability of sufficiently well described data. For the field of socioeconomic history, data usually comes in a tabular form. Existing efforts to curate and publ...
In this demo, we explore the potential of RDF as a representation format for digital music. Digital music is broadly used today in many professional music production environments. For decades, MIDI (Musical Instrument Digital Interface) has been the standard for digital music exchange between musicians and devices, albeit not in a Web friendly way....
Building Web APIs on top of SPARQL endpoints is becoming common practice. It enables universal access to the integration-friendly data space of Linked Data. In the majority of use cases, users cannot be expected to learn SPARQL to query this data space. Web APIs are the most common way to enable programmatic access to data on the Web. However, the...
Historical censuses have an enormous potential for research. In order to fully use this potential, harmonization of these censuses is essential. During the last decades, enormous efforts have been undertaken in digitizing the published aggregated outcomes of the Dutch historical censuses (1795-1971). Although the accessibility has been improved eno...
The Semantic Web is built on top of Knowledge Organization Systems (KOS) (vocabularies, ontologies, concept schemes) that provide structured, interoperable and distributed access to Linked Data on the Web. The maintenance of these KOS over time has produced a number of KOS version chains: subsequent unique version identifiers to unique states of...
Datasets that represent historical sources are relative newcomers in the Linked Open Data (LOD) cloud. Following the standard LOD practices for publishing historical sources raises several questions: how can we distinguish between RDF graphs of primary and secondary sources? Should we treat archived and online RDF graphs differently in historical r...
During the nineties of the last century, historians and computer scientists together created a research agenda around the life cycle of historical information. It comprised the tasks of creation, design, enrichment, editing, retrieval, analysis and presentation of historical information with the help of information technology. They also identified a nu...
This paper discusses the use of semantic technologies to increase quality, machine-processability, format translatability and cross-querying of complex tabular datasets. Our interest is to enable longitudinal studies of social processes in the past, and we use the historical Dutch censuses as case-study. Census data is notoriously difficult to comp...
This paper discusses the use of Linked Data to harmonize the Dutch censuses (1795-1971). Due to the long period they cover, census data is notoriously difficult to compare, aggregate and query in a uniform fashion. In social history, harmonization is the (manual) process of restructuring, interpreting and correcting original data sources to make a...
The evolution of the Internet and Information Systems has dramatically increased the amount of information held by governments and companies. This information can be very sensitive, especially regarding personal data, so governments and industries promote acts and guidelines in order to ensure privacy and data security. Thus, companies have to conside...
This paper describes the analysis of the requirements and the knowledge acquisition process for the development of a legal ontology for the representation of data protection knowledge in the framework of the NEURONA project. This modular ontology is used in the NEURONA application to reason about the correctness of the measures of protection applie...