Guus Schreiber

Vrije Universiteit Amsterdam | VU · Department of Computer Science

PhD

245
Publications
45,660
Reads
11,313
Citations

Publications (245)
Article
With the increase of cultural heritage data published online, the usefulness of data in this open context hinges on the quality and diversity of descriptions of collection objects. In many cases, existing descriptions are not sufficient for retrieval and research tasks, resulting in the need for more specific annotations. However, eliciting such an...
Article
An increasing number of cultural heritage institutions publish data online. Ontologies can be used to structure published data, thereby increasing interoperability. To achieve widespread adoption of ontologies, institutions such as libraries, archives and museums have to be able to assess whether an ontology can adequately capture information about...
Article
Full-text available
This paper describes BiographyNet, a digital humanities project (2012-2016) that brings together researchers from history, computational linguistics and computer science. The project uses data from the Biography Portal of the Netherlands (BPN), which contains approximately 125,000 biographies from a variety of Dutch biographical dictionaries from t...
Article
Full-text available
With more and more cultural heritage data being published online, their usefulness in this open context depends on the quality and diversity of descriptive metadata for collection objects. In many cases, existing metadata is not adequate for a variety of retrieval and research tasks and more specific annotations are necessary. However, eliciting su...
Conference Paper
In the digital era, personalisation systems are the typical way to deal with the massive amount of information on the Web. These systems decide in our place what we like, possibly hiding us away from a complete world of potentially interesting content. These systems do not challenge us to open our horizons of interest, trapping us more and more in ou...
Article
In this paper we propose several approaches for automatic annotation of natural science spreadsheets using a combination of structural properties of the tables and external vocabularies. During the design process of their spreadsheets, domain scientists implicitly include their domain model in the content and structure of the spreadsheet tables. Ho...
Article
Full-text available
Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. One such GWAP is Waisda?, a video labeling game where players tag streaming video and win points by reaching consensus on tags with other players. The open-ended and unconstrained manner of tagging in the fast-paced s...
Article
Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. One such GWAP is Waisda?, a video labeling game where players tag streaming video and win points by reaching consensus on tags with other players. The open-ended and unconstrained manner of tagging in the fast-paced s...
Article
Many museums are currently providing online access to their collections. State-of-the-art research in the last decade shows that it is beneficial for institutions to provide their datasets as Linked Data in order to achieve easy cross-referencing, interlinking and integration. In this paper, we present the Rijksmuseum linked dataset (accessible...
Article
It is possible to automatically annotate a natural science spreadsheet using lexical matching, given that the tables in these spreadsheets meet a number of requirements regarding the content. Results of a survey show that most of the existing natural science spreadsheets deviate from the ideal situation. We propose to complement lexical matching wi...
Conference Paper
Content Management Systems haven’t gained much from the Linked Data uptake, and sharing content between different websites and systems is hard. On the other side, using Linked Data in web documents is not as trivial as managing regular web content using a CMS. To address these issues, we present a method for creating human readable web documents ou...
Conference Paper
Spreadsheet models are frequently used by scientists to analyze research data. These models are typically described in a paper or a report, which serves as the single source of information on the underlying research project. As the calculation workflow in these models is not made explicit, readers are not able to fully understand how the research resu...
Conference Paper
In this study we consider whether, and to what extent, additional semantics in the form of Linked Data can help diversify search results. We undertake this study in the domain of cultural heritage. The data consists of collection data of the Rijksmuseum Amsterdam together with a number of relevant external vocabularies, which are all published as...
Conference Paper
Full-text available
Linking historical datasets and making them available on the Web has increasingly become a subject of research in the field of digital humanities. In this paper, we focus on discovering links between ships from a dataset of Dutch maritime events and a historical archive of newspaper articles. We apply a heuristic-based method for finding and filter...
Conference Paper
Full-text available
In this paper we present an experiment which has been performed to validate a pragmatic-based, expert-based and basic-level ontology. These ontologies were created for use in an application which generates questions for ordinary people with the purpose to determine a crisis situation. All three ontologies have specific characteristics related to th...
Conference Paper
Full-text available
Open Government Data often contain information that, in more or less detail, regards private citizens. For this reason, before publishing them, public authorities manipulate data to remove any sensitive information while trying to preserve their reliability. This paper addresses the lack of tools aimed at measuring the reliability of these data. We...
Article
Using an ontology to automatically generate questions for ordinary people requires a structure and concepts compliant with human thought. Here we present methods to develop a pragmatic, expert-based and a basic-level ontology and a framework to evaluate these ontologies. Comparing these ontologies shows that expert-based ontologies are most easy to...
Chapter
Full-text available
Web data often manifest high levels of uncertainty. We focus on categorical Web data and we represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account pos...
Conference Paper
Full-text available
Public authorities are increasingly sharing sets of open data. These data are often preprocessed (e.g. smoothed, aggregated) to avoid exposing sensitive data, while trying to preserve their reliability. We present two procedures for tackling the lack of methods for measuring open data reliability. The first procedure is based on a comparison...
Conference Paper
In a recent approach, Baader and Distel proposed an algorithm to axiomatize all terminological knowledge that is valid in a given data set and is expressible in the description logic ELK. This approach is based on the mathematical theory of formal concept ...
Article
Full-text available
In this paper, we present an evaluation framework for online access to cultural heritage. The framework enables the assessment of online cultural heritage applications in terms of their provision and support of information and interpretation. It is anchored in digital hermeneutics: the study and theory of the Web as a vehicle of (self)-interpretati...
Article
Full-text available
Simple Knowledge Organization System (SKOS) provides a data model and vocabulary for expressing Knowledge Organization Systems (KOSs) such as thesauri and classification schemes in Semantic Web applications. This paper presents the main components of SKOS and their formal expression in Web Ontology Language (OWL), providing an extensive account of t...
Conference Paper
Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. This trend is driven by the assumption that user tags will improve video search. In this paper we study whether this is indeed the case. To this end, we create an evaluation dataset that consists of: (i) a set of vide...
Article
Knowledge-acquisition research started in the eighties as a small research community focusing on knowledge-intensive problems in relatively small domains. In this paper we look at the influence the Web has had on knowledge acquisition and vice versa. To this end we discuss in some depth four topics, namely the ontology language OWL, the vocabulary...
Conference Paper
Full-text available
Diversity and profundity of the topics in cultural heritage collections make experts from outside the institution indispensable for acquiring qualitative and comprehensive annotations. We define the concept of nichesourcing and present challenges in the process of obtaining qualitative annotations from people in these niches. We believe that ex...
Article
Environmental computer models are considered essential tools in supporting environmental decision making, but their main value is that they allow a better understanding of our complex environment. Despite numerous attempts to promote good modelling practice, transparency of current environmental computer models is limited, which hinders progress in...
Article
Full-text available
In this document we describe the Amsterdam Museum Linked Open Data set. The dataset is a five-star Linked Data representation and comprises the entire collection of the Amsterdam Museum consisting of more than 70,000 object descriptions. Furthermore, the institution's thesaurus and person authority files used in the object metadata are included in...
Article
Full-text available
The BiographyNet project aims at inspiring historians when setting up new research projects. The goal is to create a semantic knowledge base by extracting links between people, historic events, places and time periods from a variety of Dutch biographical dictionaries. A demonstrator will be developed providing visualization and browsing techniques...
Conference Paper
In this position paper we identify nichesourcing, a specific form of human-based computation that harnesses the computational efforts from niche groups rather than the "faceless crowd". We claim that nichesourcing combines the strengths of the crowd with those of professionals, optimizing the result of human-based computation for certain tasks. We i...
Chapter
Full-text available
This chapter explores methods for determining the reliability of Automated Identification System (AIS) messages. The primary use of AIS messages in the naval domain is to avoid collisions, therefore they contain kinematic information about ships. Moreover, AIS messages contain information like the ship name and its identifiers, so AIS messages ca...
Conference Paper
Full-text available
Within the cultural heritage field, proprietary metadata and vocabularies are being transformed into public Linked Data. These efforts have mostly been at the level of large-scale aggregators such as Europeana where the original data is abstracted to a common format and schema. Although this approach ensures a level of consistency and interoperabil...
Article
Full-text available
Bridging the gap between low-level features and semantics is a problem commonly acknowledged in the Multimedia community. Event modeling can fill this gap by representing knowledge about the data at different levels of abstraction. In this paper we present the Simple Event Model (SEM) and its application in a Maritime Safety and Security use case about...
Article
Full-text available
In this article we discuss an application scenario for semantic annotation and search in a collection of art images. This application shows that background knowledge in the form of ontologies can be used to support indexing and search in image collections. The underlying ontologies are represented in RDF Schema and are based on existing data standa...
Article
Events have become central elements in the representation of data from domains such as history, cultural heritage, multimedia and geography. The Simple Event Model (SEM) is created to model events in these various domains, without making assumptions about the domain-specific vocabularies used. SEM is designed with a minimum of semantic commitment t...
Article
Full-text available
Cultural heritage institutions are currently rethinking access to their collections to allow the public to interpret and contribute to their collections. In this work, we present the Agora project, an interdisciplinary project in which Web technology and theory of interpretation meet. This we call digital hermeneutics. The Agora project facilitate...
Conference Paper
Full-text available
We present a Linked Data analysis method which relies on knowledge patterns for constructing a logical architecture of the knowledge in a dataset. This can then be exploited to compare heterogeneous datasets, enhance interoperability between them and make implicit knowledge emerge.
Conference Paper
Full-text available
Gold standard mappings created by experts are at the core of alignment evaluation. At the same time, the process of manual evaluation is rarely discussed. While the practice of having multiple raters evaluate results is accepted, their level of agreement is often not measured. In this paper we describe three experiments in manual evaluation and stu...
Conference Paper
Full-text available
There is an abundance of semi-structured reports on events being written and made available on the World Wide Web on a daily basis. These reports are primarily meant for human use. A recent movement is the addition of RDF metadata to make automatic processing by computers easier. A fine example of this movement is the Open Government Data initiativ...
Conference Paper
Full-text available
Within cultural heritage collections, objects are often grounded in a particular historical setting. This setting can currently not be made explicit, as structured descriptions of events are either missing or not marked up explicitly. This paper reports a study on automatic extraction of an historical event thesaurus from unstructured texts. We sho...
Conference Paper
Full-text available
In recent years, crowdsourcing has gained attention as an alternative method for collecting video annotations. An example is the internet video labeling game Waisda? launched by the Netherlands Institute for Sound and Vision. The goal of this PhD research is to investigate the value of the user tags collected with this video labeling game. To this...
Conference Paper
Full-text available
Web data often manifest high levels of uncertainty. We focus on categorical Web data and we represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account possi...
Article
Most digitised and online available objects from GLAMs (Galleries, Libraries, Archives, Museums) can be browsed through a predefined set of formal metadata, such as its creator, year of creation, and type of material. Standards for metadata management and exchange have matured and are being adopted widely. They enable intra-collection search and ex...
Article
Recent research has shown the Linked Data cloud to be a potentially ideal basis for improving user experience when interacting with Web content across different applications and domains. Using the explicit knowledge of datasets, however, is neither sufficient nor straight-forward. Dataset knowledge is often not uniformly organized, thus it is gener...
Conference Paper
Full-text available
In this paper, we define reusable inference steps for content-based recommender systems based on semantically-enriched collections. We show an instantiation in the case of recommending artworks and concepts based on a museum domain ontology and a user profile consisting of rated artworks and rated concepts. The recommendation task is split into fo...
Article
Full-text available
It is common practice in audiovisual archives to disclose documents using metadata from a structured vocabulary or thesaurus. Many of these thesauri have limited or no structure. The objective of this paper is to find out whether retrieval of audiovisual resources from a collection indexed with an in-house thesaurus can be improved by enriching the...
Conference Paper
Full-text available
In this paper we build on our methodology for combining and selecting alignment techniques for vocabularies, with two alignment case studies of large vocabularies in two languages. Firstly, we analyze the vocabularies and based on that analysis choose our alignment techniques. Secondly, we test our hypothesis based on earlier work that first gener-...
Article
Interpretation of spatial features often requires combined reasoning over geometry and semantics. We introduce the Space package, an open source SWI-Prolog extension that provides spatial indexing capabilities. Together with the existing semantic web reasoning capabilities of SWI-Prolog, this allows efficient integration of spatial and sem...
Conference Paper
Full-text available
Web Science studies the interplay between web technology and the human behaviour it induces at the micro, meso and macro level. In this extended abstract we examine Web Science research issues by taking a closer look at the area of digital heritage. We discuss engineering, communication and socio-economic aspects.
Article
Full-text available
This paper describes the "food task" of the Ontology Alignment Evaluation Initiative (OAEI) 2006 and 2007. The OAEI is a comparative evaluation effort to measure the quality of automatic ontology-alignment systems. The food task focuses on the alignment of thesauri in the agricultural domain. It aims at providing a realistic task for ontology-ali...
Article
Full-text available
Traditionally the relations between concepts from a controlled vocabulary, such as the hierarchical and associative relations in a thesaurus, have been used to support users in their search process. In the context of the Semantic Web, multiple interlinked vocabularies are becoming available, providing a large number of different relations between c...
Article
In this paper we build on our methodology for combining and selecting alignment techniques for vocabularies, with two alignment case studies of large vocabularies in two languages. Firstly, we analyze the vocabularies and based on that analysis choose our alignment techniques. Secondly, we test our hypothesis based on earlier work that first genera...
Article
This paper presents research in the context of two multidisciplinary projects aimed at facilitating the history domain with an automatic approach for event extraction and modelling. To realise this, the Semantics of History project is providing a historical ontology and a lexicon to support the detection of historical events in textual data whilst...
Article
Full-text available
Audiovisual material is a vital component of the world's heritage but it remains difficult to access. With the Netherlands Institute for Sound and Vision as one of its partners, the MuNCH project aims to investigate new methods for improving access to a wide range of audiovisual documents. MuNCH brings together three research fields: multimedia ana...
Article
Full-text available
Web 2.0 — the perceived second generation of the World Wide Web that aims to improve collaboration, sharing of information and interoperability — enables increasing access to digital collections of museums. The expectation is that more and more people will spend time preparing their visit before actually visiting the museum and look for related inf...
Article
Full-text available
In the context of large and ever growing archives, generating annotation suggestions automatically from textual resources related to the documents to be archived is an interesting option in theory. It could save a lot of work in the time consuming and expensive task of manual annotation and it could help cataloguers attain a higher inter-annotator...
Article
Full-text available
The Documentalist Support System (DocSS) is developed to suit the novel needs of documentalists working within the Dutch archive for Sound and Vision, broadcasters working outside of Sound and Vision and people interested in the Cultural Heritage value of the archive, who want to perform search in context. The documentalists (and to some extent the ot...
Article
Full-text available
Web 2.0 (the perceived second generation of the World Wide Web that aims to improve collaboration, sharing of information and interoperability) enables increasing access to digital collections of museums. The expectation is that more and more people will spend time preparing their visit before actually visiting the museum and look for related infor...
Article
Full-text available
A method that uses natural language processing techniques and background knowledge in the form of structured vocabularies to automatically identify concepts and their roles from text descriptions is presented. The annotation method was evaluated on the ARIA collection from Rijksmuseum Amsterdam by comparing it to a human-created gold standard and com...
Conference Paper
Full-text available
Identifying alignments between vocabularies has become a central knowledge engineering activity. A plethora of alignment techniques has been developed over the past years. In this paper we present a case study in which we examine and evaluate the practical use of three typical alignment techniques. The study involves the alignment of two vocabulari...
Conference Paper
Full-text available
Semantic desktop environments aim at improving the effectiveness and efficiency of users carrying out daily tasks within their personal information management (PIM) infrastructure. They support the user by transferring and exploiting the explicit semantics ...
Article
Full-text available
This demo shows the integration of spatial and semantic reasoning for the recognition of ship behavior. We recognize abstract behavior such as "ferry trip" and derive that the ship showing this behavior is a "ferry". We accomplish this by abstracting over low-level ship trajectory data and applying Prolog rules that express properties of ship beha...
Article
Full-text available
Metadata vocabularies provide various semantic relations between concepts. For content-based recommender systems, these relations enable a wide range of concepts to be recommended. However, not all semantically related concepts are interesting for end users. In this paper, we identified a number of semantic relations, which are both within one v...