Guus Schreiber

  • PhD
  • Head of Department at Vrije Universiteit Amsterdam

About

246 Publications
60,874 Reads
12,092 Citations
Current institution
Vrije Universiteit Amsterdam
Current position
  • Head of Department

Publications (246)
Article
With the increase of cultural heritage data published online, the usefulness of data in this open context hinges on the quality and diversity of descriptions of collection objects. In many cases, existing descriptions are not sufficient for retrieval and research tasks, resulting in the need for more specific annotations. However, eliciting such an...
Article
An increasing number of cultural heritage institutions publish data online. Ontologies can be used to structure published data, thereby increasing interoperability. To achieve widespread adoption of ontologies, institutions such as libraries, archives and museums have to be able to assess whether an ontology can adequately capture information about...
Article
Full-text available
This paper describes BiographyNet, a digital humanities project (2012-2016) that brings together researchers from history, computational linguistics and computer science. The project uses data from the Biography Portal of the Netherlands (BPN), which contains approximately 125,000 biographies from a variety of Dutch biographical dictionaries from t...
Article
Full-text available
With more and more cultural heritage data being published online, their usefulness in this open context depends on the quality and diversity of descriptive metadata for collection objects. In many cases, existing metadata is not adequate for a variety of retrieval and research tasks and more specific annotations are necessary. However, eliciting su...
Preprint
With more and more cultural heritage data being published online, their usefulness in this open context depends on the quality and diversity of descriptive metadata for collection objects. In many cases, existing metadata is not adequate for a variety of retrieval and research tasks and more specific annotations are necessary. However, eliciting su...
Conference Paper
In the digital era, personalisation systems are the typical way to deal with the massive amount of information on the Web. These systems decide in our place what we like, possibly hiding us away from a complete world of potentially interesting content. These systems do not challenge us to open our horizons of interest, trapping us more and more in ou...
Article
In this paper we propose several approaches for automatic annotation of natural science spreadsheets using a combination of structural properties of the tables and external vocabularies. During the design process of their spreadsheets, domain scientists implicitly include their domain model in the content and structure of the spreadsheet tables. Ho...
Article
Full-text available
Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. One such GWAP is Waisda?, a video labeling game where players tag streaming video and win points by reaching consensus on tags with other players. The open-ended and unconstrained manner of tagging in the fast-paced s...
Article
Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. One such GWAP is Waisda?, a video labeling game where players tag streaming video and win points by reaching consensus on tags with other players. The open-ended and unconstrained manner of tagging in the fast-paced s...
Article
Many museums are currently providing online access to their collections. State-of-the-art research from the last decade shows that it is beneficial for institutions to provide their datasets as Linked Data in order to achieve easy cross-referencing, interlinking and integration. In this paper, we present the Rijksmuseum linked dataset (accessible...
Article
It is possible to automatically annotate a natural science spreadsheet using lexical matching, given that the tables in these spreadsheets meet a number of requirements regarding the content. Results of a survey show that most of the existing natural science spreadsheets deviate from the ideal situation. We propose to complement lexical matching wi...
Conference Paper
Content Management Systems haven’t gained much from the Linked Data uptake, and sharing content between different websites and systems is hard. On the other hand, using Linked Data in web documents is not as trivial as managing regular web content using a CMS. To address these issues, we present a method for creating human-readable web documents ou...
Conference Paper
Spreadsheet models are frequently used by scientists to analyze research data. These models are typically described in a paper or a report, which serves as the single source of information on the underlying research project. As the calculation workflow in these models is not made explicit, readers are not able to fully understand how the research resu...
Chapter
Full-text available
Web data often manifest high levels of uncertainty. We focus on categorical Web data and we represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account pos...
Conference Paper
In this study we consider whether, and to what extent, additional semantics in the form of Linked Data can help diversify search results. We undertake this study in the domain of cultural heritage. The data consists of collection data of the Rijksmuseum Amsterdam together with a number of relevant external vocabularies, which are all published as...
Conference Paper
Full-text available
Linking historical datasets and making them available on the Web has increasingly become a subject of research in the field of digital humanities. In this paper, we focus on discovering links between ships from a dataset of Dutch maritime events and a historical archive of newspaper articles. We apply a heuristic-based method for finding and filter...
Conference Paper
Full-text available
In this paper we present an experiment which has been performed to validate a pragmatic-based, an expert-based and a basic-level ontology. These ontologies were created for use in an application which generates questions for ordinary people with the purpose of determining a crisis situation. All three ontologies have specific characteristics related to th...
Conference Paper
Full-text available
Open Government Data often contain information that, in more or less detail, regard private citizens. For this reason, before publishing them, public authorities manipulate data to remove any sensitive information while trying to preserve their reliability. This paper addresses the lack of tools aimed at measuring the reliability of these data. We...
Article
Using an ontology to automatically generate questions for ordinary people requires a structure and concepts compliant with human thought. Here we present methods to develop a pragmatic, an expert-based and a basic-level ontology and a framework to evaluate these ontologies. Comparing these ontologies shows that expert-based ontologies are easiest to...
Conference Paper
Full-text available
Public authorities are increasingly sharing sets of open data. These data are often preprocessed (e.g. smoothed, aggregated) to avoid exposing sensitive data, while trying to preserve their reliability. We present two procedures for tackling the lack of methods for measuring the reliability of open data. The first procedure is based on a comparison...
Conference Paper
In a recent approach, Baader and Distel proposed an algorithm to axiomatize all terminological knowledge that is valid in a given data set and is expressible in the description logic ELK. This approach is based on the mathematical theory of formal concept ...
Article
In this paper, we present an evaluation framework for online access to cultural heritage. The framework enables the assessment of online cultural heritage applications in terms of their provision and support of information and interpretation. It is anchored in digital hermeneutics: the study and theory of the Web as a vehicle of (self)-interpretati...
Article
Full-text available
Simple Knowledge Organization System (SKOS) provides a data model and vocabulary for expressing Knowledge Organization Systems (KOSs) such as thesauri and classification schemes in Semantic Web applications. This paper presents the main components of SKOS and their formal expression in Web Ontology Language (OWL), providing an extensive account of t...
Conference Paper
Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. This trend is driven by the assumption that user tags will improve video search. In this paper we study whether this is indeed the case. To this end, we create an evaluation dataset that consists of: (i) a set of vide...
Article
Knowledge-acquisition research started in the eighties as a small research community focusing on knowledge-intensive problems in relatively small domains. In this paper we look at the influence the Web has had on knowledge acquisition and vice versa. To this end we discuss in some depth four topics, namely the ontology language OWL, the vocabulary...
Conference Paper
Full-text available
Diversity and profundity of the topics in cultural heritage collections make experts from outside the institution indispensable for acquiring qualitative and comprehensive annotations. We define the concept of nichesourcing and present challenges in the process of obtaining qualitative annotations from people in these niches. We believe that ex...
Article
Environmental computer models are considered essential tools in supporting environmental decision making, but their main value is that they allow a better understanding of our complex environment. Despite numerous attempts to promote good modelling practice, transparency of current environmental computer models is limited, which hinders progress in...
Article
Full-text available
In this document we describe the Amsterdam Museum Linked Open Data set. The dataset is a five-star Linked Data representation and comprises the entire collection of the Amsterdam Museum consisting of more than 70,000 object descriptions. Furthermore, the institution's thesaurus and person authority files used in the object metadata are included in...
Article
Full-text available
The BiographyNet project aims at inspiring historians when setting up new research projects. The goal is to create a semantic knowledge base by extracting links between people, historic events, places and time periods from a variety of Dutch biographical dictionaries. A demonstrator will be developed providing visualization and browsing techniques...
Conference Paper
In this position paper we identify nichesourcing, a specific form of human-based computation that harnesses the computational efforts from niche groups rather than the "faceless crowd". We claim that nichesourcing combines the strengths of the crowd with those of professionals, optimizing the result of human-based computation for certain tasks. We i...
Chapter
Full-text available
This chapter explores methods for determining the reliability of Automatic Identification System (AIS) messages. The primary use of AIS messages in the naval domain is to avoid collisions; therefore they contain kinematic information about ships. Moreover, AIS messages contain information like the ship name and its identifiers, so AIS messages ca...
Conference Paper
Full-text available
Within the cultural heritage field, proprietary metadata and vocabularies are being transformed into public Linked Data. These efforts have mostly been at the level of large-scale aggregators such as Europeana where the original data is abstracted to a common format and schema. Although this approach ensures a level of consistency and interoperabil...
Article
Full-text available
Bridging the gap between low-level features and semantics is a problem commonly acknowledged in the Multimedia community. Event modeling can fill this gap by representing knowledge about the data at different levels of abstraction. In this paper we present the Simple Event Model (SEM) and its application in a Maritime Safety and Security use case about...
Article
Full-text available
In this article we discuss an application scenario for semantic annotation and search in a collection of art images. This application shows that background knowledge in the form of ontologies can be used to support indexing and search in image collections. The underlying ontologies are represented in RDF Schema and are based on existing data standa...
Article
Events have become central elements in the representation of data from domains such as history, cultural heritage, multimedia and geography. The Simple Event Model (SEM) is created to model events in these various domains, without making assumptions about the domain-specific vocabularies used. SEM is designed with a minimum of semantic commitment t...
Conference Paper
Full-text available
We present a Linked Data analysis method which relies on knowledge patterns for constructing a logical architecture of the knowledge in a dataset. This can then be exploited to compare heterogeneous datasets, enhance interoperability between them and make implicit knowledge emerge.
Conference Paper
Full-text available
Gold standard mappings created by experts are at the core of alignment evaluation. At the same time, the process of manual evaluation is rarely discussed. While the practice of having multiple raters evaluate results is accepted, their level of agreement is often not measured. In this paper we describe three experiments in manual evaluation and stu...
Conference Paper
Full-text available
There is an abundance of semi-structured reports on events being written and made available on the World Wide Web on a daily basis. These reports are primarily meant for human use. A recent movement is the addition of RDF metadata to make automatic processing by computers easier. A fine example of this movement is the Open Government Data initiativ...
Conference Paper
Full-text available
Within cultural heritage collections, objects are often grounded in a particular historical setting. This setting currently cannot be made explicit, as structured descriptions of events are either missing or not marked up explicitly. This paper reports a study on automatic extraction of a historical event thesaurus from unstructured texts. We sho...
Conference Paper
Full-text available
In recent years, crowdsourcing has gained attention as an alternative method for collecting video annotations. An example is the internet video labeling game Waisda? launched by the Netherlands Institute for Sound and Vision. The goal of this PhD research is to investigate the value of the user tags collected with this video labeling game. To this...
Article
Full-text available
Cultural heritage institutions are currently rethinking access to their collections to allow the public to interpret and contribute to their collections. In this work, we present the Agora project, an interdisciplinary project in which Web technology and theory of interpretation meet. This we call digital hermeneutics. The Agora project facilitate...
Conference Paper
Full-text available
Web data often manifest high levels of uncertainty. We focus on categorical Web data and we represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account possi...
Article
Most digitised and online available objects from GLAMs (Galleries, Libraries, Archives, Museums) can be browsed through a predefined set of formal metadata, such as its creator, year of creation, and type of material. Standards for metadata management and exchange have matured and are being adopted widely. They enable intra-collection search and ex...
Article
Recent research has shown the Linked Data cloud to be a potentially ideal basis for improving user experience when interacting with Web content across different applications and domains. Using the explicit knowledge of datasets, however, is neither sufficient nor straightforward. Dataset knowledge is often not uniformly organized, thus it is gener...
Conference Paper
Full-text available
In this paper, we define reusable inference steps for content-based recommender systems based on semantically-enriched collections. We show an instantiation in the case of recommending artworks and concepts based on a museum domain ontology and a user profile consisting of rated artworks and rated concepts. The recommendation task is split into fo...
Article
Full-text available
It is common practice in audiovisual archives to disclose documents using metadata from a structured vocabulary or thesaurus. Many of these thesauri have limited or no structure. The objective of this paper is to find out whether retrieval of audiovisual resources from a collection indexed with an in-house thesaurus can be improved by enriching the...
Conference Paper
Full-text available
In this paper we build on our methodology for combining and selecting alignment techniques for vocabularies, with two alignment case studies of large vocabularies in two languages. Firstly, we analyze the vocabularies and based on that analysis choose our alignment techniques. Secondly, we test our hypothesis based on earlier work that first gener-...
Article
Interpretation of spatial features often requires combined reasoning over geometry and semantics. We introduce the Space package, an open source SWI-Prolog extension that provides spatial indexing capabilities. Together with the existing semantic web reasoning capabilities of SWI-Prolog, this allows efficient integration of spatial and sem...
Conference Paper
Full-text available
Web Science studies the interplay between web technology and the human behaviour it induces at the micro, meso and macro level. In this extended abstract we examine Web Science research issues by taking a closer look at the area of digital heritage. We discuss engineering, communication and socio-economic aspects.
Article
Full-text available
This paper describes the "food task" of the Ontology Alignment Evaluation Initiative (OAEI) 2006 and 2007. The OAEI is a comparative evaluation effort to measure the quality of automatic ontology-alignment systems. The food task focuses on the alignment of thesauri in the agricultural domain. It aims at providing a realistic task for ontology-ali...
Article
Full-text available
Traditionally the relations between concepts from a controlled vocabulary, such as the hierarchical and associative relations in a thesaurus, have been used to support users in their search process. In the context of the Semantic Web, multiple interlinked vocabularies are becoming available, providing a large number of different relations between c...
Article
In this paper we build on our methodology for combining and selecting alignment techniques for vocabularies, with two alignment case studies of large vocabularies in two languages. Firstly, we analyze the vocabularies and based on that analysis choose our alignment techniques. Secondly, we test our hypothesis based on earlier work that first genera...
Article
This paper presents research in the context of two multidisciplinary projects aimed at facilitating the history domain with an automatic approach for event extraction and modelling. To realise this, the Semantics of History project is providing a historical ontology and a lexicon to support the detection of historical events in textual data whilst...
Article
Full-text available
Bridging the gap between low-level features and semantics is a problem commonly acknowledged in the Multimedia community. Event modeling can fill the gap. In this paper we present the Simple Event Model (SEM) and its application in a Maritime Safety and Security use case about Situational Awareness. We show how we abstract over low-level features,...
Conference Paper
Full-text available
Identifying alignments between vocabularies has become a central knowledge engineering activity. A plethora of alignment techniques has been developed over the past years. In this paper we present a case study in which we examine and evaluate the practical use of three typical alignment techniques. The study involves the alignment of two vocabulari...
Conference Paper
Full-text available
Semantic desktop environments aim at improving the effectiveness and efficiency of users carrying out daily tasks within their personal information management (PIM) infrastructure. They support the user by transferring and exploiting the explicit semantics ...
Article
Full-text available
Audiovisual material is a vital component of the world's heritage but it remains difficult to access. With the Netherlands Institute for Sound and Vision as one of its partners, the MuNCH project aims to investigate new methods for improving access to a wide range of audiovisual documents. MuNCH brings together three research fields: multimedia ana...
Article
Full-text available
Web 2.0 — the perceived second generation of the World Wide Web that aims to improve collaboration, sharing of information and interoperability — enables increasing access to digital collections of museums. The expectation is that more and more people will spend time preparing their visit before actually visiting the museum and look for related inf...
Article
Full-text available
In the context of large and ever growing archives, generating annotation suggestions automatically from textual resources related to the documents to be archived is an interesting option in theory. It could save a lot of work in the time consuming and expensive task of manual annotation and it could help cataloguers attain a higher inter-annotator...
Article
Full-text available
The Documentalist Support System (DocSS) is developed to suit novel needs of documentalists working within the Dutch archive for Sound and Vision, broadcasters working outside of Sound and Vision and people interested in the Cultural Heritage value of the archive, who want to perform search in context. The documentalists (and to some extent the ot...
Article
Full-text available
Web 2.0, the perceived second generation of the World Wide Web that aims to improve collaboration, sharing of information and interoperability, enables increasing access to digital collections of museums. The expectation is that more and more, people will spend time preparing their visit before actually visiting the museum and look for related infor...
Article
Full-text available
A method that uses natural language processing techniques and background knowledge in the form of structured vocabularies to automatically identify concepts and their roles from text descriptions is presented. The annotation method was evaluated on the ARIA collection of the Rijksmuseum Amsterdam by comparing it to a human-created gold standard and com...
Article
Full-text available
This demo shows the integration of spatial and semantic reasoning for the recognition of ship behavior. We recognize abstract behavior such as "ferry trip" and derive that the ship showing this behavior is a "ferry". We accomplish this by abstracting over low-level ship trajectory data and applying Prolog rules that express properties of ship beha...
Article
Full-text available
Metadata vocabularies provide various semantic relations between concepts. For content-based recommender systems, these relations enable a wide range of concepts to be recommended. However, not all semantically related concepts are interesting for end users. In this paper, we identified a number of semantic relations, which are both within one v...
Article
The discipline of knowledge engineering grew out of the early work on expert systems in the seventies. With the growing popularity of knowledge-based systems, there arose also a need for a systematic approach for building such systems, similar to methodologies in mainstream software engineering. Over the years, the discipline of knowledge engineeri...
Conference Paper
Full-text available
In many archives of audiovisual documents, retrieval is done using metadata from a structured vocabulary or thesaurus. In practice, many of these thesauri have limited or no structure. The objective of this paper is to find out whether retrieval of audiovisual resources from a collection indexed with an in-house thesaurus can be improved by anchori...
Article
This article presents the CHIP demonstrator for providing personalized access to digital museum collections. It consists of three main components: Art Recommender, Tour Wizard, and Mobile Tour Guide. Based on the semantically enriched Rijksmuseum Amsterdam collection, we show how Semantic Web technologies can be deployed to (partially) solve...
Article
In this article we describe a Semantic Web application for semantic annotation and search in large virtual collections of cultural-heritage objects, indexed with multiple vocabularies. During the annotation phase we harvest, enrich and align collection metadata and vocabularies. The semantic-search facilities support keyword-based queries of the gr...
Conference Paper
Full-text available
In cultural heritage, large virtual collections are coming into existence. Such collections contain heterogeneous sets of metadata and vocabulary concepts, originating from multiple sources. In the context of the E-Culture demonstrator we have shown earlier that such virtual collections can be effectively explored with keyword search and semantic clu...
Conference Paper
Full-text available
With the advent of the Web and the efforts towards a Semantic Web the nature of knowledge engineering has changed drastically. The new generation of knowledge systems has left the closed world of isolated applications and feeds on the heterogeneous knowledge sources available online. We propose principles for a new style of knowledge engineering on...
Conference Paper
Full-text available
Evaluation of ontology alignments is in practice done in two ways: (1) assessing individual correspondences and (2) comparing the alignment to a reference alignment. However, this type of evaluation does not guarantee that an application which uses the alignment will perform well. In this paper, we contribute to the current ontology alignment eval-...
Conference Paper
Full-text available
With the advent of the Web and the efforts towards a Semantic Web the nature of knowledge engineering has changed drastically. In this position paper we propose four principles for knowledge engineering on a Web scale. We illustrate these principles with examples from our research in developing a Semantic Web application targeted at cross-collectio...
Conference Paper
Full-text available
Current state-of-the-art ontology-alignment evaluation methods are based on the assumption that alignment relations come in two flavors: correct and incorrect. Some alignment systems find more correct mappings than others and hence, by this assumption, they perform better. In practical applications however, it does not only matter how many correc...
Article
One of the tasks of a maritime safety and security (MSS) system is to map incoming observations in the form of sensor data onto existing maritime domain knowledge. This domain knowledge is modeled in an ontology. The sensor data contains information on ship trajectories, labeled with ship types from this ontology. These ship types are broad and wit...
Conference Paper
Full-text available
As the Semantic Web gains momentum, so grows the interest in making knowledge kept in various repositories available. In this paper we describe a methodological approach for porting cultural repositories to the Semantic Web, focusing on the global picture of the required mappings and alignments. The approach consists of thesaurus conversion,...
Article
This paper reports on a study to explore how semantic relations can be used to expand a query for objects in an image. The study is part of a project with the overall objective to provide semantic annotation and search facilities for a virtual collection of art resources. In this study we used semantic relations from WordNet for 15 image content qu...
Article
Full-text available
In this paper, we propose an automatic video retrieval method based on high-level concept detectors. Research in video analysis has reached the point where over 100 concept detectors can be learned in a generic fashion, albeit with mixed performance. Such a set of detectors is very small still compared to ontologies aiming to capture the full vocab...
Article
Full-text available
The six papers in this special section focus on semantic image and video indexing in broad domains. To bring semantics to the user in broad domains both the indexing and retrieval step have to be considered. The papers here address both steps and the relation to ontologies.
Conference Paper
Full-text available
As the Semantic Web gains momentum, so grows the interest in making knowledge kept in various repositories available. In this paper we describe a case study using a methodological approach for porting cultural repositories to the Semantic Web. The approach consists of thesaurus conversion, meta-data schema mapping, meta-data value mapping, and...
Conference Paper
Full-text available
Part-whole relations are important in many domains, but typically receive less attention than the subsumption relation. In this paper we describe a method for finding part-whole relations. The method consists of two steps: (i) finding phrase patterns for both explicit and implicit part-whole relations, and (ii) applying these patterns to find part-whol...
Conference Paper
Full-text available
In this article we report on a user study aimed at evaluating and improving a thesaurus browser. The browser is intended to be used by documentalists of a large public audio-visual archive for finding appropriate indexing terms for TV programs. The subjects involved in the study were documentalists of the institutions involved. The study provid...
Conference Paper
Full-text available
Thesauri can be useful resources for indexing and retrieval on the Semantic Web, but often they are not published in RDF/OWL. To convert thesauri to RDF for use in Semantic Web applications and to ensure the quality and utility of the conversion a structured method is required. Moreover, if different thesauri are to be interoperable without complic...
Article
Full-text available
Multimedia in all forms (images, video, graphics, music, speech) is exploding on the Web. The content needs to be annotated and indexed to enable effective search and retrieval. However, recent standards and best practices for multimedia metadata don't provide semantically rich descriptions of multimedia content. On the other hand, the World Wide W...
Conference Paper
Large amounts of knowledge are available in many knowledge bases for a variety of applications. This knowledge is however usually application specific, and thus not reusable. This paper discusses the problem of making knowledge shareable over applications and reusing it. Three principles are formulated that can form a basis for a methodology for de...
Article
Full-text available
The results of a study are presented, in which people queried a news archive using an interactive video retrieval system. 242 search sessions by 39 participants on 24 topics were assessed. Before, during and after the study, participants filled in questionnaires about their expectations of a search. The questionnaire data, logged user actions on th...
Conference Paper
Full-text available
The main objective of the MultimediaN E-Culture project is to demonstrate how novel semantic-web and presentation technologies can be deployed to provide better indexing and search support within large virtual collections of cultural-heritage resources. The architecture is fully based on open web standards, in particular XML, SVG, RDF/OWL and SPARQ...
Conference Paper
Full-text available
Triple20 is an ontology manipulation and visualization tool for languages built on top of the Semantic-Web RDF triple model. In this article we explain how a triple-centered design compares to the use of a separate proprietary internal data model. We show how to deal with the problems of such a low-level data model and show that it offers advantages...
