
Guus Schreiber
- PhD
- Head of Department at Vrije Universiteit Amsterdam
246 publications · 60,874 reads · 12,092 citations
Publications (246)
With the increase of cultural heritage data published online, the usefulness of data in this open context hinges on the quality and diversity of descriptions of collection objects. In many cases, existing descriptions are not sufficient for retrieval and research tasks, resulting in the need for more specific annotations. However, eliciting such an...
An increasing number of cultural heritage institutions publish data online. Ontologies can be used to structure published data, thereby increasing interoperability. To achieve widespread adoption of ontologies, institutions such as libraries, archives and museums have to be able to assess whether an ontology can adequately capture information about...
This paper describes BiographyNet, a digital humanities project (2012-2016) that brings together researchers from history, computational linguistics and computer science. The project uses data from the Biography Portal of the Netherlands (BPN), which contains approximately 125,000 biographies from a variety of Dutch biographical dictionaries from t...
With more and more cultural heritage data being published online, their usefulness in this open context depends on the quality and diversity of descriptive metadata for collection objects. In many cases, existing metadata is not adequate for a variety of retrieval and research tasks and more specific annotations are necessary. However, eliciting su...
In the digital era, personalisation systems are the typical way to deal with the massive amount of information on the Web. These systems decide in our place what we like, possibly hiding us away from a complete world of potentially interesting content. These systems do not challenge us to open our horizons of interest, trapping us more and more in ou...
In this paper we propose several approaches for automatic annotation of natural science spreadsheets using a combination of structural properties of the tables and external vocabularies. During the design process of their spreadsheets, domain scientists implicitly include their domain model in the content and structure of the spreadsheet tables. Ho...
Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. One such GWAP is Waisda?, a video labeling game where players tag streaming video and win points by reaching consensus on tags with other players. The open-ended and unconstrained manner of tagging in the fast-paced s...
Many museums are currently providing online access to their collections. The state of the art research in the last decade shows that it is beneficial for institutions to provide their datasets as Linked Data in order to achieve easy cross-referencing, interlinking and integration. In this paper, we present the Rijksmuseum linked dataset (accessible...
It is possible to automatically annotate a natural science spreadsheet using lexical matching, given that the tables in these spreadsheets meet a number of requirements regarding the content. Results of a survey show that most of the existing natural science spreadsheets deviate from the ideal situation. We propose to complement lexical matching wi...
Content Management Systems haven’t gained much from the Linked Data uptake, and sharing content between different websites and systems is hard. On the other side, using Linked Data in web documents is not as trivial as managing regular web content using a CMS. To address these issues, we present a method for creating human readable web documents ou...
Spreadsheet models are frequently used by scientists to analyze research data. These models are typically described in a paper or a report, which serves as the single source of information on the underlying research project. As the calculation workflow in these models is not made explicit, readers are not able to fully understand how the research resu...
Web data often manifest high levels of uncertainty. We focus on categorical Web data and we represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account pos...
In this study we consider whether, and to what extent, additional semantics in the form of Linked Data can help diversify search results. We undertake this study in the domain of cultural heritage. The data consists of collection data of the Rijksmuseum Amsterdam together with a number of relevant external vocabularies, which are all published as...
Linking historical datasets and making them available on the Web has increasingly become a subject of research in the field of digital humanities. In this paper, we focus on discovering links between ships from a dataset of Dutch maritime events and a historical archive of newspaper articles. We apply a heuristic-based method for finding and filter...
In this paper we present an experiment which has been performed to validate a pragmatic-based, expert-based and basic-level ontology. These ontologies were created for use in an application which generates questions for ordinary people with the purpose to determine a crisis situation. All three ontologies have specific characteristics related to th...
Open Government Data often contain information that, in more or less detail, regards private citizens. For this reason, before publishing them, public authorities manipulate data to remove any sensitive information while trying to preserve their reliability. This paper addresses the lack of tools aimed at measuring the reliability of these data. We...
Using an ontology to automatically generate questions for ordinary people requires a structure and concepts compliant with human thought. Here we present methods to develop a pragmatic, expert-based and a basic-level ontology and a framework to evaluate these ontologies. Comparing these ontologies shows that expert-based ontologies are most easy to...
Public authorities are increasingly sharing sets of open data. These data are often preprocessed (e.g. smoothed, aggregated) to avoid exposing sensitive data, while trying to preserve their reliability. We present two procedures for tackling the lack of methods for measuring open data reliability. The first procedure is based on a comparison...
In a recent approach, Baader and Distel proposed an algorithm to axiomatize all terminological knowledge that is valid in a given data set and is expressible in the description logic ELK. This approach is based on the mathematical theory of formal concept ...
In this paper, we present an evaluation framework for online access to cultural heritage. The framework enables the assessment of online cultural heritage applications in terms of their provision and support of information and interpretation. It is anchored in digital hermeneutics: the study and theory of the Web as a vehicle of (self)-interpretati...
Simple Knowledge Organization System (SKOS) provides a data model and vocabulary for expressing Knowledge Organization Systems (KOSs) such as thesauri and classification schemes in Semantic Web applications. This paper presents the main components of SKOS and their formal expression in Web Ontology Language (OWL), providing an extensive account of t...
Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. This trend is driven by the assumption that user tags will improve video search. In this paper we study whether this is indeed the case. To this end, we create an evaluation dataset that consists of: (i) a set of vide...
Knowledge-acquisition research started in the eighties as a small research community focusing on knowledge-intensive problems in relatively small domains. In this paper we look at the influence the Web has had on knowledge acquisition and vice versa. To this end we discuss in some depth four topics, namely the ontology language OWL, the vocabulary...
Diversity and profundity of the topics in cultural heritage collections make experts from outside the institution indispensable for acquiring qualitative and comprehensive annotations. We define the concept of nichesourcing and present challenges in the process of obtaining qualitative annotations from people in these niches. We believe that ex...
Environmental computer models are considered essential tools in supporting environmental decision making, but their main value is that they allow a better understanding of our complex environment. Despite numerous attempts to promote good modelling practice, transparency of current environmental computer models is limited, which hinders progress in...
In this document we describe the Amsterdam Museum Linked Open Data set. The dataset is a five-star Linked Data representation and comprises the entire collection of the Amsterdam Museum consisting of more than 70,000 object descriptions. Furthermore, the institution's thesaurus and person authority files used in the object metadata are included in...
The BiographyNet project aims at inspiring historians when setting up new research projects. The goal is to create a semantic knowledge base by extracting links between people, historic events, places and time periods from a variety of Dutch biographical dictionaries. A demonstrator will be developed providing visualization and browsing techniques...
In this position paper we identify nichesourcing, a specific form of human-based computation that harnesses the computational efforts from niche groups rather than the "faceless crowd". We claim that nichesourcing combines the strengths of the crowd with those of professionals, optimizing the result of human-based computation for certain tasks. We i...
This chapter explores methods for determining the reliability of Automated Identification System (AIS) messages. The primary use of AIS messages in the naval domain is to avoid collisions, therefore they contain kinematic information about ships. Moreover, AIS messages contain information like the ship name and its identifiers, so AIS messages ca...
Within the cultural heritage field, proprietary metadata and vocabularies are being transformed into public Linked Data. These efforts have mostly been at the level of large-scale aggregators such as Europeana where the original data is abstracted to a common format and schema. Although this approach ensures a level of consistency and interoperabil...
Bridging the gap between low-level features and semantics is a problem commonly acknowledged in the Multimedia community. Event modeling can fill this gap by representing knowledge about the data at different levels of abstraction. In this paper we present the Simple Event Model (SEM) and its application in a Maritime Safety and Security use case about...
In this article we discuss an application scenario for semantic annotation and search in a collection of art images. This application shows that background knowledge in the form of ontologies can be used to support indexing and search in image collections. The underlying ontologies are represented in RDF Schema and are based on existing data standa...
Events have become central elements in the representation of data from domains such as history, cultural heritage, multimedia and geography. The Simple Event Model (SEM) is created to model events in these various domains, without making assumptions about the domain-specific vocabularies used. SEM is designed with a minimum of semantic commitment t...
We present a Linked Data analysis method which relies on knowledge patterns for constructing a logical architecture of the knowledge in a dataset. This can then be exploited to compare heterogeneous datasets, enhance interoperability between them and make implicit knowledge emerge.
Gold standard mappings created by experts are at the core of alignment evaluation. At the same time, the process of manual evaluation is rarely discussed. While the practice of having multiple raters evaluate results is accepted, their level of agreement is often not measured. In this paper we describe three experiments in manual evaluation and stu...
There is an abundance of semi-structured reports on events being written and made available on the World Wide Web on a daily basis. These reports are primarily meant for human use. A recent movement is the addition of RDF metadata to make automatic processing by computers easier. A fine example of this movement is the Open Government Data initiativ...
Within cultural heritage collections, objects are often grounded in a particular historical setting. This setting can currently not be made explicit, as structured descriptions of events are either missing or not marked up explicitly. This paper reports a study on automatic extraction of an historical event thesaurus from unstructured texts. We sho...
In recent years, crowdsourcing has gained attention as an alternative method for collecting video annotations. An example is the internet video labeling game Waisda? launched by the Netherlands Institute for Sound and Vision. The goal of this PhD research is to investigate the value of the user tags collected with this video labeling game. To this...
Cultural heritage institutions are currently rethinking access to their collections to allow the public to interpret and contribute to their collections. In this work, we present the Agora project, an interdisciplinary project in which Web technology and theory of interpretation meet. This we call digital hermeneutics. The Agora project facilitate...
Web data often manifest high levels of uncertainty. We focus on categorical Web data and we represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account possi...
Most digitised and online available objects from GLAMs (Galleries, Libraries, Archives, Museums) can be browsed through a predefined set of formal metadata, such as its creator, year of creation, and type of material. Standards for metadata management and exchange have matured and are being adopted widely. They enable intra-collection search and ex...
Recent research has shown the Linked Data cloud to be a potentially ideal basis for improving user experience when interacting with Web content across different applications and domains. Using the explicit knowledge of datasets, however, is neither sufficient nor straight-forward. Dataset knowledge is often not uniformly organized, thus it is gener...
In this paper, we define reusable inference steps for content-based recommender systems based on semantically-enriched collections. We show an instantiation in the case of recommending artworks and concepts based on a museum domain ontology and a user profile consisting of rated artworks and rated concepts. The recommendation task is split into fo...
It is common practice in audiovisual archives to disclose documents using metadata from a structured vocabulary or thesaurus. Many of these thesauri have limited or no structure. The objective of this paper is to find out whether retrieval of audiovisual resources from a collection indexed with an in-house thesaurus can be improved by enriching the...
In this paper we build on our methodology for combining and selecting alignment techniques for vocabularies, with two alignment case studies of large vocabularies in two languages. Firstly, we analyze the vocabularies and based on that analysis choose our alignment techniques. Secondly, we test our hypothesis based on earlier work that first gener...
Interpretation of spatial features often requires combined reasoning over geometry and semantics. We introduce the Space package, an open source SWI-Prolog extension that provides spatial indexing capabilities. Together with the existing semantic web reasoning capabilities of SWI-Prolog, this allows efficient integration of spatial and sem...
Web Science studies the interplay between web technology and the human behaviour it induces at the micro, meso and macro level. In this extended abstract we examine Web Science research issues by taking a closer look at the area of digital heritage. We discuss engineering, communication and socio-economic aspects.
This paper describes the "food task" of the Ontology Alignment Evaluation Initiative (OAEI) 2006 and 2007. The OAEI is a comparative evaluation effort to measure the quality of automatic ontology-alignment systems. The food task focuses on the alignment of thesauri in the agricultural domain. It aims at providing a realistic task for ontology-ali...
Traditionally the relations between concepts from a controlled vocabulary, such as the hierarchical and associative relations in a thesaurus, have been used to support users in their search process. In the context of the Semantic Web, multiple interlinked vocabularies are becoming available, providing a large number of different relations between c...
In this paper we build on our methodology for combining and selecting alignment techniques for vocabularies, with two alignment case studies of large vocabularies in two languages. Firstly, we analyze the vocabularies and based on that analysis choose our alignment techniques. Secondly, we test our hypothesis based on earlier work that first genera...
This paper presents research in the context of two multidisciplinary projects aimed at facilitating the history domain with an automatic approach for event extraction and modelling. To realise this, the Semantics of History project is providing a historical ontology and a lexicon to support the detection of historical events in textual data whilst...
Bridging the gap between low-level features and semantics is a problem commonly acknowledged in the Multimedia community. Event modeling can fill the gap. In this paper we present the Simple Event Model (SEM) and its application in a Maritime Safety and Security use case about Situational Awareness. We show how we abstract over low-level features,...
Identifying alignments between vocabularies has become a central knowledge engineering activity. A plethora of alignment techniques has been developed over the past years. In this paper we present a case study in which we examine and evaluate the practical use of three typical alignment techniques. The study involves the alignment of two vocabulari...
Semantic desktop environments aim at improving the effectiveness and efficiency of users carrying out daily tasks within their personal information management (PIM) infrastructure. They support the user by transferring and exploiting the explicit semantics ...
Audiovisual material is a vital component of the world's heritage but it remains difficult to access. With the Netherlands Institute for Sound and Vision as one of its partners, the MuNCH project aims to investigate new methods for improving access to a wide range of audiovisual documents. MuNCH brings together three research fields: multimedia ana...
Web 2.0 — the perceived second generation of the World Wide Web that aims to improve collaboration, sharing of information and interoperability — enables increasing access to digital collections of museums. The expectation is that more and more people will spend time preparing their visit before actually visiting the museum and look for related inf...
In the context of large and ever growing archives, generating annotation suggestions automatically from textual resources related to the documents to be archived is an interesting option in theory. It could save a lot of work in the time consuming and expensive task of manual annotation and it could help cataloguers attain a higher inter-annotator...
The Documentalist Support System (DocSS) is developed to suit novel needs of documentalists working within the Dutch archive for Sound and Vision, broadcasters working outside of Sound and Vision and people interested in the Cultural Heritage value of the archive, who want to perform search in context. The documentalists (and to some extent the ot...
Web 2.0, the perceived second generation of the World Wide Web that aims to improve collaboration, sharing of information and interoperability, enables increasing access to digital collections of museums. The expectation is that more and more people will spend time preparing their visit before actually visiting the museum and look for related infor...
A method that uses natural language processing techniques and background knowledge in the form of structured vocabularies to automatically identify concepts and their roles from text description is presented. Annotation method using the ARIA collection from Rijksmuseum Amsterdam was evaluated by comparing it to a human-created gold standard and com...
This demo shows the integration of spatial and semantic reasoning for the recognition of ship behavior. We recognize abstract behavior such as "ferry trip" and derive that the ship showing this behavior is a "ferry". We accomplish this by abstracting over low-level ship trajectory data and applying Prolog rules that express properties of ship beha...
Metadata vocabularies provide various semantic relations between concepts. For content-based recommender systems, these relations enable a wide range of concepts to be recommended. However, not all semantically related concepts are interesting for end users. In this paper, we identified a number of semantic relations, which are both within one v...
The discipline of knowledge engineering grew out of the early work on expert systems in the seventies. With the growing popularity of knowledge-based systems, there arose also a need for a systematic approach for building such systems, similar to methodologies in mainstream software engineering. Over the years, the discipline of knowledge engineeri...
In many archives of audiovisual documents, retrieval is done using metadata from a structured vocabulary or thesaurus. In practice, many of these thesauri have limited or no structure. The objective of this paper is to find out whether retrieval of audiovisual resources from a collection indexed with an in-house thesaurus can be improved by anchori...
This article presents the CHIP demonstrator for providing personalized access to digital museum collections. It consists of three main components: Art Recommender, Tour Wizard, and Mobile Tour Guide. Based on the semantically enriched Rijksmuseum Amsterdam collection, we show how Semantic Web technologies can be deployed to (partially) solve...
In this article we describe a Semantic Web application for semantic annotation and search in large virtual collections of cultural-heritage objects, indexed with multiple vocabularies. During the annotation phase we harvest, enrich and align collection metadata and vocabularies. The semantic-search facilities support keyword-based queries of the gr...
In cultural heritage, large virtual collections are coming into existence. Such collections contain heterogeneous sets of metadata and vocabulary concepts, originating from multiple sources. In the context of the E-Culture demonstrator we have shown earlier that such virtual collections can be effectively explored with keyword search and semantic clu...
With the advent of the Web and the efforts towards a Semantic Web the nature of knowledge engineering has changed drastically. The new generation of knowledge systems has left the closed world of isolated applications and feeds on the heterogeneous knowledge sources available online. We propose principles for a new style of knowledge engineering on...
Evaluation of ontology alignments is in practice done in two ways: (1) assessing individual correspondences and (2) comparing the alignment to a reference alignment. However, this type of evaluation does not guarantee that an application which uses the alignment will perform well. In this paper, we contribute to the current ontology alignment eval-...
With the advent of the Web and the efforts towards a Semantic Web the nature of knowledge engineering has changed drastically. In this position paper we propose four principles for knowledge engineering on a Web scale. We illustrate these principles with examples from our research in developing a Semantic Web application targeted at cross-collectio...
Current state-of-the-art ontology-alignment evaluation methods are based on the assumption that alignment relations come in two flavors: correct and incorrect. Some alignment systems find more correct mappings than others and hence, by this assumption, they perform better. In practical applications however, it does not only matter how many correc...
One of the tasks of a maritime safety and security (MSS) system is to map incoming observations in the form of sensor data onto existing maritime domain knowledge. This domain knowledge is modeled in an ontology. The sensor data contains information on ship trajectories, labeled with ship types from this ontology. These ship types are broad and wit...
As the Semantic Web gains momentum, so grows the interest in making knowledge kept in various repositories available. In this paper we describe a methodological approach for porting cultural repositories to the Semantic Web, focusing on the global picture of the required mappings and alignments. The approach consists of thesaurus conversion,...
This paper reports on a study to explore how semantic relations can be used to expand a query for objects in an image. The study is part of a project with the overall objective to provide semantic annotation and search facilities for a virtual collection of art resources. In this study we used semantic relations from WordNet for 15 image content qu...
In this paper, we propose an automatic video retrieval method based on high-level concept detectors. Research in video analysis has reached the point where over 100 concept detectors can be learned in a generic fashion, albeit with mixed performance. Such a set of detectors is very small still compared to ontologies aiming to capture the full vocab...
The six papers in this special section focus on semantic image and video indexing in broad domains. To bring semantics to the user in broad domains both the indexing and retrieval step have to be considered. The papers here address both steps and the relation to ontologies.
As the Semantic Web gains momentum, so grows the interest in making knowledge kept in various repositories available. In this paper we describe a case study using a methodological approach for porting cultural repositories to the Semantic Web. The approach consists of thesaurus conversion, meta-data schema mapping, meta-data value mapping, and...
Part-whole relations are important in many domains, but typically receive less attention than subsumption relation. In this paper we describe a method for finding part-whole relations. The method consists of two steps: (i) finding phrase patterns for both explicit and implicit part-whole relations, and (ii) applying these patterns to find part-whol...
In this article we report on a user study aimed at evaluating and improving a thesaurus browser. The browser is intended to be used by documentalists of a large public audio-visual archive for finding appropriate indexing terms for TV programs. The subjects involved in the study were documentalists of the institutions involved. The study provid...
Thesauri can be useful resources for indexing and retrieval on the Semantic Web, but often they are not published in RDF/OWL. To convert thesauri to RDF for use in Semantic Web applications and to ensure the quality and utility of the conversion a structured method is required. Moreover, if different thesauri are to be interoperable without complic...
Multimedia in all forms (images, video, graphics, music, speech) is exploding on the Web. The content needs to be annotated and indexed to enable effective search and retrieval. However, recent standards and best practices for multimedia metadata don't provide semantically rich descriptions of multimedia content. On the other hand, the World Wide W...
Large amounts of knowledge are available in many knowledge bases for a variety of applications. This knowledge is however usually application specific, and thus not reusable. This paper discusses the problem of making knowledge shareable over applications and reusing it. Three principles are formulated that can form a basis for a methodology for de...
The results of a study are presented, in which people queried a news archive using an interactive video retrieval system. 242 search sessions by 39 participants on 24 topics were assessed. Before, during and after the study, participants filled in questionnaires about their expectations of a search. The questionnaire data, logged user actions on th...
The main objective of the MultimediaN E-Culture project is to demonstrate how novel semantic-web and presentation technologies can be deployed to provide better indexing and search support within large virtual collections of cultural-heritage resources. The architecture is fully based on open web standards, in particular XML, SVG, RDF/OWL and SPARQ...
Triple20 is an ontology manipulation and visualization tool for languages built on top of the Semantic-Web RDF triple model. In this article we explain how a triple-centered design compares to the use of a separate proprietary internal data model. We show how to deal with the problems of such a low-level data model and show that it offers advantages...