
Willem Robert van Hage
- PhD
- Technology Lead Data Management & Analytics at Netherlands eScience Center
74
Publications
20,989
Reads
1,709
Citations
Introduction
Senior eScience Research Engineer at the Netherlands eScience Center and guest researcher at the VU University Amsterdam in the field of interdisciplinary eScience and Web Science.
Research topics: augmented sense making, visual analytics, information integration, and semantics.
Principal investigator in the US ONRG funded SAGAN and COMBINE projects and work package leader in the EU FP7 project NewsReader and the Dutch BSIK COMMIT Metis and Data2Semantics projects.
Education
October 2004 - October 2008
September 1997 - April 2004
Publications (74)
Poster for presenting the project "VIsual Storytelling of Big Imaging Data".
3D digital city models, important for urban planning, are currently constructed from massive point clouds obtained through airborne LiDAR (Light Detection and Ranging). They are semantically enriched with information obtained from auxiliary GIS data like Cadastral data which contains information about the boundaries of properties, road networks, ri...
3D digital city models form the basis for flow simulations (e.g. wind flow and water runoff), urban planning, under- and over-ground formation analysis, and they are very important for automated anomaly detection on man made structures. They consist of large collections of semantically rich objects which have many properties such as material and col...
Trust is a broad concept that in many systems is often reduced to user reputation alone. However, user reputation is just one way to determine trust. The estimation of trust can be tackled from other perspectives as well, including by looking at provenance.
Here, we present a complete pipeline for estimating the trustworthiness of artifacts given t...
Fig. 1. The visual matrix is a central element in our visual analytics multiple coordinated view approach for the exploration and analysis of massive mobile phone data. On the left, geographical visualization of Senegal divided into 123 arrondissements that contain a total of 1666 cell towers, shown as white dots (correlating with population densit...
Open Government Data often contain information that, in more or less detail, regards private citizens. For this reason, before publishing them, public authorities manipulate data to remove any sensitive information while trying to preserve their reliability. This paper addresses the lack of tools aimed at measuring the reliability of these data. We...
The usefulness of Software Architecture (SA) documentation depends on how well its Architectural Knowledge (AK) can be retrieved by the stakeholders in a software project. Recent findings show that the use of ontology-based SA documentation is promising. However, different roles in software development have different needs for AK, and building an o...
In this article we investigate the properties of the frequency distribution of numbers on the Web. We work with a part of the Common Crawl dataset comprising 3.8 billion Web documents and a recent dump of the English language Wikipedia. We show that, like words, numbers on the Web follow a Power law distribution, and obey Benford's law of first-dig...
Using an ontology to automatically generate questions for ordinary people requires a structure and concepts compliant with human thought. Here we present methods to develop a pragmatic, expert-based and a basic-level ontology and a framework to evaluate these ontologies. Comparing these ontologies shows that expert-based ontologies are easiest to...
Visual analytics of linked data can be done by secondary school students with minimal preparation. We study the learning curve of students while answering typical Web analytics questions on Wikipedia and DBpedia using SynerScope visual analytics software. We find that after a short tutorial students are able to answer most complex questions in a fe...
In many systems, the determination of trust is reduced to reputation estimation. However, reputation is just one way of determining trust. The estimation of trust can be tackled from a variety of other perspectives. In this chapter, we model trust relying on user reputation, user demographics and from provenance. We then explore the effects of comb...
Interdisciplinary research between computational linguistics and the Semantic Web is increasing. The NLP community makes more and more use of information presented as Linked Data. At the same time, an increasing interest in representing information from text as Linked Data can be observed in the Semantic Web community. It is however not necessarily...
Web data often manifest high levels of uncertainty. We focus on categorical Web data and we represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account pos...
Public authorities are increasingly sharing sets of open data. These data are often preprocessed (e.g. smoothed, aggregated) to avoid exposing sensitive data, while trying to preserve their reliability. We present two procedures for tackling the lack of methods for measuring open data reliability. The first procedure is based on a comparison...
In a recent approach, Baader and Distel proposed an algorithm to axiomatize all terminological knowledge that is valid in a given data set and is expressible in the description logic ELK. This approach is based on the mathematical theory of formal concept ...
In this paper we explore the use of semantics to improve diversity in recommendations. We use semantic patterns extracted from Linked Data sources to surface new connections between items to provide diverse recommendations to the end users. We evaluate this methodology by adopting a bottom-up approach, i.e. we ask users of a crowdsourcing platform...
Environmental computer models are considered essential tools in supporting environmental decision making, but their main value is that they allow a better understanding of our complex environment. Despite numerous attempts to promote good modelling practice, transparency of current environmental computer models is limited, which hinders progress in...
This paper introduces GAF, a grounded annotation framework to represent events in a formal context that can represent information from both textual and extra-textual sources. GAF makes a clear distinction between mentions of events in text and their formal representation as instances in a semantic layer. Instances are represented by RDF compliant U...
Semantic web applications are integrating data from more and more different types of sources about events. However, most data annotation frameworks do not translate well to semantic web. We describe the grounded annotation framework (GAF), a two-layered framework that aims to build a bridge between mentions of events in a data source such as a text...
Trust is a broad concept which, in many systems, is reduced to reputation estimation. However, reputation is just one way of determining trust. The estimation of trust can be tackled from other perspectives as well, including by looking at provenance. In this work, we look at the combination of reputation and provenance to determine trust values....
This chapter introduces the Simple Event Model and shows how it can be used for modeling events and their related concepts like actors, places, times, and their types. The event modeling discussed in this chapter is motivated from the need to abstract over historical situations to analyze what happened in the past. We show how the Simple Event Mode...
This chapter explores methods for determining the reliability of Automated Identification System (AIS) messages. The primary use of AIS messages in the naval domain is to avoid collisions, therefore they contain kinematic information about ships. Moreover, AIS messages contain information like the ship name and its identifiers, so AIS messages ca...
There is an abundance of semi-structured reports on events being written and made available on the World Wide Web on a daily basis. These reports are primarily meant for human use. A recent movement is the addition of RDF metadata to make automatic processing by computers easier. A fine example of this movement is the Open Government Data initia-...
In this paper we present the MultiFarm dataset, which has been designed as a benchmark for multilingual ontology matching. The MultiFarm dataset is composed of a set of ontologies translated in different languages and the corresponding alignments between these ontologies. It is based on the OntoFarm dataset, which has been used successfully for sev...
Bridging the gap between low-level features and semantics is a problem commonly acknowledged in the Multimedia community. Event modeling can fill this gap by representing knowledge about the data at different levels of abstraction. In this paper we present the Simple Event Model (SEM) and its application in a Maritime Safety and Security use case about...
Ontology matching consists of finding correspondences between semantically related entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. These test cases can use ontologies of different nature (from simple directories to expressive OWL ontologies) and use different modalities, e.g., b...
Events have become central elements in the representation of data from domains such as history, cultural heritage, multimedia and geography. The Simple Event Model (SEM) is created to model events in these various domains, without making assumptions about the domain-specific vocabularies used. SEM is designed with a minimum of semantic commitment t...
There is an abundance of semi-structured reports on events being written and made available on the World Wide Web on a daily basis. These reports are primarily meant for human use. A recent movement is the addition of RDF metadata to make automatic processing by computers easier. A fine example of this movement is the Open Government Data initiativ...
Web data often manifest high levels of uncertainty. We focus on categorical Web data and we represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account possi...
This paper presents a similarity measure that combines low-level trajectory information with geographical domain knowledge to compare vessel trajectories. The similarity measure is largely based on alignment techniques. In a clustering experiment we show how the measure can be used to discover behavior concepts in vessel trajectory data that are de...
Understanding real world events often calls for the integration of data from multiple often conflicting sources. Trusting the description of an event requires not only determining trust in the data sources but also in the integration process itself. In this work, we propose a trust algorithm for event data based on Subjective Logic that takes into...
In this paper we explore the use of location aware mobile devices for searching and browsing a large number of general and cultural heritage information repositories. Based on GPS positioning we can determine a user's location and context, composed of physical nearby locations, historic events that have taken place there, artworks that were cre...
We present an integrated and multidisciplinary approach for analyzing the behavior of moving objects. The results originate from ongoing research by four different partners in the Dutch Poseidon project (Embedded Systems Institute, 2007).
Embedded Systems Institute, 2007. The Poseidon project. [online] http://www.esi.nl/poseidon (http://www.e...
This paper describes a real-time routing system that implements a mobile museum tour guide for providing personalized tours tailored to the user position inside the museum and interests. The core of this tour guide originates from the CHIP (Cultural Heritage Information Personalization) Web-based tools set for personalized access to the Rijksmuseum...
Interpretation of spatial features often requires combined reasoning over geometry and semantics. We introduce the Space package, an open source SWI-Prolog extension that provides spatial indexing capabilities. Together with the existing semantic web reasoning capabilities of SWI-Prolog, this allows efficient integration of spatial and sem...
The Virtual Laboratory for e-Science (VL-e) project serves as a backdrop for the ideas described in this chapter. VL-e is a project with academic and industrial partners where e-science has been applied to several domains of scientific research. Adaptive Information Disclosure (AID), a subprogram within VL-e, is a multi-disciplinary group that conc...
This paper describes the "food task" of the Ontology Alignment Evaluation Initiative (OAEI) 2006 and 2007. The OAEI is a comparative evaluation effort to measure the quality of automatic ontology-alignment systems. The food task focuses on the alignment of thesauri in the agricultural domain. It aims at providing a realistic task for ontology-ali...
The objective of this paper is to propose a method for estimating the trust level of annotations of professional media. We develop a model based on subjective logic and semantic web technology, and subsequently test this on a sample set of annotations from a natural-history museum.
In this paper we outline challenges for user modeling and personalization with spatial information. To illustrate those challenges we use a use case with a real-time routing system that implements a mobile museum guide for providing personalized tours tailored to the user position inside the museum and her interests. In this scenario we combine on...
Background: Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement of automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from...
This demo shows the integration of spatial and semantic reasoning for the recognition of ship behavior. We recognize abstract behavior such as "ferry trip" and derive that the ship showing this behavior is a "ferry". We accomplish this by abstracting over low-level ship trajectory data and applying Prolog rules that express properties of ship beha...
Bridging the gap between low-level features and semantics is a problem commonly acknowledged in the Multimedia community. Event modeling can fill the gap. In this paper we present the Simple Event Model (SEM) and its application in a Maritime Safety and Security use case about Situational Awareness. We show how we abstract over low-level features,...
Knowledge organization systems (KOS), like thesauri and other controlled vocabularies, are used to provide subject access to information systems across the web. Due to the heterogeneity of these systems, mapping between vocabularies becomes crucial for retrieving relevant information. However, mapping thesauri is a laborious task, and thus big effo...
Current state-of-the-art ontology-alignment evaluation methods are based on the assumption that alignment relations come in two flavors: correct and incorrect. Some alignment systems find more correct mappings than others and hence, by this assumption, they perform better. In practical applications however, it does not only matter how many correc...
Explanation of the various possible applications of the AIDA toolbox. This toolbox is aimed at groups of knowledge workers who want to jointly search, annotate, interpret, and enrich large collections of documents spread across distributed locations. With the AIDA tools, various tasks can be carried out, such as learning new...
Ontology matching consists of finding correspondences between ontology entities. OAEI campaigns aim at comparing ontology matching systems on precisely defined test sets. Test sets can use ontologies of different nature (from expressive OWL ontologies to simple directories) and use different modalities (e.g., blind evaluation, open evaluation, cons...
Ontology matching exists to solve practical problems. Hence, methodologies to find and evaluate solutions for ontology matching should be centered on practical problems. In this paper we propose two statistically-founded evaluation techniques to assess ontology-matching performance that are based on the application of the alignment. Both are ba...
Methodologies to find and evaluate solutions for ontology matching should be centered on the practical problems to be solved. In this paper we look at matching from the perspective of a practitioner in search of matching techniques or tools. We survey actual matching use cases, and derive general categories from these. We then discuss the value of...
The system we propose for learning semantic relations consists of two parallel components. For our final submission we used components based on the similarity measures defined over WordNet and the patterns extracted from the Web and WMTS. Other components using syntactic structures were explored but not used for the final run.
We present the Ontology Alignment Evaluation Initiative 2006 campaign as well as its results. The OAEI campaign aims at comparing ontology matching systems on precisely defined test sets. OAEI-2006 built over previous campaigns by having 6 tracks followed by 10 participants. It shows clear improvements over previous results. The final and official...
Part-whole relations are important in many domains, but typically receive less attention than subsumption relation. In this paper we describe a method for finding part-whole relations. The method consists of two steps: (i) finding phrase patterns for both explicit and implicit part-whole relations, and (ii) applying these patterns to find part-whol...
We discuss four linguistic ontology-mapping techniques and evaluate them on real-life ontologies in the domain of food. Furthermore we propose a method to combine ontology-mapping techniques with high Precision and Recall to reduce the necessary amount of manual labor and computation.
Information retrieval can contribute towards the construction of ontologies and the effective usage of ontologies. We use collocation-based keyword extraction to suggest new concepts, and study the generation of hyperlinks to automate the population of ontologies with instances. We evaluate our methods within the setting of digital library projec...
We address the issue of providing topic driven access to full text documents. The methodology we propose is a combination of topic segmentation and information retrieval techniques. By segmenting the text into topic driven segments, we obtain small and coherent documents that can be used in two ways: as a basis for automatically generating hypertex...
We describe the official runs of our team for the CLEF 2004 ad hoc tasks. We took part in the monolingual task (for Finnish, French, Portuguese, and Russian), in the bilingual task (for Amharic to English, and English to Portuguese), and, finally, in the multilingual task. In the CLEF 2004 evaluation exercise we participated in all three ad hoc ret...
We describe the official runs of our team for the CLEF 2004 ad hoc tasks. We took part in the monolingual task (for Finnish, French, Portuguese, and Russian), in the bilingual task (for Amharic to English, and English to Portuguese), and, finally, in the multilingual task.
In this paper we describe a method for finding part-whole relations. The method consists of two steps: (i) finding phrase patterns for both explicit and implicit part-whole relations, and (ii) applying these patterns to find part-whole relation instances. We show results of applying this method to a domain of finding sources of carcinogens.
Spreadsheets are frequently used by scientists to store and analyze research data. To enable integration and reusability of scientific spreadsheet data it is important to explicate the underlying concepts and relations. In this paper we explore to which extent the conceptual model of a research project can be recognized in its spreadsheet implement...