Dr. Menzo Windhouwer
KNAW Humanities Cluster · Digital Infrastructure

About

Publications: 55
Reads: 4,970
Citations: 871
Additional affiliations
  • Meertens Institute, January 2015 - present: Senior Scientific Software Engineer
  • Max Planck Institute for Psycholinguistics, January 2007 - December 2014: Head Knowledge Software

Publications

Conference Paper
To achieve true interoperability for valuable linguistic resources, different levels of variation need to be addressed. ISO Technical Committee 37, Terminology and other language and content resources, is developing a Data Category Registry. This registry will provide a reusable set of data categories. A new implementation, dubbed ISOcat, of the reg...
Article
Since 2014, the Nederlands Dagboekarchief (Dutch Diary Archive) has been part of the Meertens Institute. In the DANS-financed project ‘Data from Diaries’ (2016), the paper-based metadata of the collection of ego documents has been made digitally accessible to researchers in a sustainable way. To this end, the Meertens Institute created a relational data...
Conference Paper
The CLARIN research infrastructure aims to place language resources and services within easy reach of humanities researchers. One of the measures to make access easy is to allow these researchers to access them using their home institution's credentials. However, the technology used for this makes it hard for services to make delegated calls, i.e...
Conference Paper
In recent years, large-scale initiatives like CLARIN set out to overcome the notorious heterogeneity of metadata formats in the domain of language resources. The CLARIN Component Metadata Infrastructure established means for flexible resource descriptions for the domain of language resources. The Data Category Registry ISOcat and the accompanying Rel...
Chapter
This chapter starts with a section on principles underlying data category (DC) specifications, followed by an introduction to ISOcat and its use. It concludes with how the combination of lexical markup framework (LMF) and the ISOcat data category registry (DCR) can be applied. LMF directs its users to the ISOcat DCR to elaborate their application-s...
Chapter
The Rendering Endangered Lexicons Interoperable through Standards Harmonization (RELISH) Interchange Format was created to establish interoperability between two digital lexicon formats: LL-LIFT and the lexical markup framework (LMF). One-to-one mapping of elements is implemented wherever possible; however, certain elements and concepts are...
Chapter
The ISOcat Data Category Registry provides a community computing environment for creating, storing, retrieving, harmonizing and standardizing data category specifications (DCs), used to register linguistic terms used in various fields. This chapter recounts the history of DC documentation in TC 37, beginning from paper-based lists created for lexic...
Conference Paper
As the creation of signed language resources is gaining speed world-wide, the need for standards in this field becomes more acute. This paper discusses the state of the field of signed language resources, their metadata descriptions, and annotations that are typically made. It then describes the role that ISOcat may play in this process and how it...
Article
ISO Technical Committee 37, Terminology and other language and content resources, established an ISO 12620:2009-based Data Category Registry (DCR), called ISOcat (see http://www.isocat.org), to foster semantic interoperability of linguistic resources. However, this goal can only be met if the data categories are reused by a wide variety of lingui...
Conference Paper
The RELISH project promotes language-oriented research by addressing a two-pronged problem: (1) the lack of harmonization between digital standards for lexical information in Europe and America, and (2) the lack of interoperability among existing lexicons of endangered languages, in particular those created with the Shoebox/Toolbox lexicon building...
Article
This paper gives an overview of the knowledge components needed for extensive documentation of small languages. The Language Archive is striving to offer all these tools to the linguistic community. The major tools in relation to the knowledge components are described, followed by a discussion of what is currently lacking and possible strateg...
Article
We describe our computer-supported framework to overcome the rule of metadata schism. It combines the use of controlled vocabularies, managed by a data category registry, with a component-based approach, where the categories can be combined to yield complex metadata structures. A metadata scheme devised in this way will thus be grounded in its use...
Article
The Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, is creating a state-of-the-art web environment for the ISO TC 37 (terminology and other language and content resources) metadata registry. This Data Category Registry (DCR) is called ISOcat and encompasses data categories for a broad range of language resources. Under the...
Conference Paper
In this paper, we describe a unifying approach to tackle data heterogeneity issues for lexica and related resources. We present LEXUS, our software that implements the Lexical Markup Framework (LMF) to uniformly describe and manage lexica of different structures. LEXUS also makes use of a central Data Category Registry (DCR) to address terminologic...
Article
LREC Workshop on the Sustainability of Language Resources and Tools for Natural Language Processing
Chapter
Due to global trends, like the rise of the Internet, the cheapness of storage space and the ease of digital media acquisition, vast collections of digital media are becoming ubiquitous. Futuristic usage scenarios, like ambient technologies, strive to open up these collections for the consumer market. However, this requires high-level semantic know...
Conference Paper
The Internet forms today’s largest source of information, with public services like libraries and museums digitizing their collections and making (parts of) them available to the public. Likewise, the public digitizes private information, e.g., holiday pictures and movies, and shares it on the World Wide Web (WWW). These kinds of document collections h...
Article
In this chapter the development of a specialised search engine for a digital library is described. The proposed system architecture consists of three levels: the conceptual, the logical and the physical level. By exposing a domain-specific schema, the conceptual level schema enables semantically rich conceptual search. The logical level provi...
Chapter
In this chapter the development of a specialised search engine for a digital library is described. The proposed system architecture consists of three levels: the conceptual, the logical and the physical level. By exposing a domain-specific schema, the conceptual level schema enables semantically rich conceptual search. The logical level provi...
Article
In this report the development of a specialised search engine for a digital library is described. The proposed system architecture consists of three levels: the conceptual, the logical and the physical level. By exposing a domain-specific schema, the conceptual level schema enables semantically rich conceptual search. The logical level provid...
Article
In this article we argue that the automatic generation of dynamic multimedia presentations requires both low-level collections of objective measurements for media units representing prototypical style elements, and high-level conceptual descriptions supporting contextual and presentational requirements. Only the combination of both facilitates the r...
Article
In this article we argue that the automatic generation of dynamic multimedia presentations requires both low-level collections of objective measurements for media units representing prototypical style elements, and high-level conceptual descriptions supporting contextual and presentational requirements. Only the combination of both facilitates the r...
Conference Paper
Most of the multimedia objects distributed over the Internet are still too poorly meta-indexed to be of any use in retrieval tasks. In general, these dynamic multimedia objects are annotated manually. The high costs involved in manually indexing multimedia objects, which grow ever more in volume, type and complexity, call for automatic multimedia ca...
Conference Paper
This paper presents a digital library search engine that combines the efforts of the AMIS and DMW research projects, each covering significant parts of the problem of finding the required information in an enormous mass of data. The most important contributions of our work are the following: (1) We demonstrate a flexible solution for the extraction and querying o...
Chapter
In this chapter the development of a specialised search engine for a digital library is described. The proposed system architecture consists of three levels: the conceptual, the logical and the physical level. By exposing a domain-specific schema, the conceptual level schema enables semantically rich conceptual search. The logical level provi...
Article
The everlasting search for new methods to explore the Internet and intranets is still going on. In this demo we present the combined effort of the AMIS and DMW research projects, each covering significant parts of this problem. The contribution of this demo is twofold. Firstly, we demonstrate how feature grammars offer a flexible solution fo...
Article
In this article we argue that the automatic generation of adaptive multimedia presentations requires extensible, adaptable style descriptions which provide both high-level conceptual and low-level feature extraction information. Only the combination of both facilitates the retrieval of adequate material and its user-centred presentatio...
Conference Paper
In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, flexible and semantically homogeneous units we are able to exploit the performance potential...
Conference Paper
Due to the ubiquity and popularity of XML, users often are in the following situation: they want to query XML documents which contain potentially interesting information but they are unaware of the mark-up structure that is used. For example, it is easy to guess the contents of an XML bibliography file whereas the mark-up depends on the methodologi...
Article
With the increasing popularity of the WWW, the main challenge in computer science has become content-based retrieval of multimedia objects. Access to multimedia objects in databases has long been limited to the information provided by manually assigned keywords. Now, with the integration of feature-detection algorithms in database systems softwar...
Article
In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, flexible and semantically homogeneous units we are able to exploit the performance potential...
Article
We address the problem of deriving meaningful semantic index information for a multimedia database using a semi-structured document model. We show how our framework, called feature grammars, can be used to (1) exploit third-party interpretation modules for real-world unstructured components, and (2) use context-free grammars to convert such poorly...
Article
The Acoi project provides a large-scale experimentation platform to facilitate studies in the area of indexing multimedia objects and their subsequent retrieval. The index model is based on assembling the results of feature detection algorithms into hierarchical structures to classify the objects. This paper provides an overview of the Acoi archite...
Article
We propose a grammatical view of the problem of integrating different data items under a database perspective. We introduce a variant of context-free grammars, called feature grammars, whose parsers may rewrite their input stream. This allows us to provide a simple mechanism for describing and maintaining indexes to Internet multimedia documents....
Article
The Acoi project provides a large-scale experimentation platform to facilitate studies in the area of indexing multimedia objects and their subsequent retrieval. The index model is based on assembling the results of feature detection algorithms into hierarchical structures to classify the objects. This paper provides an overview of the Acoi archite...
Article
In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, flexible and semantically homogeneous units we are able to exploit the performance potential of v...
Article
The explosion in the number of Web pages also leads to countless accessible multimedia objects. Their abundance makes the Internet an interesting application area for multimedia retrieval systems. Many search engines are beginning to supply some functionality for retrieving these objects independently. However, most of these mul...
Article
Due to the ubiquity and popularity of XML, users often are in the following situation: they want to query XML documents which contain potentially interesting information but they are unaware of the mark-up structure that is used. For example, it is easy to guess the contents of an XML bibliography file whereas the mark-up depends on the methodologi...
Article
The TC 37 Data Category Registry (DCR; www.isocat.org) specifies names, authoritative definitions, and other information and constraints for data categories used in a wide range of linguistic resources. Data category selections subsetted and exported from the DCR in the Data Category Interchange Format can be used as the basis for configuring diver...
Article
The ISOcat Data Category Registry basically contains a flat and easily extensible list of data category specifications. To foster reuse and standardization, only very shallow relationships among data categories are stored in the registry. However, to assist crosswalks, possibly based on personal views, between various (application) domains and to ov...
Article
Most documents researched in the humanities and social sciences will be enriched one way or another, at least with metadata. Sometimes documents are also enriched with one or more types of annotation. Often the notions used can be interpreted in several ways, which raises the question: "What is meant in a particular case?" ISOcat is an ISO 12620:2009 co...
Article
Project workshop: Framework for the Interoperability of Dutch Language Resources (FIDLR-Start)

Projects
