
Roeland Ordelman
- PhD
- Project Manager at Universiteit Twente / Netherlands Institute for Sound and Vision
114 Publications
16,946 Reads
1,984 Citations
Introduction
Current institution
Universiteit Twente / Netherlands Institute for Sound and Vision
Current position
- Project Manager
Additional affiliations
January 2009 - present
Cross-Media Interaction
Position
- Owner
July 2008 - present
Netherlands Institute for Sound and Vision, Hilversum, The Netherlands
Position
- Project Manager
September 1998 - present
Publications (114)
In this paper we present the currently running PDI-SSH project Homo Medicinalis (HoMed), in which we use machine learning to build an Automatic Speech Recognition (ASR) infrastructure for disclosing privacy-sensitive doctor-patient consultation recordings.
The practices of digital humanists are evolving, highly diversified and experimental. There is also a lack of agreement about whether or not digital humanists should have data and programming skills. Thus, their underlying needs for higher levels of flexibility and transparency may be contradicted by their explicit requests for user-friendly graphi...
Video-to-video linking systems allow users to explore and exploit the content of a large-scale multimedia collection interactively and without the need to formulate specific queries. We present a short introduction to video-to-video linking (also called ‘video hyperlinking’), and describe the latest edition of the Video Hyperlinking (LNK) task at T...
Audiovisual archives are embracing the opportunities offered by digitization for managing their work processes and offering new services to a wide array of user groups. Organization strategy, working processes and software development need to be able to support a culture where innovation can flourish. Some institutions are beginning to adopt the co...
The value and importance of Benchmark Evaluations is widely acknowledged. Benchmarks play a key role in many research projects. It takes time, a well-balanced team of domain specialists preferably with links to the user community and industry, and a strong involvement of the research community itself to establish a sound evaluation framework that i...
In this paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using collateral text data sources to investigate how such an approach can provide acceptable results for given requirements with respect to archival quality, authority and service levels to external users. We conclude that with parameter settings that...
This paper overviews ongoing work that aims to support end-users in conveniently exploring and exploiting large audiovisual archives by deploying multiple multimodal linking approaches. We present ongoing work on multimodal video hyperlinking, from a perspective of unconstrained link anchor identification and based on the identification of named en...
The Workshop on Speech, Language and Audio in Multimedia (SLAM) positions itself at the crossroads of multiple scientific fields (music and audio processing, speech processing, natural language processing and multimedia) to discuss and stimulate research results, projects, datasets and benchmark initiatives where audio, speech and language are a...
In this paper we report on an evaluation of unsupervised labeling of audiovisual content using collateral text data sources to investigate how such an approach can provide acceptable results given requirements with respect to archival quality, authority and service levels to external users. We conclude that with parameter settings that are optimize...
Archives of cultural heritage organisations typically consist of collections in various formats (e.g. photos, video, texts) that are inherently related. Often, such disconnected collections represent value in themselves, but effectuating links between 'core' and 'context' collection items at various levels of granularity could result in a 'o...
Multimedia hyperlinking is an emerging research topic in the context of digital libraries and (cultural heritage) archives. We have been studying the concept of video-to-video hyperlinking from a video search perspective in the context of the MediaEval evaluation benchmark for several years. Our task considers a use case of exploring large quantiti...
The Workshop on Speech, Language and Audio in Multimedia (SLAM) positions itself at the crossroads of multiple scientific fields (music and audio processing, speech processing, natural language processing and multimedia) to discuss and stimulate research results, projects, datasets and benchmark initiatives where audio, speech and language are a...
Semantic linking has the potential to enrich the audiovisual experience for users of television or radio broadcast archives. Recently, automatic semantic linking has received increased attention, especially as second screen applications for television broadcasts are emerging. Semantic linking for radio broadcasts can enrich radio listening experienc...
The EU FP7 project AXES aims at better understanding the needs of archive users and supporting them with systems that reach beyond the state-of-the-art. Our system allows users to instantaneously retrieve content using metadata, spoken words, or a vocabulary of reliably detected visual concepts comprising places, objects and events. Additionally, u...
This report describes metrics for the evaluation of the effectiveness of segment-based retrieval based on existing binary information retrieval metrics. These metrics are described in the context of a task for the hyperlinking of video segments. This evaluation approach re-uses existing evaluation measures from the standard Cranfield evaluation para...
Scholars are yet to make optimal use of Oral History collections. For the uptake of digital research tools in the daily working practice of researchers, practices and conventions commonly adhered to in the subfields in the humanities should be taken into account during development. To this end, in the Oral History Today project a research tool for...
This paper reports on the results of a quantitative analysis of user requirements for audiovisual search that allows requirements to be categorised and compared across user groups. The categorisation provides clear directions with respect to the prioritisation of system features from the perspective of the development of systems f...
Video hyperlinking is regarded as a means to enrich interactive television experiences. Creating links manually however has limitations. In order to be able to automate video hyperlinking and increase its potential we need to have a better understanding of how both broadcasters that supply interactive television and the end-users approach and perce...
Although linking video to additional information sources seems to be a sensible approach to satisfying the information needs of users, the perspective of users has not yet been analyzed on a fundamental level in real-life scenarios. However, a better understanding of the motivation of users to follow links in video, which anchors users prefer to link from within...
Searching for relevant webpages and following hyperlinks to related content is a widely accepted and effective approach to information seeking on the textual web. Existing work on multimedia information retrieval has focused on search for individual relevant items or on content linking without specific attention to search results. We describe our r...
The negative consequences of cyberbullying are becoming more alarming every day and technical solutions that allow for taking appropriate action by means of automated detection are still very limited. Up until now, studies on cyberbullying detection have focused on individual comments only, disregarding context such as users’ characteristics and pr...
In this paper we report our experiments and results for the brave new searching and hyperlinking tasks for the MediaEval Benchmark Initiative 2012. The searching task involves finding target video segments based on a short natural language sentence query and the hyperlinking task involves finding links from the target video segments to other rela...
The search and hyperlinking task at MediaEval 2014 is the third edition of this task. As in previous versions, it consisted of two sub-tasks: (i) answering search queries from a collection of roughly 2700 hours of BBC broadcast TV material, and (ii) linking anchor segments from within the videos to other target segments within the video collection....
The MediaEval Multimedia Benchmark leveraged community cooperation and crowdsourcing to develop a large Internet video dataset for its Genre Tagging and Rich Speech Retrieval tasks.
Friendships, relationships and social communications have all gone to a new level with new definitions as a result of the invention of online social networks. Meanwhile, alongside this transition there is increasing evidence that online social applications have been used by children and adolescents for bullying. State-of-the-art studies in cyberb...
We present an exploratory study of the retrieval of semi-professional user-generated Internet video. The study is based on the MediaEval 2011 Rich Speech Retrieval (RSR) task for which the dataset was taken from the Internet sharing platform blip.tv, and search queries associated with specific speech acts occurring in the video. We compare re-...
As a result of the invention of social networks, friendships, relationships and social communication are all undergoing changes and new definitions seem to be applicable. One may have hundreds of "friends" without even seeing their faces. Meanwhile, alongside this transition there is increasing evidence that online social applications are used by c...
While automatic linking in text collections is well understood, little is known about links in images. In this work, we investigate two aspects of anchors, the origin of a link, in images: 1) the requirements of users for such anchors, e.g. the things users would like more information on, and 2) possible evaluation methods assessing anchor selectio...
The AXES project participated in the interactive known-item search task (KIS) and the interactive instance search task (INS) for TRECVid 2011. We used the same system architecture and a nearly identical user interface for both the KIS and INS tasks. Both systems made use of text search on ASR, visual concept detectors, and visual similarity search....
Safeguarding the massive body of audiovisual content, including rich music collections, in audiovisual archives and enabling access for various types of user groups is a prerequisite for unlocking the social-economic value of these collections. Data quantities and the need for specific content descriptors however, force archives to re-evaluate thei...
This paper discusses audiovisual content access on the Internet. The use of multimedia on the Internet is large and growing at an extraordinary rate. By 2015, one million minutes of video content will cross the Internet every second. Audiovisual archives are investing in large scale digitization efforts of their analog holdings and, in parallel,...
Archival practice is shifting from the analogue to the digital world. A specific subset of heritage collections that impose interesting challenges for the field of language and speech technology are spoken word archives. Given the enormous backlog at audiovisual archives of unannotated materials and the generally global level of item description, c...
This paper describes the participation of the University of Twente team at the Rich Text Retrieval Task of the MediaEval Benchmark Initiative 2011. The goal of the task is to find entry points of relevant parts of videos to reduce the browsing effort of searchers. This is our first participation; therefore our main focus is to create a baseline sy...
Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We overview three tasks offered in the MediaEval 2010 benchmarking initiative, for each, describing its use scenario, definition and the data set released. For each task, a reference algorithm is presented that was used within...
The aim of this paper is to reflect on the factors that impede clear communication and more fruitful collaboration between humanities scholars and ICT developers. One of the observations is that ICT researchers who design tools for humanities researchers are less inclined to take into account that each stage of the scholarly research process r...
In contrast with the large amounts of potentially interesting research material in digital multimedia repositories, the opportunities to unveil the gems therein are still very limited. The Oral History project ‘Verteld Verleden’ (Dutch literal translation of Oral History) that is currently running in The Netherlands, focuses on improving access to sp...
News-related content is nowadays among the most popular types of content for users in everyday applications. Although the generation and distribution of news content has become commonplace, due to the availability of inexpensive media capturing devices and the development of media sharing services targeting both professional and user-generated news...
Social science is often concerned with the emergence of collective behavior out of the interactions of large numbers of individuals, but in this regard it has long suffered from a severe measurement problem - namely that individual-level behavior and ...
In this technical demonstration, we showcase a multimedia search engine that facilitates semantic access to archival rock 'n' roll concert video. The key novelty is the crowdsourcing mechanism, which relies on online users to improve, extend, and share automatically detected results in video fragments using an advanced timeline-based video player....
We carry out two studies on affective state modeling for communication settings that involve unilateral intent on the part of one participant (the evoker) to shift the affective state of another participant (the experiencer). The first investigates viewer response in a narrative setting using a corpus of documentaries annotated with viewer-reported...
Narrative peaks are points at which the viewer perceives a spike in the level of dramatic tension within the narrative flow of a video. This paper reports on four approaches to narrative peak detection in television documentaries that were developed by a joint team consisting of members from Delft University of Technology and the University of Twen...
After two successful years at SIGIR in 2007 and 2008, the third workshop on Searching Spontaneous Conversational Speech (SSCS 2009) was held in conjunction with ACM Multimedia 2009. The goal of the SSCS series is to serve as a forum that brings together the disciplines that collaborate on spoken content retrieval, including information retrieval,...
The spoken word is a valuable source of semantic information. Techniques that exploit the spoken word by making use of speech recognition or spoken audio analysis hold clear potential for improving multimedia search. Nonetheless, speech technology remains underexploited by systems that provide access to spoken audio or video with a speech track. In...
Techniques for automatic annotation of spoken content making use of speech recognition technology have long been characterized as holding unrealized promise to provide access to archives inundated with undisclosed multimedia material. This paper provides an overview of techniques and trends in semantic speech retrieval, which is taken to encompass...
Given the enormous backlog at audiovisual archives and the generally global level of item description, collection disclosure and item access are both at risk. At the same time, archival practice is seeking to evolve from the analogue to the digital world. CHoral investigates the role automatic annotation and search technology can play in improving...
StreetTiVo is a project that aims at bringing research results into the living room; in particular, a mix of current results in the areas of Peer-to-Peer XML Database Management System (P2P XDBMS), advanced multimedia analysis techniques, and advanced information retrieval techniques. The project develops a plug-in application for the so-called Hom...
Spoken document retrieval research effort invested into developing broadcast news retrieval systems has yielded impressive results. This paper is the introduction to the proceedings of the 3rd workshop aiming at the advancement of the field in less explored domains (SSCS2009), which was organized in conjunction with the ACM Multimedia Conference in Beiji...
This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken heritage archives in general, and in particular of a collection of recorded interviews with Dutch survivors of World War II concentration camp Buchenwald. Given such collections, we at least want to provide search at different levels and a flexible w...
The second workshop on Searching Spontaneous Conversational Speech (SSCS 2008) was held in Singapore on July 24, 2008 in conjunction with the 31st Annual International ACM SIGIR Conference. The goal of the workshop was to bring the speech community and the information retrieval community together. The forum was designed to be conducive to the close...
In addition to multimedia collections and their metadata, there often is a variety of collateral data sources available on (parts of) a collection. Collateral data - secondary information objects that relate to the primary multimedia documents - can be very useful in the process of automated generation of annotations for multimedia archives in that...
In this paper, a complete architecture for knowledge-assisted cross-media analysis of News-related multimedia content is presented, along with its constituent components. The proposed analysis architecture employs state-of-the-art methods for the analysis of each individual modality (visual, audio, text) separately, and proposes a fusion technique...
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a b...
The re-use of spoken word audio collections maintained by audiovisual archives is severely hindered by their generally limited access. The CHoral project, which is part of the CATCH program funded by the Dutch Research Council, aims to provide users of speech archives with online, instead of on-location, access to relevant fragments, instead of ful...
The computational linguistics community in The Netherlands and Belgium has long recognized the dire need for a major reference corpus of written Dutch. In part to answer this need, the STEVIN programme was established. To pave the way for the effective building of a 500-million-word reference corpus of written Dutch, a pilot project was established...
Decoders that make use of token-passing restrict their search space by various types of token pruning. With use of the Language Model Look-Ahead (LMLA) technique it is possible to increase the number of tokens that can be pruned without loss of decoding precision. Unfortunately, for token passing decoders that use single static pronunciation pr...
This proceedings volume contains papers on topics that are currently gaining momentum in the speech search community, strengthened by their position in the intersection of information retrieval and speech recognition research. Together these papers cover a wide spectrum of research areas, including vocabulary independent search, spoken term detecti...
This paper reports on the setup and evaluation of robust speech recognition system parts, geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed for generating speech transcripts for the NIST/TRECVID-2007 test collection, part of a Dutch real-life archive of news-related genres. Performance fig...
The SIGIR Workshop on Searching Spontaneous Conversational Speech was held as part of the 2007 ACM SIGIR Conference in Amsterdam. The workshop program was a mix of elements, including a keynote speech, paper presentations and panel discussions. This brief report describes the organization of this workshop and summarizes the discussions.
In this paper the XML Information Retrieval System PF/Tijah is applied to retrieval tasks on large spoken document collections. The example setting used is the English CLEF-2006 CL-SR collection together with given English topics and self-produced Dutch topics. The main findings presented in this paper are the easy way of adapting queries to use di...
In this paper we discuss the speech activity detection system that we used for detecting speech regions in the Dutch TRECVID video collection. The system is designed to filter non-speech such as music or sound effects out of the signal without the use of predefined non-speech models. Because the system trains its models on-line, it is robust for...
Bridging the semantic gap is one of the big challenges in multimedia information retrieval. It exists between the extraction of low-level features of a video and its conceptual contents. In order to understand the conceptual content of a video a common approach is building concept detectors. A problem of this approach is that the number of detector...
In this report we summarize our methods and results for the search tasks in TRECVID 2007. We employ two different kinds of search: purely ASR based and purely concept based search. However, there is no significant difference in the performance of the two systems. Using neighboring shots for the combination of two concepts seems to be beneficial. G...
The 'Radio Oranje' demonstrator shows an attractive multimedia user experience in the cultural heritage domain based on a collection of mono-media audio documents. It supports online search and browsing of the collection using indexing techniques, specialized content visualizations and a related photo database.
This contribution describes the Twente News Corpus (TwNC), a multifaceted corpus for Dutch that is being deployed in a number of NLP research projects among which tracks within the Dutch national research programme MultimediaN, the NWO programme CATCH, and the Dutch-Flemish programme STEVIN. The development of the corpus started in 1998 within a p...
Within the context of international benchmarks and collection specific projects, much work on spoken document retrieval has been done in recent years. In 2000 the issue of automatic speech recognition for spoken document retrieval was declared 'solved' for the broadcast news domain. Many collections, however, are not in this domain and automatic sp...
Access to historical audio collections is typically very restricted: content is often only available on physical (analog) media and the metadata is usually limited to keywords, giving access at the level of relatively large fragments, e.g., an entire tape. Many spoken word heritage collections are now being digitized, which allows the introduction...
The Proceedings contain the contributions to the workshop on Searching Spontaneous Conversational Speech organized in conjunction with the 30th ACM SIGIR, Amsterdam 2007. The papers reflect some of the emerging focus areas and cross-cutting research topics, together addressing evaluation metrics, segmentation methods, workflow aspects, rich transcr...
The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metada...
Work on expressive speech synthesis has long focused on the expression of basic emotions. In recent years, however, interest in other expressive styles has been increasing. The research presented in this paper aims at the generation of a storytelling speaking style, which is suitable for storytelling applications and more in general, for applicatio...
We discuss the annotation procedure for mental state and emotion that is under development for the AMI (Augmented Multiparty Interaction) corpus. The categories that were found to be most appropriate relate not only to emotions but also to (meta-)cognitive states and interpersonal variables. The history of the development of the annotation scheme i...
This paper discusses audio indexing tools that have been implemented for the disclosure of Dutch audiovisual cultural heritage collections. It explains the role of language models and their adaptation to historical settings and the adaptation of acoustic models for homogeneous audio collections. In addition to the benefits of cross-media linking, t...
This report presents the University of Twente's first cross-language speech retrieval experiments in the Cross-Language Evaluation Forum (CLEF). It describes the issues our contribution focused on, describes the PF/Tijah XML Information Retrieval system that was used, and discusses the results for both the monolingual English and the Dutch-...
With the 10th anniversary of the death of the Dutch novelist Willem Frederik Hermans (1921-1994), the Willem Frederik Hermans Institute initiated the set-up of a Willem Frederik Hermans portal. Here, all available information related to the Dutch novelist and his work can be consulted. A part of this portal was planned to be dedicated to a collecti...
We present the results of two trials testing procedures for the annotation of emotion and mental state of the AMI corpus. The first procedure is an adaptation of the FeelTrace method, focusing on a continuous labelling of emotion dimensions. The second method is centered around more discrete labeling of segments using categorical labels. The result...
The automatic processing of speech collected in conference style meetings has attracted considerable interest with several large scale projects devoted to this area. This paper describes the development of a baseline automatic speech transcription system for meetings in the context of the AMI (Augmented Multiparty Interaction) project. We present s...
In this paper we describe the 2005 AMI system for the transcription of speech in meetings used for participation in the 2005 NIST RT evaluations. The system was designed for participation in the speech to text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset micr...
In this paper, a cross-media browsing demonstrator named InfoLink is described. InfoLink automatically links the content of Dutch broadcast news videos to related information sources in parallel collections containing text and/or video. Automatic segmentation, speech recognition and available meta-data are used to index and link items. The concept...
The application of automatic speech recognition in the broadcast news domain is well studied. Recognition performance is generally high and accordingly, spoken document retrieval can successfully be applied in this domain, as demonstrated by a number of commercial systems. In other domains, a similar recognition performance is hard to obtain, or...
Whereas the growth of storage capacity is in accordance with widely acknowledged predictions, the possibilities to index and access the archives created are lagging behind. This is especially the case in the oral history domain, and much of the rich content in these collections runs the risk of remaining inaccessible for lack of robust search technolo...
Available online: http://www.amiproject.org/
In this paper, the current state of affairs in Dutch speech-based retrieval as addressed in a series of multimedia retrieval projects is described and possible future directions of the research in this field are discussed in brief.
In this paper, ongoing work on the development of the speech recognition modules of MMIR environment for Dutch is described. The work on the generation of acoustic models and language models along with their current performance is presented. Some characteristics of the Dutch language and of the target video archives that require special treatment a...
In this paper, ongoing work concerning the language modelling and lexicon optimization of a Dutch speech recognition system for Spoken Document Retrieval is described: the collection and normalization of a training data set and the optimization of our recognition lexicon. Effects on lexical coverage of the amount of training data, of decompounding...
On attaching automatic search functionality to historical video archives