Timo Baumann

University of Hamburg | UHH · Department of Informatics

Diplom-Informatiker, PhD

About

79
Publications
12,785
Reads
798
Citations
Additional affiliations
March 2017 - December 2018
Carnegie Mellon University
Position
  • Researcher
August 2011 - present
University of Hamburg
Position
  • Research Associate
January 2011 - July 2011
Bielefeld University
Position
  • Research Assistant
Education
October 2007 - May 2013
Bielefeld University
Field of study
  • Computational Linguistics (PhD)
October 2001 - May 2007
University of Hamburg
Field of study
  • Informatics, Phonetics (Diplom-Informatiker)

Publications (79)
Conference Paper
Recent advances in the development of smart homes have led to the availability of a wide variety of devices providing a high level of convenience via gesture and speech control or fully automated operation. Many smart home appliances also address the aspects of safety and electricity savings by automatically powering themselves off after not being...
Preprint
Full-text available
We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in German collected from 16 years of (almost) weekly Internet podcasts of former German chancellor Angela Merkel. To the best of our knowledge, this is the first single speaker corpus in the German language consisting of audio, visual and text modalities of comparable size and temp...
Article
Full-text available
The SMOOTH-robot is a mobile robot that—due to its modularity—combines a relatively low price with the possibility to be used for a large variety of tasks in a wide range of domains. In this article, we demonstrate the potential of the SMOOTH-robot through three use cases, two of which were performed in elderly care homes. The robot is designed so...
Conference Paper
Full-text available
Dubbing has two shades; synchronisation constraints are applied only when the actor's mouth is visible on screen, while the translation is unconstrained for off-screen dubbing. Consequently, different synchronisation requirements, and therefore translation strategies, are applied depending on the type of dubbing. In this work, we manually annotate...
Chapter
Speech quality and likability is a multi-faceted phenomenon consisting of a combination of perceptory features that cannot easily be computed nor weighed automatically. Yet, it is often easy to decide which of two voices one likes better, even though it would be hard to describe why, or to name the underlying basic perceptory features. Although lik...
Conference Paper
We analyze the addressee detection task for complexity-identical dialog for both human conversation and device-directed speech. Our recurrent neural model performs at least as well as humans, who have problems with this task, even native speakers, who profit from the relevant linguistic skills. We perform ablation experiments on the features used b...
Conference Paper
Full-text available
This paper describes the tasks, databases, baseline systems, and summarizes submissions and results for the GermEval 2020 Shared Task 1 on the Classification and Regression of Cognitive and Motivational Style from Text. This shared task is divided into two subtasks, a regression task, and a classification task. Subtask 1 asks participants to reprod...
Conference Paper
Full-text available
Conference Paper
Full-text available
A large proportion of (post-)modern poetry contains no or hardly any punctuation. In our contribution, we will investigate how well punctuation information can be recovered for post-modern poetry based on the information contained in the text and speech of free verse poems. We use the world's largest corpus of spoken (post-)modern poetry from our p...
Conference Paper
Full-text available
Data-based analyses are becoming more and more common in the Digital Humanities and tools are needed that focus human efforts on the most interesting and important aspects of exploration, analysis and annotation by using active machine learning techniques. We present our ongoing work on a tool that supports classification tasks for spoken documents...
Conference Paper
Full-text available
This work aims to discern the poetics of concrete poetry by using a corpus-based classification focusing on the two most important techniques used within concrete poetry: semantic decomposition and syntactic permutation. We demonstrate how to identify concrete poetry in modern and postmodern free verse. A class contrasting to concrete poetry is de...
Article
Full-text available
Spoken corpora are important for speech research, but are expensive to create and do not necessarily reflect (read or spontaneous) speech ‘in the wild’. We report on our conversion of the preexisting and freely available Spoken Wikipedia into a speech resource. The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. There are i...
Conference Paper
Full-text available
Ellipses denote the omission of one or more grammatically necessary phrases. In this paper, we will demonstrate how to identify such ellipses as a rhythmical pattern in modern and postmodern free verse poetry by using data from lyrikline which contain the corresponding audio recording of each poem as spoken by the original author. We present a feat...
Conference Paper
Full-text available
We present our research on computer-supported analysis of prosodic styles in post-modern poetry. Our project is unique in making use of both the written as well as the spoken form of the poem as read by the original author. In particular, we use speech and natural language processing technology to align speech and text and to perform textual analys...
Conference Paper
Full-text available
We show how to classify the phrasing of readout poems with the help of machine learning algorithms that use manually engineered features or automatically learnt representations. We investigate modern and postmodern poems from the webpage lyrikline, and focus on two exemplary rhythmical patterns in order to detect the rhythmic phrasing: The Parlando...
Conference Paper
Full-text available
Modern and post-modern free verse poems feature a large and complex variety in their poetic prosodies that falls along a continuum from a more fluent to a more disfluent and choppy style. As the poets of modernism overcame rhyme and meter, they oriented themselves in these two opposing directions, creating a free verse spectrum that calls for new a...
Article
Full-text available
The translation of poetry is a complex, multifaceted challenge: the translated text should communicate the same meaning, similar metaphoric expressions, and also match the style and prosody of the original poem. Research on machine poetry translation has existed since 2010, but for four reasons it is still rather insufficient: 1) The few approaches...
Poster
Full-text available
Detecting the similarities between tonality in music and rhythm in poetry/poetic language
Preprint
Full-text available
The relation of syntax and prosody (the syntax–prosody interface) has been an active area of research, mostly in linguistics and typically studied under controlled conditions. More recently, prosody has also been successfully used in the data-based training of syntax parsers. However, there is a gap between the controlled and detailed study of the...
Conference Paper
Full-text available
Speech-based interactive systems, such as virtual personal assistants, inevitably use complex architectures, with a multitude of modules working in series (or less often in parallel) to perform a task (e.g., giving personalized movie recommendations via dialog). Add modules for evoking and sustaining sociability with the user and the accumulation o...
Conference Paper
Full-text available
Our paper focuses on the computational analysis of "readout poetry" (German: Hördichtung), i.e., recordings of poets reading their own work, with regard to the most important type of this genre, the modern "sound poetry" (German: Lautdichtung). Whereas "readout poetry" often uses normal words and sentences, the "sound poetry", developed by dadaistic poets...
Chapter
Full-text available
Automatic speech recognition (ASR) is not only becoming increasingly accurate, but also increasingly adapted for producing timely, incremental output. However, overall accuracy and timeliness alone are insufficient when it comes to interactive dialogue systems which require stability in the output and responsivity to the utterance as it is unfoldin...
Conference Paper
Full-text available
The most important development in modern and postmodern poetry is the replacement of traditional meter by new rhythmical patterns. Ever since Walt Whitman's Leaves of Grass (1855), modern (nineteenth- to twenty-first-century) poets have been searching for novel forms of prosody, accent, rhythm, and intonation. Along with the rejection of older metric...
Conference Paper
Full-text available
The Spoken Wikipedia project unites volunteer readers of encyclopedic entries. Their recordings make encyclopedic knowledge accessible to persons who are unable to read (because of alexia or visual impairment, or because their sight is currently occupied, e.g., while driving). However, on Wikipedia, recordings are available as raw audio files that can on...
Conference Paper
Full-text available
We propose to use a model of personal space to initiate communication while passing a human, thereby acknowledging that humans are not just a special kind of obstacle to be avoided but potential interaction partners. As a simple form of interaction, our system communicates an apology while closely passing a human. To this end, we present a software...
Conference Paper
Full-text available
Incremental speech synthesis aims at delivering the synthetic voice while the sentence is still being typed. One of the main challenges is the online estimation of the target prosody from a partial knowledge of the sentence's syntactic structure. In the context of HMM-based speech synthesis, this typically results in missing segmental and suprasegm...
Conference Paper
Full-text available
When a passenger speaks to a driver, he or she is co-located with the driver, is generally aware of the situation, and can stop speaking to allow the driver to focus on the driving task. In-car dialogue systems ignore these important aspects, making them more distracting than even cell-phone conversations. We developed and tested a "situationally-...
Conference Paper
Full-text available
It is established that driver distraction is the result of sharing cognitive resources between the primary task (driving) and any other secondary task. In the case of holding conversations, a human passenger who is aware of the driving conditions can choose to interrupt his speech in situations potentially requiring more attention from the driver,...
Conference Paper
Full-text available
Automatic speech recognition (ASR) technology has been developed to such a level that off-the-shelf distributed speech recognition services are available (free of cost), which allow researchers to integrate speech into their applications with little development effort or expert knowledge leading to better results compared with previously used open-...
Conference Paper
Full-text available
Human speakers plan and deliver their utterances incrementally, piece-by-piece, and it is obvious that their choice regarding phonetic details (and the details' peculiarities) is rarely determined by globally optimal solutions. In contrast, parametric speech synthesizers use a full-utterance context when optimizing vocoding parameters and when dete...
Conference Paper
Full-text available
Holding non-co-located conversations while driving is dangerous (Horrey and Wickens, 2006; Strayer et al., 2006), much more so than conversations with physically present, “situated” interlocutors (Drews et al., 2004). In-car dialogue systems typically resemble non-co-located conversations more, and share their negative impact (Strayer et al., 2013)...
Article
Full-text available
When humans speak, they do not plan their full utterance in all detail before beginning to speak, nor do they speak piece by piece while ignoring their full message; instead, humans use partial representations in which they fill in the missing parts as the utterance unfolds. Incremental speech synthesizers, in contrast, have not yet made use of parti...
Conference Paper
Full-text available
Participants in a conversation are normally receptive to their surroundings and their interlocutors, even while they are speaking and can, if necessary, adapt their ongoing utterance. Typical dialogue systems are not receptive and cannot adapt while uttering. We present combinable components for incremental natural language generation and increme...
Conference Paper
Full-text available
Tactile maps are important substitutes for visual maps for blind and visually impaired people and the efficiency of tactile-map reading can largely be improved by giving assisting utterances that make use of spatial language. In this paper, we elaborate earlier ideas for a system that generates such utterances and present a prototype implementation...
Conference Paper
Full-text available
We present a model of semantic processing of spoken language that (a) is robust against ill-formed input, such as can be expected from automatic speech recognisers, (b) respects both syntactic and pragmatic constraints in the computation of most likely interpretations, (c) uses a principled, expressive semantic representation formalism (RMRS) with...
Article
If we can model the cognitive and communicative processes underlying speech, we should be able to better predict what a speaker will do. With this idea as inspiration, we examine a number of prosodic and timing features as potential sources of information on what words the speaker is likely to say next. In spontaneous dialog we find that word proba...
Conference Paper
Full-text available
Conference Paper
Full-text available
We describe the 2012 release of INPROTK, our "Incremental Processing Toolkit", which combines a powerful and extensible architecture for incremental processing with components for incremental speech recognition and, new to this release, incremental speech synthesis. These components work domain-independently; we also provide example implementation...
Conference Paper
Full-text available
We present a component for incremental speech synthesis (iSS) and a set of applications that demonstrate its capabilities. This component can be used to increase the responsivity and naturalness of spoken interactive systems. While iSS can show its full strength in systems that generate output incrementally, we also discuss how even otherwise uncha...
Conference Paper
Full-text available
Incremental speech synthesis (iSS) accepts input and produces output in consecutive chunks that only together result in a full utterance. Systems that use iSS thus have the ability to adapt their utterances while they are ongoing. However, starting to process with less than the full utterance available prohibits global optimization, leading to pote...
Conference Paper
Full-text available
We present the novel task of predicting temporal features of continuations of user input, while that input is still ongoing. We show that the remaining duration of an ongoing word, as well as the duration of the next can be predicted reasonably well, and we put this information to use in a system that synchronously completes a user's speech. While...
Conference Paper
Full-text available
When dialogue systems, through the use of incremental processing, are not bounded anymore by strict, non-overlapping turn-taking, a whole range of additional interactional devices becomes available. We explore the use of one such device, trial intonation. We elaborate our approach to dialogue management in incremental systems, based on the Informat...
Conference Paper
Full-text available
Incremental natural language understanding is the task of assigning semantic representations to successively larger prefixes of utterances. We compare two types of statistical models for this task: a) local models, which predict a single class for an input; and b) sequential models, which align a sequence of classes to a sequence of input tokens....
Conference Paper
Full-text available
We describe work done at three sites on designing conversational agents capable of incremental processing. We focus on the ‘middleware’ layer in these systems, which takes care of passing around and maintaining incremental information between the modules of such agents. All implementations are based on the abstract model of incremental dialogue pro...
Conference Paper
Full-text available
The potential of using ASR n-best lists for dialogue systems has often been recognised (if less often realised): it is often the case that even when the top-ranked hypothesis is erroneous, a better one can be found at a lower rank. In this paper, we describe metrics for evaluating whether the same potential carries over to incremental dialogue sy...
Conference Paper
Full-text available
Ideally, a spoken dialogue system should react without much delay to a user's utterance. Such a system would already select an object, for instance, before the user has finished her utterance about moving this particular object to a particular place. A prerequisite for such a prompt reaction is that semantic representations are built up on the...
Conference Paper
Full-text available
In this paper we do two things: a) we discuss in general terms the task of incremental reference resolution (IRR), in particular resolution of exophoric reference, and specify metrics for measuring the performance of dialogue system components tackling this task, and b) we present a simple Bayesian filtering model of IRR that performs rea...