Publications (3)
0 Total impact
-
ABSTRACT: We present an ongoing project building a multimodal dialogue system for a music player supporting natural, flexible interaction and collaborative behavior. Since the system functionalities include searching a big MP3 database, multimodal output is needed.
-
ABSTRACT: We describe a corpus of multimodal dialogues with an MP3 player collected in Wizard-of-Oz experiments and annotated with a rich feature set at several layers. We are using the Nite XML Toolkit (NXT) (Carletta et al., 2003) to represent and further process the data. We designed an NXT data model, converted experiment log file data and manual transcriptions into NXT, and are building tools for additional annotation using NXT libraries. The annotated corpus will be used to (i) investigate various aspects of multimodal presentation and interaction strategies both within and across annotation layers; (ii) design an initial policy for reinforcement learning of multimodal clarification requests.
-
ABSTRACT: We describe a Wizard-of-Oz experiment setup for the collection of multimodal interaction data for a Music Player application. This setup was developed and used to collect experimental data as part of a project aimed at building a flexible multimodal dialogue system which provides an interface to an MP3 player, combining speech and screen input and output. Besides the usual goal of WOZ data collection to get realistic examples of the behavior and expectations of the users, an equally important goal for us was to observe natural behavior of multiple wizards in order to guide our system development. The wizards' responses were therefore not constrained by a script. One of the challenges we had to address was to allow the wizards to produce varied screen output in real time. Our setup includes a preliminary screen output planning module, which prepares several versions of possible screen output. The wizards were free to speak and/or to select a screen output.