Article

Runtime Model Based Framework for Automatic Evaluation of Multimodal Interfaces


Abstract

Multimodal interfaces are expected to improve input and output capabilities of increasingly sophisticated applications. Several approaches are aimed at formally describing multimodal interaction. However, they rarely treat it as a continuous flow of actions, preserving its dynamic nature and considering modalities at the same level. This work proposes a model-based approach called Practice-oriented Analysis and Description of Multimodal Interaction (PALADIN) aimed at describing sequential multimodal interaction beyond such problems. It arranges a set of parameters to quantify multimodal interaction as a whole, in order to minimise the existing differences between modalities. Furthermore, interaction is described stepwise to preserve the dynamic nature of the dialogue process. PALADIN defines a common notation to describe interaction in different multimodal contexts, providing a framework to assess and compare the usability of systems. Our approach was integrated into four real applications to conduct two experiments with users. The experiments show the validity and prove the effectiveness of the proposed model for analysing and evaluating multimodal interaction.

... (S3) PALADIN [138,139] is a metamodel aimed at describing multimodal interaction in a uniform and dynamic way. Its design arranges a set of parameters to quantify multimodal interaction, generalizing the existing differences between modalities. ...
... In order to ease its integration into research and production systems, a helper framework was developed. It is called Instantiation Framework (IF); its implementation is open-source and can be downloaded from [139] as well. The IF is aimed at serving as a bridge between the interaction source (e.g., a filter extracting live interaction from an application, an application simulating user-system interaction, an interaction log) and the PALADIN instances. ...
... The current implementation of PALADIN and the IF can be easily integrated in the source code of a Java application. In [139] it is carefully described how PALADIN can be used with or without the IF, and how these tools are integrated into an application to implement multimodal interaction analysis. The IF provides a facade through which it is easily notified by an external tool instrumenting interaction. ...
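The facade idea in the excerpt above can be illustrated with a minimal sketch. It is written in Python for brevity, although the excerpt states that the real tools target Java, and it does not reproduce the actual PALADIN or IF API: the names `InstantiationFacade`, `notify_user_input`, `notify_system_output`, `Turn`, and the `elements` field are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """One user or system turn, described by a few illustrative parameters."""
    speaker: str       # "user" or "system"
    modality: str      # e.g. "speech", "gui", "gesture"
    elements: int = 0  # hypothetical size measure (words, widgets, ...)


@dataclass
class InteractionModel:
    """Stand-in for a model instance: an ordered list of turns."""
    turns: List[Turn] = field(default_factory=list)


class InstantiationFacade:
    """Hypothetical facade: the only object an instrumenting tool talks to."""

    def __init__(self) -> None:
        self.model = InteractionModel()

    def notify_user_input(self, modality: str, elements: int) -> None:
        # The caller reports an event; the facade decides how to store it.
        self.model.turns.append(Turn("user", modality, elements))

    def notify_system_output(self, modality: str, elements: int) -> None:
        self.model.turns.append(Turn("system", modality, elements))


# An instrumenting tool (live filter, simulator, or log reader) only
# calls the notification methods and never touches the model directly.
facade = InstantiationFacade()
facade.notify_system_output("speech", 6)
facade.notify_user_input("speech", 4)
facade.notify_user_input("gui", 1)
```

The point of the design is that the interaction source stays decoupled from the model: swapping a live filter for a log reader changes only who calls the facade.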
Thesis
1 Introduction
1.1 Motivation
1.1.1 Human-Computer Interaction
1.1.2 Software Testing
1.1.3 Data Verification
1.1.4 Software Usability
1.1.5 Quality of Experience
1.2 Enhancing Software Quality
1.2.1 Block 1: Achieving Quality in Interaction Components Separately
1.2.2 Block 2: Achieving Quality of User-System Interaction as a Whole
1.3 Goals of this PhD Thesis
1.4 Publications Related to this PhD Thesis
1.5 Software Contributions of this PhD Thesis
1.5.1 OHT: Open HMI Tester
1.5.2 S-DAVER: Script-based Data Verification
1.5.3 PALADIN: Practice-oriented Analysis and Description of Multimodal Interaction
1.5.4 CARIM: Context-Aware and Ratings Interaction Metamodel
1.6 Summary of Research Goals, Publications, and Software Contributions
1.7 Context of this PhD Thesis
1.8 Structure of this PhD Thesis
2 Related Work
2.1 Group 1: Approaches Assuring Quality of a Particular Interaction Component
2.1.1 Validation of Software Output
2.1.1.1 Methods Using a Complete Model of the GUI
2.1.1.2 Methods Using a Partial Model of the GUI
2.1.1.3 Methods Based on GUI Interaction
2.1.2 Validation of User Input
2.1.2.1 Data Verification Using Formal Logic
2.1.2.2 Data Verification Using Formal Property Monitors
2.1.2.3 Data Verification in GUIs and in the Web
2.2 Group 2: Approaches Describing and Analyzing User-System Interaction as a Whole
2.2.1 Analysis of User-System Interaction
2.2.1.1 Analysis for the Development of Multimodal Systems
2.2.1.2 Evaluation of Multimodal Interaction
2.2.1.3 Evaluation of User Experiences
2.2.2 Analysis of Subjective Data of Users
2.2.2.1 User Ratings Collection
2.2.2.2 Users Mood and Attitude Measurement
2.2.3 Analysis of Interaction Context
2.2.3.1 Interaction Context Factors Analysis
2.2.3.2 Interaction Context Modeling
3 Evaluating Quality of System Output
3.1 Introduction and Motivation
3.2 GUI Testing Requirements
3.3 Preliminary Considerations for the Design of a GUI Testing Architecture
3.3.1 Architecture Actors
3.3.2 Organization of the Test Cases
3.3.3 Interaction and Control Events
3.4 The OHT Architecture Design
3.4.1 The HMI Tester Module Architecture
3.4.2 The Preload Module Architecture
3.4.3 The Event Capture Process
3.4.4 The Event Playback Process
3.5 The OHT Implementation
3.5.1 Implementation of Generic and Final Functionality
3.5.1.1 Generic Data Model
3.5.1.2 Generic Recording and Playback Processes
3.5.2 Implementation of Specific and Adaptable Functionality
3.5.2.1 Using the DataModelAdapter
3.5.2.2 The Preloading Process
3.5.2.3 Adapting the GUI Event Recording and Playback Processes
3.5.3 Technical Details About the OHT Implementation
3.6 Discussion
3.6.1 Architecture
3.6.2 The Test Case Generation Process
3.6.3 Validation of Software Response
3.6.4 Tolerance to Modifications, Robustness, and Scalability
3.6.5 Performance Analysis
3.7 Conclusions
4 Evaluating Quality of Users Input
4.1 Introduction and Motivation
4.2 Practical Analysis of Common GUI Data Verification Approaches
4.3 Monitoring GUI Data at Runtime
4.4 Verification Rules
4.4.1 Rule Definition
4.4.2 Using the Rules to Apply Correction
4.4.3 Rule Arrangement
4.4.4 Rule Management
4.4.4.1 Loading the Rules
4.4.4.2 Evolution of the Rules and the GUI
4.4.5 Correctness and Consistency of the Rules
4.5 The Verification Feedback
4.6 S-DAVER Architecture Design
4.6.1 Architecture Details
4.6.2 Architecture Adaptation
4.7 S-DAVER Implementation and Integration Considerations
4.8 Practical Use Cases
4.8.1 Integration, Configuration, and Deployment of S-DAVER
4.8.2 Defining the Rules in Qt Bitcoin Trader
4.8.3 Defining the Rules in Transmission
4.8.4 Development and Verification Experience with S-DAVER
4.9 Performance Analysis of S-DAVER
4.10 Discussion
4.10.1 A Lightweight Data Verification Approach
4.10.2 The S-DAVER Open-Source Implementation
4.10.3 S-DAVER Compared with Other Verification Approaches
4.11 Conclusions
5 Modeling and Evaluating Quality of Multimodal User-System Interaction
5.1 Introduction and Motivation
5.2 A Model-based Framework to Evaluate Multimodal Interaction
5.2.1 Classification of Dialog Models by Level of Abstraction
5.2.2 The Dialog Structure
5.2.3 Using Parameters to Describe Multimodal Interaction
5.2.3.1 Adaptation of Base Parameters
5.2.3.2 Defining new Modality and Meta-communication Parameters
5.2.3.3 Defining new Parameters for GUI and Gesture Interaction
5.2.3.4 Classification of the Multimodal Interaction Parameters
5.3 Design of PALADIN
5.4 Implementation, Integration, and Usage of PALADIN
5.5 Application Use Cases
5.5.1 Assessment of PALADIN as an Evaluation Tool
5.5.1.1 Participants and Material
5.5.1.2 Procedure
5.5.1.3 Data Analysis
5.5.2 Usage of PALADIN in a User Study
5.5.2.1 Participants and Material
5.5.2.2 Procedure
5.5.2.3 Results
5.6 Discussion
5.6.1 Research Questions
5.6.2 Practical Application of PALADIN
5.6.3 Completeness of PALADIN According to Evaluation Guidelines
5.6.4 Limitations in Automatic Logging of Interactions Parameters
5.7 Conclusions
5.8 Parameters Used in PALADIN
6 Modeling and Evaluating Mobile Quality of Experience
6.1 Introduction and Motivation
6.2 Context- and QoE-aware Interaction Analysis
6.2.1 Incorporating Context Information and User Ratings into Interaction Analysis
6.2.2 Arranging the Parameters for the Analysis of Mobile Experiences
6.2.3 Using CARIM for QoE Assessment
6.3 Context Parameters
6.3.1 Quantifying the Surrounding Context
6.3.2 Arranging Context Parameters into CARIM
6.4 User Perceived Quality Parameters
6.4.1 Measuring the Attractiveness of Interaction
6.4.2 Measuring Users Emotional State and Attitude toward Technology Use
6.4.3 Arranging User Parameters into CARIM
6.5 CARIM Model Design
6.5.1 The Base Design: PALADIN
6.5.2 The New Proposed Design: CARIM
6.6 CARIM Model Implementation
6.7 Experiment
6.7.1 Participants and Material
6.7.2 Procedure
6.7.3 Results
6.7.3.1 Comparing the Two Interaction Designs for UMU Lander
6.7.3.2 Validating the User Behavior Hypotheses
6.8 Discussion
6.8.1 Modeling Mobile Interaction and QoE
6.8.2 CARIM Implementation and Experimental Validation
6.8.3 CARIM Compared with Other Representative Approaches
6.9 Conclusions
7 Conclusions and Further Work
7.1 Conclusions of this PhD Thesis
7.1.1 Driving Forces of this PhD Thesis
7.1.2 Work and Research in User-System Interaction Assessment
7.1.3 Goals Achieved in this PhD Thesis
7.2 Future Lines of Work
Bibliography
A List of Acronyms
... It structures all these data to be the basis for the implementation of QoE analysis and inference processes. We propose to base this meta-model design on an existing one [8], [9] which is being developed in parallel to this work, in a joint effort between the Cátedra SAES [10] and the Telekom Innovation Laboratories [11] to quantify interaction in multimodal contexts. The base meta-model describes interaction by turn, i.e., each time the user or the system takes part in the dialog, following a dialog structure, i.e., a set of ordered system- and user-turns. ...
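A turn-by-turn description of this kind can be sketched as follows. This is only an illustration of the idea of recomputing a parameter after every turn, not the meta-model itself: the `(speaker, modality)` tuple layout and the function name `stepwise_modality_counts` are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Each turn is (speaker, modality); both fields are illustrative assumptions.
TurnT = Tuple[str, str]


def stepwise_modality_counts(turns: Iterable[TurnT]) -> Iterator[Dict[str, int]]:
    """Yield, after every turn, how many user turns used each modality so far."""
    counts: Counter = Counter()
    for speaker, modality in turns:
        if speaker == "user":
            counts[modality] += 1
        # One snapshot per turn keeps the dialog's evolution visible,
        # instead of reporting a single total at the end.
        yield dict(counts)


# An ordered dialog structure: system and user turns alternate.
dialog = [
    ("system", "speech"),
    ("user", "speech"),
    ("system", "gui"),
    ("user", "touch"),
]
snapshots = list(stepwise_modality_counts(dialog))
```

Here `snapshots` holds one dictionary per turn, so an analysis can look at the state of the dialog at any step rather than only at the final totals.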
... The process is completely automatic, as data in Model A instances are used by ATL to fill data fields in the Model B instance according to the rules. Some validation tests were conducted in the context of the PALADIN project [8], [9]. The participants used multimodal input (speech + touch) to book a restaurant on an Android smartphone. ...
... The naturalness attribute is present in the MOS-X and SUISQ-R questionnaires, where it refers to intonation and rhythm: the items ask whether the voice resembles a human voice, and whether the system sounded like a normal person rather than a robotic synthetic voice. The other questionnaires have no items that distinguish between command-based and conversation-based interaction; for example, AttrakDiff has been used to evaluate conversation-based dialog systems [30] as well as systems using command-based voice interaction [35]. Teixeira et al. [36] use the ICF-US test to evaluate a system aimed at older adults that uses a command-based voice interface. ...
Article
Voice user interfaces (VUIs) are increasingly used in everyday settings and are growing in popularity. These interfaces rely predominantly on eyes-free and hands-free interaction. This kind of experience remains a nascent field compared to other input methods such as touch or keyboard and mouse. Thus, it is important to identify the tools used to evaluate the usability of VUIs. This article presents a systematic review in which we analyzed 57 articles and describe nine questionnaires used for evaluating the usability of VUIs, assessing the potential suitability of these questionnaires to measure different types of interactions and various usability dimensions. We found that these questionnaires were used to evaluate the usability of voice-only and voice-added VUIs: AttrakDiff, ICF-US, MOS-X, SUISQ-R, SUS, SASSI, UEQ, PARADISE and USE, where the SUS questionnaire is the most commonly used. However, its items do not directly assess voice quality, although it evaluates the general user interaction with a system. All the questionnaires include items related to three usability dimensions (effectiveness, efficiency, and satisfaction). The questionnaire with the most homogeneous coverage regarding the number of items in each aspect of usability is the SASSI questionnaire. It is normal practice to use multiple questionnaires to obtain a more complete measurement of usability. We see a need for more usability research on the differences between voice interaction with diverse display types (voice-first, voice-only, voice-added) and dialog types (command-based and conversational), and on how usability affects user expectations about VUIs.
... By having access to them, the evaluator is able to analyze current data and get a better grasp of the evaluation's current status, making small changes to it if required. The major difference of DynEaaS, when compared to other evaluation frameworks, such as those proposed by Navarro et al. [22], Ickin et al. [17] and Witt [45], is that it specifically addresses the context of use and emphasizes the need to collect the data at the best possible time, or at least to contextualize it as well as possible. For example, it makes far more sense to ask a user about an application feature right after he/she has used (or had problems with) it than to ask the same questions at the end of the evaluation session, when most of the impressions have probably faded; or it might not be a good time to ask the user for feedback if he/she is leaving for an appointment. ...
Conference Paper
Multimodal user interfaces provide users with different ways of interacting with applications. This has advantages both in providing interaction solutions with additional robustness in environments where a single modality might result in ambiguous input or output (e.g., speech in noisy environments), and for users with some kind of limitation (e.g., hearing difficulties resulting from ageing) by yielding alternative and more natural ways of interacting. The design and development of applications supporting multimodal interaction involves numerous challenges, particularly if the goals include the development of multimodal applications for a wide variety of scenarios, designing complex interaction and, at the same time, proposing and evolving interaction modalities. These require the choice of an architecture and of development and evaluation methodologies, and the adoption of principles that foster constant improvements at the interaction modalities level without disrupting existing applications. Based on previous and ongoing work by our team, we present our approach to the design, development and evaluation of multimodal applications covering several devices and application scenarios.
... The design we propose in this work is based on PALADIN [16,17,18], a model intended to quantify and describe dynamically (i.e., step by step) the interaction process in multimodal environments. ...
Article
This article describes a new approach to modeling users' quality of experience (QoE) in mobile environments. The model presented is called CARIM, and it attempts to answer the following questions: how can QoE in mobile environments be measured from the analysis of user-system interaction? How can different QoE measurements be compared and contrasted? To this end, CARIM uses a set of parameters to describe, step by step, the interaction between the user and the system, the context in which this interaction takes place, and the level of quality perceived by users. These parameters are structured within a model, which provides (1) a common representation of how the interaction process unfolds in different mobile environments and (2) a basis for computing QoE automatically as well as for comparing different interaction records. CARIM is a runtime model that allows dynamic analysis of interaction, as well as decision-making based on a given level of QoE at run time. Certain applications use this during execution to adapt themselves and thus provide a better experience to users. In conclusion, CARIM provides a unified criterion with which to compute, analyze, and compare QoE in mobile systems of different kinds.
Article
GUI testing is essential to ensure the validity and quality of system response, but applying it to a development project is not straightforward: it is time-consuming, requires specialized personnel, and involves complex activities that are sometimes carried out manually. GUI testing tools help support these processes. However, integrating them into software projects may be troublesome, mainly due to the diversity of GUI platforms and operating systems in use. This work presents the design and implementation of Open HMI Tester (OHT), an application framework for the automation of testing processes based on GUI introspection. It is cross-platform, and provides an adaptable design aimed at supporting major event-based GUI platforms. It can also be integrated into ongoing and legacy developments using dynamic library preloading. OHT provides a robust and extensible basis to implement GUI testing tools. A capture and replay approach has been implemented as proof of concept. Introspection is used to capture essential GUI and interaction data. It is also used to simulate real human interaction in order to increase robustness and tolerance to changes between testing iterations. OHT is being actively developed by the open-source community and, as shown in this paper, it is ready to be used in current software projects.
Conference Paper
Designing interactive computer systems to be efficient and easy to use is important so that people in our society may realize the potential benefits of computer-based tools .... Although modern cognitive psychology contains a wealth of knowledge of human ...
Article
In the MATIS project a multimodal system has been developed for train timetable information. The aim of the project was to obtain guidelines for designing multimodal interfaces for information systems. The MATIS system accepts input both in spoken and in graphical mode (no keyboard input) and provides feedback in the same two modes. The user can choose at any time which of the input modalities (s)he prefers to use for a certain action. A user test was carried out in which 25 subjects were asked to evaluate the system. For comparison, users were also asked to test a GUI (Graphical User Interface) version of the train timetable information system as well as a speech-only version of the system. We measured the efficiency and the effectiveness of the interaction and the user satisfaction with all three systems.
Article
This essay is a personal reflection from an Artificial Intelligence (AI) perspective on the term HCI. Especially for the transfer of AI-based HCI into industrial environments, we survey existing approaches and examine how AI helps to solve fundamental problems of HCI technology. The user and the system must have a collaborative goal. The concept of collaborative multimodality could serve as the missing link between traditional HCI and intuitive human-centred designs in the form of, e.g., natural language interfaces or intelligent environments. Examples are provided in the medical imaging domain.
Conference Paper
We demonstrate how the Tycoon framework can be put to practice with the Anvil tool in a concrete case study. Tycoon offers a coding scheme and analysis metrics for multimodal communication scenarios. Anvil is a generic, extensible and ergonomically designed annotation tool for videos. In this paper, we describe the Anvil tool, the Tycoon scheme/metrics, and their implementation in Anvil for a video sample. A new Anvil feature, motivated by the Tycoon scheme, is presented: non-temporal annotation objects – an important concept, we argue, of general interest. We also outline future plans for automatizing Tycoon metrics computation using Anvil plug-ins.
Article
The lack of suitable training and testing data is currently a major roadblock in applying machine-learning techniques to dialogue management. Stochastic modelling of real users has been suggested as a solution to this problem, but to date few of the proposed models have been quantitatively evaluated on real data. Indeed, there are no established criteria for such an evaluation. This paper presents a systematic approach to testing user simulations and assesses the most prominent domain-independent techniques using a large DARPA Communicator corpus of human-computer dialogues. We show that while recent advances have led to significant improvements in simulation quality, simple statistical metrics are still sufficient to discern synthetic from real dialogues.
Article
The MITRE Corporation's Evaluation Working Group has developed a methodology for evaluating multi-modal groupware systems and capturing data on human-human interactions. The methodology consists of a framework for describing collaborative systems, a scenario-based evaluation approach, and evaluation metrics for the various components of collaborative systems. We designed and ran two sets of experiments to validate the methodology by evaluating collaborative systems. In one experiment, we compared two configurations of a multi-modal collaborative application using a map navigation scenario requiring information sharing and decision making. In the second experiment, we applied the evaluation methodology to a loosely integrated set of collaborative tools, again using a scenario-based approach. In both experiments, multi-modal, multi-user data were collected, visualized, annotated, and analyzed.
Article
A key advantage of taking a statistical approach to spoken dialogue systems is the ability to formalise dialogue policy design as a stochastic optimization problem. However, since dialogue policies are learnt by interactively exploring alternative dialogue paths, conventional static dialogue corpora cannot be used directly for training and instead, a user simulator is commonly used. This paper describes a novel statistical user model based on a compact stack-like state representation called a user agenda which allows state transitions to be modeled as sequences of push- and pop-operations and elegantly encodes the dialogue history from a user's point of view. An expectation-maximisation based algorithm is presented which models the observable user output in terms of a sequence of hidden states and thereby allows the model to be trained on a corpus of minimally annotated data. Experimental results with a real-world dialogue system demonstrate that the trained user model can be successfully used to optimise a dialogue policy which outperforms a hand-crafted baseline in terms of task completion rates and user satisfaction scores.
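The stack-like agenda described in this abstract can be pictured with a small sketch. It covers only the push and pop mechanics with hand-written acts; the statistical model and the expectation-maximisation training from the abstract are not reproduced. The class name `UserAgenda` and the dialogue-act strings are hypothetical.

```python
from typing import List


class UserAgenda:
    """A stack of pending user dialogue acts; the top is said next."""

    def __init__(self, goals: List[str]) -> None:
        # Store in reverse so that the first goal ends up on top of the stack.
        self._stack: List[str] = list(reversed(goals))

    def push(self, act: str) -> None:
        """Put a new act on top, e.g. a correction after a system error."""
        self._stack.append(act)

    def pop(self, n: int = 1) -> List[str]:
        """Remove up to n acts from the top; together they form one user turn."""
        return [self._stack.pop() for _ in range(min(n, len(self._stack)))]

    def __len__(self) -> int:
        return len(self._stack)


agenda = UserAgenda(["inform(food=thai)", "inform(area=centre)", "bye()"])
first_turn = agenda.pop(1)

# Assume the system misrecognised the food type: a correction is pushed
# on top, so it is uttered before the remaining goals.
agenda.push("negate(food=italian)")
second_turn = agenda.pop(2)
```

In this toy run the correction jumps ahead of the pending goals, which is the behaviour a stack gives for free; in a trained model the choice of which acts to push and how many to pop would be probabilistic.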
Conference Paper
What are the most suitable interaction paradigms for navigational and informative tasks for pedestrians? Is there an influence of social and situational context on multimodal interaction? Our study takes a closer look at a multimodal system on a handheld device that was recently developed as a prototype for mobile navigation assistance. The system allows visitors of a city to navigate, to get information on sights, and to use and manipulate map information. In an outdoor evaluation, we studied the usability of such a system on site. The study yields insight about how multimodality can enhance the usability of hand-held devices with their future services. We show, for example that for our more complicated tasks multimodal interaction is superior to classical unimodal interaction.
Conference Paper
In this paper we present our work toward the creation of a multimodal expressive Embodied Conversational Agent (ECA). Our agent, called Greta, exhibits nonverbal behaviors synchronized with speech. We are using the taxonomy of communicative functions developed by Isabella Poggi [22] to specify the behavior of the agent. Based on this taxonomy, a representation language, the Affective Presentation Markup Language (APML), has been defined to drive the animation of the agent [4]. Lately, we have been working on creating not a generic agent but an agent with individual characteristics. We have concentrated on the behavior specification for an individual agent. In particular, we have defined a set of parameters to change the expressivity of the agent's behaviors. Six parameters have been defined and implemented to encode gesture and face expressivity. We have performed perceptual studies of our expressivity model.
Conference Paper
Multimodal interaction enables the user to employ different modalities such as voice, gesture and typing for communicating with a computer. This paper presents an analysis of the integration of multiple communication modalities within an interactive system. To do so, a software engineering perspective is adopted. First, the notion of “multimodal system” is clarified. We aim at proving that two main features of a multimodal system are the concurrency of processing and the fusion of input/output data. On the basis of these two features, we then propose a design space and a method for classifying multimodal systems. In the last section, we present a software architecture model of multimodal systems which supports these two salient properties: concurrency of processing and data fusion. Two multimodal systems developed in our team, VoicePaint and NoteBook, are used to illustrate the discussion.
Conference Paper
The development and the evaluation of multimodal interactive systems on mobile phones remain a difficult task. In this paper we address this problem by describing a component-based approach, called ACICARE, for developing and evaluating multimodal interfaces on mobile phones. ACICARE is dedicated to the overall iterative design process of mobile multimodal interfaces, which consists of cycles of designing, prototyping, and evaluation. ACICARE is based on two complementary tools that are combined: ICARE and ACIDU. ICARE is a component-based platform for rapidly developing multimodal interfaces. We adapted the ICARE components to run on mobile phones and we connected them to ACIDU, a probe that gathers customer usage on mobile phones. By reusing and assembling components, ACICARE enables the rapid development of multimodal interfaces as well as the automatic capture of multimodal usage for in-field evaluations. We illustrate ACICARE using our contact manager system, a multimodal system running on the SPV c500 mobile phone. Categories and Subject Descriptors H.5.2 [Information Interfaces and Presentation]: User
Conference Paper
Designing and implementing multimodal applications that take advantage of several recognition-based interaction techniques (e.g. speech and gesture recognition) is a difficult task. The goal of our research is to explore how simple modelling techniques and tools can be used to support the designers and developers of multimodal systems. In this paper, we discuss the use of finite state machines (FSMs) for the design and prototyping of multimodal commands. In particular, we show that FSMs can help designers in reasoning about synchronization pattern problems. Finally, we describe an implementation of our FSM-based approach, in a toolkit whose aim is to facilitate the iterative process of designing, prototyping and testing multimodality.
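To make the FSM idea in the abstract above concrete, the following is a minimal sketch and not the toolkit described by the authors. It assumes a hypothetical "move" command in which a spoken keyword must be followed by two pointing gestures (object, then target), each arriving within an assumed 2-second window. The class name, state names, event labels and timeout are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Event:
    modality: str  # "speech" or "gesture" (hypothetical labels)
    value: str     # recognised word or pointed-at item
    t: float       # timestamp in seconds


class MoveCommandFSM:
    """Hypothetical FSM: speech 'move' -> gesture (object) -> gesture (target)."""

    TIMEOUT = 2.0  # assumed maximum gap between consecutive accepted events

    def __init__(self):
        self.reset()

    def reset(self):
        self.state = "idle"
        self.slots = []
        self.last_t = None

    def feed(self, ev):
        """Consume one event; return the fused command when it completes."""
        # A gap longer than TIMEOUT abandons the partially built command.
        if self.state != "idle" and ev.t - self.last_t > self.TIMEOUT:
            self.reset()
        if self.state == "idle":
            # Only the spoken keyword can start a command.
            if ev.modality == "speech" and ev.value == "move":
                self.state, self.last_t = "await_object", ev.t
        elif ev.modality == "gesture":
            self.slots.append(ev.value)
            self.last_t = ev.t
            if self.state == "await_object":
                self.state = "await_target"
            else:
                # Second gesture received: the command is complete.
                command = ("move", self.slots[0], self.slots[1])
                self.reset()
                return command
        return None
```

Feeding the events `speech:"move"`, `gesture:"cup"`, `gesture:"table"` in quick succession yields the tuple `("move", "cup", "table")`; letting more than the assumed timeout elapse between two events drops the partial command, which is one simple way to reason about a synchronization pattern.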
Conference Paper
Representing the behaviour of multimodal interactive systems in a complete, concise and non-ambiguous way is still a challenge for formal description techniques. Indeed, multimodal interactive systems embed specific constraints that are either cumbersome or impossible to capture with classical formal description techniques. This is due to both the multiple facets of a multimodal system and the strong temporal constraints usually encountered in this kind of system. This paper presents a formal description technique dedicated to the engineering of interactive multimodal systems. Its aim is to provide a precise way of describing, analyzing and reasoning about multimodal interactive systems prior to their implementation. One of the basic components of multimodal systems is the fusion mechanism. This paper focuses on this component and, in order to exemplify the approach, the formal description technique is used for the modelling and the analysis of one fusion mechanism. Lastly, benefits and limitations of the approach are discussed.
Conference Paper
We propose the CARE properties as a simple way of characterising and assessing aspects of multimodal interaction: the Complementarity, Assignment, Redundancy, and Equivalence that may occur between the interaction techniques available in a multimodal user interface. We provide a formal definition of these properties and use the notion of compatibility to show how the system CARE properties interact with user CARE-like properties in the design of a system. The discussion is illustrated with MATIS, a Multimodal Air Travel Information System.
Conference Paper
In this paper, the efficiency and usage patterns of input modes in multimodal dialogue systems are investigated for both desktop and personal digital assistant (PDA) working environments. For this purpose a form-filling travel reservation application is evaluated that combines the speech and visual modalities; three multimodal modes of interaction are implemented, namely: "Click-To-Talk", "Open-Mike" and "Modality-Selection". The three multimodal systems are evaluated and compared with the "GUI-Only" and "Speech-Only" unimodal systems. Mode and duration statistics are computed for each system, for each turn and for each attribute in the form. Turn time is decomposed into interaction and inactivity time, and the statistics for each input mode are computed. Results show that multimodal and adaptive interfaces are superior in terms of interaction time, but not always in terms of inactivity time. Also, users tend to use the most efficient input mode, although our experiments show a bias towards the speech modality.
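The decomposition of turn time mentioned in the abstract above can be sketched as follows. This is an illustrative reading, not the authors' procedure: it assumes that each turn comes with a start time, an end time and a list of non-overlapping input intervals labelled by mode, that interaction time is the summed duration of those intervals, and that inactivity time is whatever remains of the turn. The function name and the data layout are hypothetical.

```python
from collections import defaultdict


def decompose_turn(turn_start, turn_end, inputs):
    """Split one turn into interaction and inactivity time.

    inputs: list of (mode, start, end) tuples in seconds, assumed to lie
    inside the turn and not to overlap each other.
    """
    per_mode = defaultdict(float)
    for mode, start, end in inputs:
        per_mode[mode] += end - start  # time spent actively using this mode
    interaction = sum(per_mode.values())
    inactivity = (turn_end - turn_start) - interaction
    return {
        "interaction": interaction,
        "inactivity": inactivity,
        "per_mode": dict(per_mode),
    }


# Example: a 10-second turn with 3 s of speech and 1 s of GUI input.
stats = decompose_turn(0.0, 10.0, [("speech", 1.0, 4.0), ("gui", 6.0, 7.0)])
```

Aggregating such per-turn dictionaries over systems, turns and form attributes would give mode and duration statistics of the kind the abstract refers to.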
Conference Paper
In this paper we report on our experience on the design and evaluation of multimodal user interfaces in various contexts. We introduce a novel combination of existing design and evaluation methods in the form of a 5-step iterative process and show the feasibility of this method and some of the lessons learned through the design of a messaging application for two contexts (in car, walking). The iterative design process we employed included the following five basic steps: 1) identifying the limitations affecting the usage of different modalities in various contexts (contextual observations and context analysis), 2) identifying and selecting suitable interaction concepts and creating a general design for the multimodal application (storyboarding, use cases, interaction concepts, task breakdown, application UI and interaction design), 3) creating modality-specific UI designs, 4) rapid prototyping, and 5) evaluating the prototype in naturalistic situations to find key issues to be taken into account in the next iteration. We have not only found clear indications that context affects users' preferences in the usage of modalities and interaction strategies but also identified some of these. For instance, while speech interaction was preferred in the car environment, users did not consider it useful when they were walking. 2D (finger strokes) and especially 3D (tilt) gestures were preferred by walking users.
Conference Paper
In this paper, we propose two new objective metrics, relative modality efficiency and multimodal synergy, that can provide valuable information and identify usability problems during the evaluation of multimodal systems. Relative modality efficiency (when compared with modality usage) can identify suboptimal use of modalities due to poor interface design or information asymmetries. Multimodal synergy measures the added value from efficiently combining multiple input modalities, and can be used as a single measure of the quality of modality fusion and fission in a multimodal system. The proposed metrics are used to evaluate two multimodal systems that combine pen/speech and mouse/keyboard modalities respectively. The results provide much insight into multimodal interface usability issues, and demonstrate how multimodal systems should adapt to maximize modalities synergy resulting in efficient, natural, and intelligent multimodal interfaces.
Conference Paper
While multimodal interfaces are becoming more and more widely used and supported, their development is still difficult and there is a lack of authoring tools for this purpose. The goal of this work is to discuss how multimodality can be specified in model-based languages and to apply such a solution to the composition of graphical and vocal interactions. In particular, we show how to provide structured support that aims to identify the most suitable solutions for modelling multimodality at various detail levels. This is obtained using, amongst other techniques, the well-known CARE properties in the context of a model-based language able to support service-based applications and modern Web 2.0 interactions. The method is supported by an authoring environment, which provides some specific solutions that can be modified by the designers to better suit their specific needs, and is able to generate implementations of multimodal interfaces in Web environments. An example of modelling a multimodal application and the corresponding, automatically generated, user interfaces is reported as well.
Conference Paper
Over the past few years, multimodal interaction has been gaining importance in virtual environments. Although multimodality makes interaction with the environment more intuitive and natural, the development cycle of such an environment is often a long and expensive process. In our overall field of research, we investigate how model-based design can help shorten this process by designing the application with the use of high-level diagrams. In this scope, we present 'NiMMiT', a graphical notation suitable for expressing multimodal user interaction. We elaborate on the NiMMiT primitives and illustrate the notation in practice with a comprehensive example.
Conference Paper
USer Interface eXtensible Markup Language (USIXML) consists of a User Interface Description Language (UIDL) allowing designers to apply a multi-path development of user interfaces. In this development paradigm, a user interface can be specified and produced at and from different, and possibly multiple, levels of abstraction while maintaining the mappings between these levels if required. Thus, the development process can be initiated from any level of abstraction and proceed towards obtaining one or many final user interfaces for various contexts of use at other levels of abstraction. In this way, the model-to-model transformation which is the cornerstone of Model-Driven Architecture (MDA) can be supported in multiple configurations, based on composition of three basic transformation types: abstraction, reification, and translation.
Conference Paper
This paper defines the problem space of distributed, migratable and plastic user interfaces, and presents CAMELEON-RT1, a technical answer to the problem. CAMELEON-RT1 is an architecture reference model that can be used for comparing and reasoning about existing tools as well as for developing future run time infrastructures for distributed, migratable, and plastic user interfaces. We have developed an early implementation of a run time infrastructure based on the precepts of CAMELEON-RT1.
Article
The use of tangible multimodal (TMM) systems, which let users of safety-critical systems continue to employ the physical objects, language and symbology of their workplace, is described. TMM users are as capable of updating digital systems and of collaborating digitally with colleagues as are users of more traditional systems. TMM has the portability, high resolution, scalability and physical properties of pen and paper and can meet the needs of officers in the field, in particular, robustness to computer failure. TMM systems enable users to employ physical objects in their workplace along with natural spoken language, sketch, gesture and other input modalities to interact with information and with co-workers.
Article
One important evolution in software applications is the spread of service-oriented architectures in ubiquitous environments. Such environments are characterized by a wide set of interactive devices, with interactive applications that exploit a number of functionalities developed beforehand and encapsulated in Web services. In this article, we discuss how a novel model-based UIDL can provide useful support both at design and runtime for these types of applications. Web service annotations can also be exploited for providing hints for user interface development at design time. At runtime the language is exploited to support dynamic generation of user interfaces adapted to the different devices at hand during the user interface migration process, which is particularly important in ubiquitous environments.
Article
This paper describes a novel approach to model the quality of experience (QoE) of users in mobile environments. The Context-Aware and Ratings Interaction Model (CARIM) addresses the open questions of how to quantify user experiences from the analysis of interaction in mobile scenarios, and how to compare different QoE records to each other. A set of parameters are used to dynamically describe the interaction between the user and the system, the context in which it is performed and the perceived quality of users. CARIM structures these parameters into a uniform representation, supporting the dynamic analysis of interaction to determine QoE of users and enabling the comparison between different interaction records. Its run-time nature allows applications to make context- and QoE-based decisions in real-time to adapt themselves, and thus provide a better experience to users. As a result, CARIM provides unified criteria for the inference and analysis of QoE in mobile scenarios. Its design and implementation can be integrated (and easily extended if needed) into many different development environments. An experiment with real users comparing two different interaction designs and validating user behavior hypotheses proved the effectiveness of applying CARIM for the assessment of QoE in mobile applications.
Article
Quality of Service (QoS) and Quality of Experience (QoE) have to be considered when designing, building and maintaining services involving multimodal human–machine interaction. In order to guide the assessment and evaluation of such services, we first develop a taxonomy of the most relevant QoS and QoE aspects which result from multimodal human–machine interactions. It consists of three layers: (1) The quality factors influencing QoS and QoE related to the user, the system, and the context of use; (2) the QoS interaction performance aspects describing user and system behavior and performance; and (3) the QoE aspects related to the quality perception and judgment processes taking place within the user. For each of these layers, we then provide metrics which are able to capture the QoS and QoE aspects in a quantitative way, either via questionnaires or performance measures. The metrics are meant to guide system evaluation and make it more systematic and comparable.
Conference Paper
This paper describes a general framework for evaluating and comparing the performance of multimodal dialogue systems: PROMISE (Procedure for Multimodal Interactive System Evaluation). PROMISE is a possible extension to multimodality of the PARADISE framework (1, 2), which is used for the evaluation of spoken dialogue systems. With it we aimed to solve the problems of scoring multimodal inputs and outputs, of weighting the different recognition modalities, and of how to deal with non-directed task definitions and the resulting, potentially uncompleted tasks by the users. PROMISE is used in the end-to-end evaluation of the SmartKom project, in which an intelligent computer-user interface that deals with various kinds of oral and physical input is being developed. The aim of SmartKom is to allow a natural form of communication within man-machine interaction.
Article
In this paper we present a novel approach for prototyping, testing and evaluating multimodal interfaces, OpenWizard. OpenWizard allows the designer and the developer to rapidly evaluate a non-fully functional multimodal prototype by replacing one modality or a composition of modalities that are not yet available by Wizard of Oz techniques. OpenWizard is based on a conceptual component-based approach for rapidly developing multimodal interfaces, an approach first implemented in the ICARE software tool and more recently in the OpenInterface tool. We present a set of Wizard of Oz components that are implemented in OpenInterface. While some Wizard of Oz (WoZ) components are generic to be reused for different multimodal applications, our approach allows the integration of tailored WoZ components. We illustrate OpenWizard using a multimodal map navigator. Keywords: Multimodality, Wizard of Oz, Component-based approach
Article
With the technical advances and market growth in the field, the issues of evaluation and usability of spoken language dialogue systems, unimodal as well as multimodal, are as crucial as ever. This paper discusses those issues by reviewing a series of European and US projects which have produced major results on evaluation and usability. Whereas significant progress has been made on unimodal spoken language dialogue systems evaluation and usability, the emergence of, among others, multimodal, mobile, and domain-oriented systems continues to pose entirely new challenges to research in evaluation and usability.
Article
In the context of Model Driven Engineering, models are the main development artifacts and model transformations are among the most important operations applied to models. A number of specialized languages have been proposed, aimed at specifying model transformations. Apart from the software engineering properties of transformation languages, the availability of high quality tool support is also of key importance for the industrial adoption and ultimate success of MDE. In this paper we present ATL: a model transformation language and its execution environment based on the Eclipse framework. ATL tools provide support for the major tasks involved in using a language: editing, compiling, executing, and debugging.
Conference Paper
In this paper we present a new approach to the automation of usability evaluation for interactive systems. Design ideas or complete systems are modeled as a conditional state machine. Then, user interactions with the system are simulated on the basis of tasks, by first searching for possible solution paths and then generating deviations from these paths under consideration of user groups and system attributes. The approach has been implemented into a workbench which supports the modeling of the system and the evaluation of the simulations. We present first results for the reliability of the approach in modeling interactions with a spoken dialog system.
Conference Paper
While multimodal systems are an active research field, there is no agreed-upon set of multimodal interaction parameters that would allow the performance of such systems and their underlying modules to be quantified, and that would therefore be necessary for a systematic evaluation. In this paper we propose an extension to established parameters describing the interaction with spoken dialog systems [1] in order to be used for multimodal systems. Focussing on the evaluation of a multimodal system, three usage scenarios for these parameters are given.
Conference Paper
This paper describes a multimodal architecture to control 3D avatars with speech dialogs and mouse events. We briefly describe the scripting language used to specify the sequences and the components of the architecture supporting the system. Then we focus on the evaluation procedure that is proposed to test the system. The discussion on the evaluation results shows us the future work to be accomplished.
Book
Quality of Telephone-Based Spoken Dialogue Systems is a systematic overview of assessment, evaluation, and prediction methods for the quality of services such as travel and touristic information, phone-directory and messaging, or telephone-banking services. A new taxonomy of quality-of-service is presented which serves as a tool for classifying assessment and evaluation methods, for planning and interpreting evaluation experiments, and for estimating quality. A broad overview of parameters and evaluation methods is given, both on a system-component level and for a fully integrated system. Three experimental investigations illustrate the relationships between system characteristics and perceived quality. The resulting information is needed in all phases of system specification, design, implementation, and operation. Although Quality of Telephone-Based Spoken Dialogue Systems is written from the perspective of an engineer in telecommunications, it is an invaluable source of information for professionals in signal processing, communication acoustics, computational linguistics, speech and language sciences, human factor design and ergonomics. © 2005 Springer Science + Business Media, Inc. All rights reserved.
Article
Computer-supported cooperative work (CSCW) holds great importance and promise for modern society. This paper provides an overview of seventeen papers comprising a symposium on CSCW. The overview also discusses some relationships among the contributions ...
Article
The evaluation of the usability and the learnability of a computer system may be performed with predictive models during the design phase. It may be done on the executable code as well as by observing the user in action. In the latter case, data collected in vivo must be processed. Our goal is to provide software supports for performing this difficult and time consuming task. This article presents an early analysis and experience towards the automatic evaluation of multimodal user interfaces. With this end in view, a generic Wizard of Oz platform has been designed to allow the observation and automatic recording of subjects' behavior while interacting with a multimodal interface. We then show how recorded data can be analyzed to detect behavioral patterns, and how deviations of such patterns from a data flow-oriented task model can be exploited by a software usability critic.