Conference Paper

Semi-automatic multimodal user interface generation


Abstract

Multimodal applications are typically developed together with their user interfaces, leading to a tight coupling. Additionally, human-computer interaction often receives little consideration. This can result in a poorer user interface when additional modalities have to be integrated or when the application is to be developed for a different device. A promising way of creating multimodal user interfaces with less effort for applications running on several devices is semi-automatic generation. This work shows the generation of multimodal interfaces in which a discourse model is transformed into different, automatically rendered modalities. It supports loose coupling between the design of human-computer interaction and the integration of specific modalities. The presented communication platform utilizes this transformation process. It allows for high-level integration of input such as speech, hand gestures and a WIMP UI. Output can be generated for the modalities speech and GUI. Integration of other input and output modalities is supported as well. Furthermore, the platform is applicable to several applications as well as different devices, e.g., PDAs and PCs.
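The pipeline sketched in the abstract, where a modality-independent discourse model is rendered for several modalities, can be pictured roughly as follows. This is only a minimal Python sketch of the general idea; the names DiscourseAct, GuiRenderer, and SpeechRenderer are illustrative assumptions, not the platform's actual API.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DiscourseAct:
    """A modality-independent communicative act from the discourse model."""
    act_type: str          # e.g. "question", "informing", "request"
    content: str           # abstract propositional content

class Renderer:
    """Base class: turns discourse acts into modality-specific output."""
    def render(self, act: DiscourseAct) -> str:
        raise NotImplementedError

class GuiRenderer(Renderer):
    def render(self, act: DiscourseAct) -> str:
        # A GUI rendering might map a question to a form widget.
        return f"<label>{act.content}</label><input type='text'/>"

class SpeechRenderer(Renderer):
    def render(self, act: DiscourseAct) -> str:
        # Speech output simply verbalizes the content.
        return f"[TTS] {act.content}"

def render_discourse(acts: List[DiscourseAct],
                     renderers: Dict[str, Renderer]) -> Dict[str, List[str]]:
    """Transform the whole discourse model for every registered modality."""
    return {name: [r.render(a) for a in acts] for name, r in renderers.items()}

if __name__ == "__main__":
    model = [DiscourseAct("question", "Which destination?"),
             DiscourseAct("informing", "Three connections were found.")]
    out = render_discourse(model, {"gui": GuiRenderer(), "speech": SpeechRenderer()})
    for modality, rendering in out.items():
        print(modality, rendering)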


... This is referred to as the modality of an application [25,124]. The multitude of possible interaction patterns complicates, or even prevents, the general generation of arbitrary user interfaces [58,170,180,218]. ...
Thesis
Full-text available
Digitalization enables us to interact with partners (e.g., companies, institutions) in an IT-supported environment and to carry out activities that used to be done manually. One goal of digitalization is to combine services from different technical domains into processes and to make them accessible to many user groups according to their needs. To this end, providers offer technical services that can be integrated into different applications. Digitalization confronts application development with new challenges. One aspect is connecting users to services in a needs-oriented way. For human users to interact with the services, user interfaces are required that are tailored to their needs. This calls for variants for specific user groups (domain-specific variants) and for varying environments (technical variants). Increasingly, these must be combinable with services of other providers in order to link processes across domains into applications with added value for the end user (e.g., a flight booking with an optional travel insurance). The diversity of variants makes the creation of user interfaces appear complex and the results highly individual. In practice, the variants are therefore predominantly created manually. This leads to the parallel development of a large number of very similar applications with little potential for reuse. The consequence is high effort for creation and maintenance. As a result, support for small user groups with special requirements (e.g., people with physical impairments) is frequently omitted, so that they remain excluded from digitalization. This thesis presents a consistent solution to these new challenges using the means of model-driven development. It presents an approach for modeling user interfaces, variants, and compositions and for their automatic generation for digital services in a distributed environment. The thesis provides a solution for the reuse and shared use of user interfaces across provider boundaries. It leads to an infrastructure in which a multitude of providers can contribute their expertise to collaborative applications. The individual contributions consist of concepts and metamodels for modeling user interfaces, variants, and compositions, as well as a procedure for their fully automated transformation into functional user interfaces. To realize shared usability, these are complemented by a universal representation of the models, a methodology for connecting different service providers, and an architecture for the distributed use of the artifacts and procedures in a service-oriented environment. The approach offers the opportunity to let the most diverse people participate in digitalization according to their needs. In doing so, the thesis provides impulses for future methods of application development in an increasingly diverse environment.
... As for the user model discussion, in the early surveyed period systems tailored dialogue by using semantic models [6] and rule-based approaches [62]. Later, statistical approaches were used [63] to achieve a more practical, effective, and theoretically well-founded approach to adaptivity, as well as semi-automatic rule-based methods [64]. A further approach recently used to model interaction dialogue is based on ontologies, as proposed in [65]. ...
Conference Paper
Multimodal systems integrate multiple interaction modalities (e.g., speech, sketch, handwriting), enabling users to benefit from communication more similar to human-human communication. To develop multimodal systems, several research questions have been addressed in the literature from the early 1980s to the present day, such as multimodal fusion, recognition, dialogue interpretation and disambiguation, fission, context adaptation, etc. This paper investigates studies developed in the last decade by analyzing the evolution of the approaches applied to face the main research questions related to multimodal fusion, interpretation, and context adaptation. As a result, the paper provides a discussion of the reasons that led to shifting attention from one methodology to another.
... With the ongoing success of the Internet of Services and the accompanying shift of business logic and data into cloud infrastructures, some modern approaches also consider browser-based UIs [28] as well as UIs for services [19]. In parallel, the inclusion of additional modalities beyond point-and-click or touch has raised increasing interest [7,11,18]. However, true multimodality, i.e., the simultaneous and cross-referencing use of several modalities as well as support for natural-language discourse phenomena, has so far not been achieved with these model-based approaches. ...
Chapter
We will show how to build innovative multimodal dialog user interfaces that integrate multiple heterogeneous web services as data sources on the basis of the Ontology-based Dialog Platform (ODP). More specifically, we will describe how to exploit ODP’s well-defined extension points and how generic ODP processing modules can be adopted, in order to support a rapid dialog system engineering process. By means of the latest ODP-based educational information system CIRIUS and the ODP workbench, a set of Eclipse-based editors and tools, we demonstrate step-by-step along the generic multimodal dialog processing chain what has to be done for developing a new multimodal dialog user interface for a specific application domain.
... Another approach [6] proposes and investigates a semi-automatic rule-based generation of multimodal user interfaces where a discourse model (representing modality-independent interaction) is transformed to different, automatically rendered modalities. Although the model transformation process exploited in that work can be compared to the one used in our approach, tool support has not been investigated in depth there. ...
Conference Paper
Full-text available
This paper presents a set of tools to support multimodal adaptive Web applications. The contributions include a novel solution for generating multimodal interactive applications, which can be executed in any browser-enabled device, and run-time support for obtaining multimodal adaptations at various granularity levels, which can be specified through a language for adaptation rules. The architecture is able to exploit model-based user interface descriptions and adaptation rules in order to achieve adaptive behaviour that can be triggered by dynamic changes in the context of use. We also report on an example application and a user test concerning adaptation rules that dynamically change its multimodality.
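As a rough illustration of what such adaptation rules could look like (the attribute names and rule structure below are assumptions, not the paper's actual rule language), an event/condition/action engine might be sketched in Python like this:

from dataclasses import dataclass, field
from typing import Callable, Dict, List

Context = Dict[str, object]   # e.g. {"noise_db": 75, "device": "phone"}

@dataclass
class AdaptationRule:
    """Condition/action rule evaluated when the context of use changes."""
    name: str
    condition: Callable[[Context], bool]
    action: Callable[[Context], str]

@dataclass
class AdaptationEngine:
    rules: List[AdaptationRule] = field(default_factory=list)

    def on_context_change(self, ctx: Context) -> List[str]:
        # Fire every rule whose condition holds in the new context.
        return [rule.action(ctx) for rule in self.rules if rule.condition(ctx)]

engine = AdaptationEngine(rules=[
    AdaptationRule(
        name="noisy-environment",
        condition=lambda c: c.get("noise_db", 0) > 70,
        action=lambda c: "switch output from speech to graphical only"),
    AdaptationRule(
        name="small-screen",
        condition=lambda c: c.get("device") == "phone",
        action=lambda c: "reduce graphical UI to a single-column layout"),
])

print(engine.on_context_change({"noise_db": 80, "device": "phone"}))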
Article
Conversational agents are widely used in many situations, especially for speech tutoring. However, their contents and functions are often pre-defined and not customizable for people without technical backgrounds, thus significantly limiting their flexibility and usability. Besides, conventional agents often cannot provide feedback in the middle of training sessions because they lack technical approaches to evaluate users' speech dynamically. We propose JustSpeak: automated and interactive speech tutoring agents with various configurable feedback mechanisms, using any speech recording together with its transcription text as the template for speech training. In JustSpeak, we developed an automated procedure to generate customized tutoring agents from user-inputted templates. Moreover, we created a set of methods to dynamically synchronize speech recognizers' behavior with the agent's tutoring progress, making it possible to detect various speech mistakes dynamically, such as being stuck, mispronunciation, and rhythm deviations. Furthermore, we identified the design primitives in JustSpeak to create different novel feedback mechanisms, such as adaptive playback, follow-on training, and passive adaptation. They can be combined to create customized tutoring agents, which we demonstrate with an example for language learning. We believe JustSpeak can create more personalized speech learning opportunities by enabling tutoring agents that are customizable, always available, and easy to use.
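The dynamic feedback described above relies on continuously comparing what the recognizer hears with the template transcript. The following sketch is only a plausible simplification of that idea, with made-up thresholds; it is not JustSpeak's actual detection code.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RecognizedWord:
    text: str
    start: float   # seconds since the utterance began
    end: float

def detect_mistake(template: List[str],
                   heard: List[RecognizedWord],
                   silence_since_last: float = 0.0,
                   expected_pace: float = 0.6,     # assumed seconds per word
                   stuck_timeout: float = 3.0) -> Optional[str]:
    """Return a feedback label for the first detected problem, if any."""
    for i, word in enumerate(heard):
        if i >= len(template):
            break
        if word.text.lower() != template[i].lower():
            return (f"possible mispronunciation at position {i}: "
                    f"expected '{template[i]}', heard '{word.text}'")
        if (word.end - word.start) > expected_pace * 2:
            return f"rhythm deviation: '{word.text}' was spoken unusually slowly"
    if len(heard) < len(template) and silence_since_last >= stuck_timeout:
        last = heard[-1].text if heard else "(start)"
        return f"stuck after '{last}'"
    return None

template = "the quick brown fox".split()
heard = [RecognizedWord("the", 0.0, 0.4), RecognizedWord("quack", 0.5, 0.9)]
print(detect_mistake(template, heard))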
Chapter
In recent years, the growing computational capability of mobile and desktop devices, together with the potential of current fast network connections, has allowed advanced and complex applications and services in the social computing area to spread widely. Most current approaches used to interact with these applications and services (hereinafter called social computing environments) do not seem able to provide effective and exhaustive support for the human-computer interaction process. To overcome these problems, it is necessary to turn to more suitable interaction methodologies. In this context, human-oriented interfaces can be profitably used to support every kind of social computing environment. More specifically, multimodal interfaces give users an effortless and powerful way to express concepts and commands on different mobile and desktop devices. This chapter explores the most suitable ways to employ multimodal frameworks (and related algorithmic approaches) for interacting with different kinds of social computing environments.
Article
Extended Library for Visual Interactive Applications (ELVIA) is a programming tool developed and used by the two most important public universities in the Baja California peninsula in México. ELVIA provides a Java class framework that helps novice programming students automatically generate the graphical user interface of interactive applications, letting them focus on the classes and objects that compose the applications. This paper describes ELVIA, some application examples, and experiences applying this programming tool in introductory programming courses. © 2009 Wiley Periodicals, Inc. Comput Appl Eng Educ 20: 214–220, 2012
Conference Paper
Reliable high-level fusion of several input modalities is hard to achieve, and (semi-)automatically generating it is even more difficult. However, it is important to address in order to broaden the scope of providing user interfaces semi-automatically. Our approach starts from a high-level discourse model created by a human interaction designer. It is modality-independent, so an annotated discourse is semi-automatically generated, which influences the fusion mechanism. Our high-level fusion checks hypotheses from the various input modalities by use of finite state machines. These are modality-independent, and they are automatically generated from the given discourse model. Taking all this together, our approach provides semi-automatic generation of high-level fusion. It currently supports the input modalities graphical user interface, (simple) speech, a few hand gestures, and a bar code reader.
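One way to picture fusion driven by finite state machines generated from a discourse model is sketched below. The state names, token mapping, and hypothesis format are assumptions for illustration; only the FSM idea itself comes from the abstract.

from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class FusionFSM:
    """Finite state machine accepting a sequence of abstract input tokens.

    In the approach described above, such machines would be generated from
    the discourse model; here the transitions are written by hand.
    """
    transitions: Dict[Tuple[str, str], str]   # (state, token) -> next state
    start: str
    accepting: Set[str]
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.start

    def feed(self, token: str) -> bool:
        """Advance on a modality-independent token; reject unknown input."""
        key = (self.state, token)
        if key not in self.transitions:
            return False
        self.state = self.transitions[key]
        return True

    def accepted(self) -> bool:
        return self.state in self.accepting

# Hypotheses from different modalities are first mapped to abstract tokens.
MODALITY_MAP = {
    ("speech", "put that there"): "move_command",
    ("gesture", "point_object"): "object_reference",
    ("gesture", "point_location"): "location_reference",
}

fsm = FusionFSM(
    transitions={("s0", "move_command"): "s1",
                 ("s1", "object_reference"): "s2",
                 ("s2", "location_reference"): "s3"},
    start="s0",
    accepting={"s3"},
)

for modality, hypothesis in [("speech", "put that there"),
                             ("gesture", "point_object"),
                             ("gesture", "point_location")]:
    token = MODALITY_MAP[(modality, hypothesis)]
    fsm.feed(token)

print("fused command complete:", fsm.accepted())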
Conference Paper
Fission of several output modalities poses hard problems, and (semi-)automatically configuring it is even more difficult. However, it is important to address the latter in order to broaden the scope of providing user interfaces semi-automatically. Our approach starts from a high-level discourse model created by a human interaction designer. It is modality-independent, so a modality-annotated discourse model is semi-automatically generated. Based on it, our fission is semi-automatically configured. It currently supports the output modalities graphical user interface, (canned) speech output, and a new modality that we call movement as communication. The latter involves movements of a semi-autonomous robot in 2D space for reinforcing the communication of the other modalities.
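Analogously, the fission step distributes each communicative act to the output modalities permitted by the modality-annotated discourse model. The dispatcher below is a purely illustrative sketch; the data structures are assumptions, not the actual configuration mechanism.

from typing import Dict, List, Tuple

# A modality-annotated discourse: each act lists the modalities it may use.
AnnotatedAct = Tuple[str, List[str]]   # (content, allowed output modalities)

def configure_fission(acts: List[AnnotatedAct],
                      available: List[str]) -> Dict[str, List[str]]:
    """Assign every act to all allowed modalities that are actually available."""
    plan: Dict[str, List[str]] = {m: [] for m in available}
    for content, allowed in acts:
        # Fall back to the first available modality if none of the allowed
        # ones is present on the current device.
        targets = [m for m in allowed if m in available] or available[:1]
        for m in targets:
            plan[m].append(content)
    return plan

acts = [("Show route on the map", ["gui"]),
        ("Turn left in 50 meters", ["speech", "movement"]),
        ("Obstacle ahead", ["speech", "gui", "movement"])]

print(configure_fission(acts, available=["gui", "speech"]))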
Article
Full-text available
This paper describes a framework that serves as a reference for classifying user interfaces supporting multiple targets, or multiple contexts of use, in the field of context-aware computing. In this framework, a context of use is decomposed into three facets: the end users of the interactive system, the hardware and software computing platform with which the users have to carry out their interactive tasks, and the physical environment in which they are working. A context-sensitive user interface is therefore a user interface that exhibits some capability to be aware of the context (context awareness) and to react to changes of this context. Rather than prescribing particular ways or methods of tackling the different steps of development, this paper attempts to provide a unified understanding of context-sensitive user interfaces. The framework structures the development life cycle into four levels of abstraction: task and concepts, abstract user interface, concrete user interface, and final user interface. These levels are connected by a relationship of reification going from an abstract level to a concrete one and a relationship of abstraction going from a concrete level to an abstract one. Most methods and tools can be more clearly understood and compared against the levels of this framework. In addition, the framework expresses when, where, and how a change of context is considered and supported in the context-sensitive user interface thanks to a relationship of translation. The notion of plastic user interfaces in the field of multi-target user interfaces is also introduced, defined, and exemplified. These user interfaces support some adaptation to changes of the context of use while preserving a predefined set of usability properties.
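The four abstraction levels and the three relationships between them (reification, abstraction, translation) can be encoded as a small structure. The sketch below only restates what the abstract describes; the Python representation itself is our assumption.

from enum import Enum

class Level(Enum):
    TASK_AND_CONCEPTS = 1
    ABSTRACT_UI = 2
    CONCRETE_UI = 3
    FINAL_UI = 4

def reify(level: Level) -> Level:
    """Move one step from abstract toward concrete (reification)."""
    return Level(min(level.value + 1, Level.FINAL_UI.value))

def abstract(level: Level) -> Level:
    """Move one step from concrete toward abstract (abstraction)."""
    return Level(max(level.value - 1, Level.TASK_AND_CONCEPTS.value))

def translate(level: Level, new_context: str) -> tuple:
    """Stay at the same level but switch to another context of use."""
    return (level, new_context)

# Develop top-down for one context, then translate the concrete UI to another.
ui = Level.TASK_AND_CONCEPTS
ui = reify(reify(ui))                       # -> CONCRETE_UI
print(translate(ui, "PDA"))                 # adapt the concrete UI to a PDA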
Conference Paper
Full-text available
This article presents FAME, a model-based Framework for Adaptive Multimodal Environments. FAME proposes an architecture for adaptive multimodal applications, a new way to represent adaptation rules - the behavioral matrix - and a set of guidelines to assist the design process of adaptive multimodal applications. To demonstrate FAME’s validity, the development process of an adaptive Digital Talking Book player is summarized.
Conference Paper
Full-text available
Communication between humans is multimodal and involves movements as well. While communication between humans and robots is becoming more and more multimodal, movements of a robot in 2D space have not yet been used for communication. In this paper, we present a new approach to multimodal communication with (semi-)autonomous robots that even includes movements of a robot in 2D space as a form of expressing communicative acts. We also show how such a multimodal human-robot interface can be generated from a discourse-based interaction design that does not even include information about modalities.
Conference Paper
Full-text available
In this paper we present a demonstration of the Migrantes environment for supporting user interface migration through different devices, including mobile ones and digital TV. The goal of the system is to furnish user interfaces that are able to migrate across different devices, in such a way as to support task continuity for the mobile user. This is obtained through a number of transformations that exploit logical descriptions of the user interfaces to be handled. The migration environment supports the automatic discovery of client devices and its architecture is based on the composition of a number of software services required to perform a migration request.
Conference Paper
Full-text available
A transformational approach for developing multimodal web user interfaces is presented that progressively moves from a task model and a domain model to a final user interface. This approach consists of three steps: deriving one or many abstract user interfaces from a task model and a domain model, deriving one or many concrete user interfaces from each abstract one, and producing the code of the corresponding final user interfaces. To realize these steps, transformations are encoded as graph transformations performed on the involved models expressed in their graph equivalent. For each step, a graph grammar gathers the relevant graph transformations for accomplishing the sub-steps. The final user interface is multimodal as it involves graphical (keyboard, mouse) and vocal interaction. The approach is illustrated throughout the paper by a running example for a graphical interface, a vocal interface, and two multimodal interfaces with graphical and vocal predominance, respectively.
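The three derivation steps form a pipeline from the task and domain models down to final code. The sketch below replaces the actual graph-grammar machinery with plain functions just to show the chaining of transformations; all data structures shown are assumptions.

from typing import Dict, List

def task_to_abstract(task_model: Dict, domain_model: Dict) -> List[Dict]:
    """Step 1: derive abstract interaction units from task and domain models."""
    return [{"unit": task, "data": domain_model.get(task, [])}
            for task in task_model["tasks"]]

def abstract_to_concrete(aui: List[Dict], modality: str) -> List[Dict]:
    """Step 2: map abstract units to concrete interactors for one modality."""
    widget = {"graphical": "text_field", "vocal": "voice_prompt"}[modality]
    return [{"interactor": widget, "label": u["unit"], "options": u["data"]}
            for u in aui]

def concrete_to_final(cui: List[Dict], modality: str) -> str:
    """Step 3: emit final UI code (here: a toy XML-like string)."""
    tag = "input" if modality == "graphical" else "prompt"
    return "\n".join(f"<{tag} label='{c['label']}'/>" for c in cui)

task_model = {"tasks": ["choose destination", "choose date"]}
domain_model = {"choose destination": ["Vienna", "Brussels"]}

for modality in ("graphical", "vocal"):
    aui = task_to_abstract(task_model, domain_model)
    cui = abstract_to_concrete(aui, modality)
    print(f"--- {modality} ---")
    print(concrete_to_final(cui, modality))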
Conference Paper
Full-text available
The development of an intelligent user interface that supports multimodal access to multiple applications is a challenging task. In this paper we present a generic multimodal interface system where the user interacts with an anthropomorphic personalized interface agent using speech and natural gestures. The knowledge-based and uniform approach of SmartKom enables us to realize a comprehensive system that understands imprecise, ambiguous, or incomplete multimodal input and generates coordinated, cohesive, and coherent multimodal presentations for three scenarios, currently addressing more than 50 different functionalities of 14 applications. We demonstrate the main ideas in a walk through the main processing steps from modality fusion to modality fission.
Conference Paper
Full-text available
The model-driven User Interface (UI) development life cycle usually evolves from high-level models, which represent abstract UI concepts, to concrete models, which are more related to the UI implementation details, until the final UI is generated. This process is based on a set of model-to-model and model-to-code transformations. Several industrial tools have applied this approach in order to generate the UI. However, these model transformations are mainly fixed and are not always the best solution for a specific UI. In this work, the notion of Transformation Profile is introduced to better specify the model-to-model transformations. A Transformation Profile is made up of a set of predefined Model Mappings and a Transformation Template. The mappings connect initial and target UI models in a flexible way, whereas the Transformation Template gathers high-level parameters to apply to the transformation. As a consequence, a Transformation Profile enables designers to define parameterized transformations that could be reused for another UI development project.
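A Transformation Profile, as described, bundles predefined model mappings with a template of high-level parameters so the same transformation can be reused across projects. The dataclasses below are a loose illustration of that idea, not the paper's metamodel.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ModelMapping:
    """Connects an element of the source UI model to a target model element."""
    source_element: str       # e.g. an abstract interactor type
    target_element: str       # e.g. a concrete widget type

@dataclass
class TransformationTemplate:
    """High-level parameters applied to the whole transformation."""
    parameters: Dict[str, str] = field(default_factory=dict)

@dataclass
class TransformationProfile:
    mappings: List[ModelMapping]
    template: TransformationTemplate

    def apply(self, abstract_model: List[str]) -> List[str]:
        # Map each abstract element via the mappings, then apply template
        # parameters uniformly to the result.
        table = {m.source_element: m.target_element for m in self.mappings}
        style = self.template.parameters.get("style", "default")
        return [f"{table.get(e, e)} [style={style}]" for e in abstract_model]

profile = TransformationProfile(
    mappings=[ModelMapping("selection", "drop_down"),
              ModelMapping("free_text", "text_field")],
    template=TransformationTemplate({"style": "compact"}),
)
print(profile.apply(["selection", "free_text"]))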
Conference Paper
End users of software typically have to let someone else develop it and its user interface, or learn to design and program it themselves. User interfaces developed by someone else, in particular, may not fit the given task well. Designing and programming are hard and take a lot of effort in general, and even more so for people who are not specially trained or experienced. Therefore, we propose end-user development of user interfaces through a new approach and interface for discourse modeling. End users may themselves model an interaction design as a discourse (in the sense of a dialogue between human and computer). From such an interaction design, a user interface is eventually generated automatically by a tool. As a consequence, end-user development becomes end-user modeling instead of programming.
Article
Multimodal interfaces are systems that allow input and/or output to be conveyed over multiple channels such as speech, graphics, and gesture. In addition to parsing and understanding separate utterances from different modes such as speech or gesture, multimodal interfaces also need to parse and understand composite multimodal utterances that are distributed over multiple input modes. We present an approach in which multimodal parsing and understanding are achieved using a weighted finite-state device which takes speech and gesture streams as inputs and outputs their joint interpretation. In comparison to previous approaches, this approach is significantly more efficient and provides a more general probabilistic framework for multimodal ambiguity resolution. The approach also enables tight-coupling of multimodal understanding with speech recognition. Since the finite-state approach is more lightweight in computational needs, it can be more readily deployed on a broader range of mobile platforms. We provide speech recognition results that demonstrate compensation effects of exploiting gesture information in a directory assistance and messaging task using a multimodal interface.
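The core idea of combining weighted speech and gesture hypotheses into a single best joint interpretation can be illustrated without the full finite-state machinery. The sketch below merely scores combinations of n-best hypotheses; the real approach composes weighted finite-state transducers, which this simplification does not reproduce.

from itertools import product
from typing import List, Tuple

# Each hypothesis is (interpretation, cost); lower cost means more likely.
SpeechHyp = Tuple[str, float]
GestureHyp = Tuple[str, float]

def compatible(speech: str, gesture: str) -> bool:
    """Toy compatibility check: a deictic 'that' needs a pointing gesture."""
    return ("that" not in speech) or gesture.startswith("point:")

def best_joint(speech_nbest: List[SpeechHyp],
               gesture_nbest: List[GestureHyp]) -> Tuple[str, float]:
    """Pick the lowest-cost compatible pair and merge the interpretations."""
    candidates = [
        (f"{s} -> {g}", sc + gc)
        for (s, sc), (g, gc) in product(speech_nbest, gesture_nbest)
        if compatible(s, g)
    ]
    return min(candidates, key=lambda c: c[1])

speech = [("call that person", 1.2), ("call the person", 0.9)]
gesture = [("point:contact_42", 0.3), ("circle:area_7", 0.8)]
print(best_joint(speech, gesture))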