
Michael Kipp
Professor
Technical University of Applied Sciences Augsburg · Faculty of Computer Science
Working with my DMZ team on the campus app. Exploring AI for higher ed. Teaching HCI with a strong research focus.
About
91 Publications
26,871 Reads
3,531 Citations
Introduction
Director of the Center for Didactics and Media at Augsburg Technical University of Applied Sciences; our main project is the "gP cycle"
Additional affiliations
September 2011 - present
January 2008 - January 2011
January 2005 - January 2008
Publications (91)
The translation and rotation of objects with two fingers is a well-explored multitouch technique. However, there are some unsolved questions regarding the optimal conditions under which this technique functions best. Does it matter in which direction the movement is oriented? Does parallel or sequential performance of the two operations work best?...
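As a brief illustration of the underlying technique (not of the study design): with two tracked finger positions, translation is commonly taken from the displacement of the fingers' midpoint and rotation from the change of the angle between them. A minimal Python sketch under these assumptions; the function name is hypothetical.

import math

def two_finger_transform(p1_old, p2_old, p1_new, p2_new):
    """Derive translation and rotation from two finger positions
    before and after a move. Points are (x, y) tuples; returns
    ((dx, dy), rotation in radians)."""
    # Translation: displacement of the midpoint between the two fingers.
    mid_old = ((p1_old[0] + p2_old[0]) / 2, (p1_old[1] + p2_old[1]) / 2)
    mid_new = ((p1_new[0] + p2_new[0]) / 2, (p1_new[1] + p2_new[1]) / 2)
    translation = (mid_new[0] - mid_old[0], mid_new[1] - mid_old[1])
    # Rotation: change of the angle of the vector from finger 1 to finger 2.
    angle_old = math.atan2(p2_old[1] - p1_old[1], p2_old[0] - p1_old[0])
    angle_new = math.atan2(p2_new[1] - p1_new[1], p2_new[0] - p1_new[0])
    return translation, angle_new - angle_old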
Predicting the efficiency of interaction techniques can be crucial for designing user interfaces. While models like Fitts' law make general predictions, there is little research on how efficiency varies under different conditions, such as the screen region in which a movement starts, the direction in which it moves, and whether the surface is horizontal...
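For orientation: the common Shannon formulation of Fitts' law predicts movement time MT from movement distance D and target width W with two empirically fitted constants a and b:

\[ MT = a + b \cdot \log_2\!\left(\frac{D}{W} + 1\right) \]

The question raised in this abstract is, in effect, whether a single fitted pair (a, b) holds across screen regions, movement directions and surface orientations.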
We argue that future mobile interfaces should differentiate between various contextual factors like grip and active fingers, adjusting screen elements and behaviors automatically, thus moving from merely responsive design to responsive interaction. Toward this end we conducted a systematic study of screen taps on a mobile device to find out how the...
Programming is an essential cross-disciplinary skill, yet teaching it effectively in large classes can be challenging due to the need for close feedback loops. Identifying and addressing common misconceptions is particularly important during the initial stages of learning to program. While automated interactive tutoring systems have the potential t...
Animated characters that move and gesticulate appropriately with spoken text are useful in a wide range of applications. Unfortunately, this class of movement is very difficult to generate, even more so when a unique, individual movement style is required. We present a system that, with a focus on arm gestures, is capable of producing full-body ges...
While social relationships are important for human well-being, maintaining these relationships can be difficult, especially for individuals living apart from friends and loved ones. We present "Light Bridge", an ambient spatial interaction concept designed to convey a sense of closeness and co-presence through changes in ambient lighting that refle...
Touch-sensitive surfaces are already a standard form of interaction. These surfaces come in many different sizes like tablets or touch walls. However, there is little research to characterize the impact of surface size on touch performance. We conducted a Fitts' Law study of three display sizes (13.5" tablet, 28" monitor, 69.5" large monitor), comp...
To advance the digitalization of teaching at campus-based universities, learning management systems (LMS) such as Moodle are of fundamental importance. There, instructors can not only offer content but also implement modern teaching and learning scenarios with innovative methods. Once working on the platform, one can successively try out new tools...
The qualitative analysis of nonverbal communication is more and more relying on 3D recording technology. However, the human analysis of 3D data on a regular 2D screen can be challenging as 3D scenes are difficult to visually parse. To optimally exploit the full depth of the 3D data, we propose to enhance the 3D view with a number of visualizations...
Empirical research often involves three activities: the systematic annotation of audiovisual media (coding), the management of the resulting data in a corpus and various forms of statistical analysis. This chapter presents ANVIL, a highly generic and theory-independent research tool that supports all three activities in conjunction with audio, vide...
Precision tasks in 3D like object manipulation or character animation call for new gestural interfaces that utilize many input degrees of freedom. We present MotionBender, a sensor-based interaction technique for post-editing the motion of e.g. the hands in character animation data. For the visualization of motion we use motion paths, often used fo...
While the availability of multimedia data, including video, audio and human movement recording, is steadily growing, the integrated viewing, annotation, and analysis of these complex data is still a challenge. This chapter introduces ANVIL as an example of a multimedia annotation and analysis tool and presents recent extensions: The 3D viewing of m...
Creating interactive applications with multiple virtual characters involves many challenges related to different areas of expertise. The definition of context-sensitive interactive behavior requires expert programmers and often results in hard-to-maintain code. To tackle these challenges, we suggest a visual authoring approach...
Avatars (artificial computer characters) offer the possibility of rendering written content, e.g. from websites, in sign language and thus making it accessible to deaf sign language users. Crucial for this, however, is the acceptance of this new technology within the Deaf community. In the avatar...
Human motion is challenging to analyze due to the many degrees of freedom of the human body. While the qualitative analysis of human motion lies at the core of many research fields, including multimodal communication, it is still hard to achieve reliable results when human coders transcribe motion with abstract categories. In this paper we tackle t...
This paper shows how interoperable annotations of multimodal dialogue, which apply the annotation scheme and the markup language (DiAML, Dialogue Act Markup Language) defined in ISO standard 24617-2, can conveniently be obtained using the newly implemented facility in the ANVIL annotation tool to produce XML-based output directly in the DiAML format....
Signing avatars have the potential to become a useful and even cost-effective method to make written content more accessible for Deaf people. However, avatar research is characterized by the fact that most researchers are not members of the Deaf community, and that Deaf people as potential users have little or no knowledge about avatars. Therefore,...
We investigate how lighting can be used to influence how the personality of virtual characters is perceived. We propose a character-centric lighting system composed of three dynamic lights that can be configured using an interactive editor. To study the effect of character-centric lighting on observers, we created four lighting configurations deriv...
Many deaf people have significant reading problems. Written content, e.g. on internet pages, is therefore not fully accessible for them. Embodied agents have the potential to communicate in the native language of this cultural group: sign language. However, state-of-the-art systems have limited comprehensibility and standard evaluation methods are...
While current virtual characters may look photorealistic, they often lack behavioral complexity. Emotion may be the key ingredient to create behavioral variety, social adaptivity and thus believability. While various models of emotion have been suggested, the concrete parametrization must often be designed by the implementer. We propose to enhance a...
Gaze is known to be an important social cue in face-to-face communication indicating interest, focus of attention, or turn-taking intentions. Further, speaker gaze can influence situated utterance comprehension by driving both interlocutors' visual attention towards the same object, thereby grounding and disambiguating referring expressions (Hanna a...
Gaze as Visual Reference: Gaze is known to be an important social cue in face-to-face communication indicating focus of attention. Speaker gaze can influence object perception and situated utterance comprehension by driving both interlocutors' visual attention towards the same object; hence facilitating grounding and disambiguation [1]. The precise...
Controlling a high-dimensional structure like a 3D humanoid skeleton is a challenging task. Intuitive interfaces that allow non-experts to perform character animation with standard input devices would open up many possibilities. Therefore, we propose a novel multitouch interface for simultaneously controlling the many degrees of freedom of a human...
Embodied agents have the potential to become a highly natural human-computer interaction device – they are already in use as tutors, presenters and assistants. However, it remains an open question whether adding an agent to an application has a measurable impact, positive or negative, in terms of motivation and learning performance. Prior studies a...
Generating coordinated multimodal behavior for an embodied agent (speech, gesture, facial expression...) is challenging. It requires a high degree of animation control, in particular when reactive behaviors are required. We suggest distinguishing realization planning, where gesture and speech are processed symbolically using the behavior markup lan...
Welcome to the special issue on Intelligent Virtual Agents. Intelligent Virtual Agents (IVAs) are interactive characters that – in spite of being merely 2D or 3D computer graphics models – exhibit humanlike qualities and communicate with humans or with each other using natural human modalities such as speech and gesture. They are capable of real-ti...
Embodied agents are a powerful paradigm for current and future multimodal interfaces yet require high effort and expertise for their creation, assembly, and animation control. Therefore, open animation engines and high-level control languages are required to make embodied agents accessible to researchers and developers. We present EMBR, a new real-...
The question how exactly gesture and emotion are interrelated is still sparsely covered in research, yet highly relevant for building affective artificial agents. In our study, we investigate how basic gestural form features (handedness, hand shape, palm orientation and motion direction) are related to components of emotion. We argue that material...
Multimodal user interfaces are becoming more and more important in human–machine communication. Essential representatives of such interfaces are virtual agents that aim to act like humans in the way they employ gestures, facial expression, posture and prosody to convey their emotions in face-to-face communication. Furthermore, if we employ a presen...
Embodied agents can be a powerful interface for natural human-computer interaction. While graphical realism is steadily increasing, the complexity of believable behavior is still hard to create and maintain. We propose a hybrid and modular approach to modeling the agent’s control, combining state charts and rule processing. This allows us to choose...
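To make the hybrid idea concrete: a coarse state chart selects the agent's current mode, while condition/action rules fire within the active state. A minimal, hypothetical Python sketch of this combination (not the paper's actual architecture or API):

# Hypothetical sketch: a coarse state machine selects the agent's mode,
# while simple condition/action rules fire within the active state.

RULES = {
    "idle": [
        # (condition, action) pairs evaluated against a context dict
        (lambda ctx: ctx.get("user_present"), lambda ctx: print("greet user")),
    ],
    "talking": [
        (lambda ctx: ctx.get("user_left"), lambda ctx: print("say goodbye")),
    ],
}

TRANSITIONS = {
    ("idle", "user_present"): "talking",
    ("talking", "user_left"): "idle",
}

class HybridController:
    def __init__(self, state="idle"):
        self.state = state

    def step(self, ctx):
        # 1. Rule processing: fire all rules whose condition holds.
        for cond, action in RULES[self.state]:
            if cond(ctx):
                action(ctx)
        # 2. State chart: follow the first matching transition.
        for event, value in ctx.items():
            if value and (self.state, event) in TRANSITIONS:
                self.state = TRANSITIONS[(self.state, event)]
                break

controller = HybridController()
controller.step({"user_present": True})   # greets, moves to "talking"
controller.step({"user_left": True})      # says goodbye, back to "idle"

The state chart keeps the overall control flow inspectable, while the per-state rules stay easy to extend.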
In conjunction with Anvil and suitable annotation schemes, GAnTooL (A Gesture Annotation And Modeling Tool for Anvil) is a tool to annotate human nonverbal behavior like gestures and poses efficiently with the help of a skeleton. Using intuitive controls the user can quickly mirror the observed speaker’s poses. The results can be used to build gest...
Effective speakers engage their whole body when they gesture. It is difficult, however, to create such full body motion in animated agents while still supporting a large and flexible gesture set. This paper presents a hybrid system that combines motion capture data with a procedural animation system for arm gestures. Procedural approaches are well...
Welcome to the proceedings of the 9th International Conference on Intelligent Virtual Agents, held September 14–16, 2009 in Amsterdam, The Netherlands. Intelligent virtual agents (IVAs) are interactive characters that exhibit human-like qualities and communicate with humans or with each other using natural human modalities such as speech and gestur...
Embodied agents can be powerful interface devices and versatile research tools for the study of emotion, gesture, facial expression etc. However, they require high effort and expertise for their creation, assembly and animation control. Therefore, open animation engines and high-level control languages are required to make embodied agents accessibl...
Welcome to the Proceedings of the 9th International Conference on Intelligent Virtual Agents, held 14-16 September, 2009 in Amsterdam, The Netherlands. Intelligent Virtual Agents (IVAs) are interactive characters that exhibit humanlike qualities and communicate with humans or with each other using natural human modalities such as speech and gesture...
We present a comparative study of two gesture specification languages. Our aim is to derive requirements for a new, optimal specification language that can be used to extend the emerging BML standard. We compare MURML, which has been designed to specify coverbal gestures, and a language we call LV, originally designed to describe French Sign Langua...
The book presents a cross-section of state-of-the-art research on multimodal corpora, a highly interdisciplinary area that is a prerequisite for various specialized disciplines. A number of the papers included are revised and expanded versions of papers accepted to the International Workshop on Multimodal Corpora: From Models of Natural Interaction t...
In this paper we present an interactive poker game in which one human user plays against two animated agents using RFID-tagged poker cards. The game is used as a showcase to illustrate how current AI technologies can be used for providing new features to computer games. A powerful and easy-to-use multimodal dialog authoring tool is used for modelin...
We present IGaze, a semi-immersive human-avatar interaction system. Using head tracking and an illusionistic 3D effect we let users interact with a talking avatar in a job interview scenario. The avatar features reactive gaze behavior that adapts to the user position according to exchangeable gaze strategies. In user studies we showed that...
In this paper we present two virtual characters in an interactive poker game using RFID-tagged poker cards for the interaction. To support the game creation process, we have combined models, methods, and technology that are currently investigated in the ECA research field in a unique way. A powerful and easy-to-use multimodal dialog authoring tool...
This paper presents an empirical evaluation of a method called...
This paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation and analysis of multimodality. Each of the tools has specific strengths so that a variety of different tools, working on the same data, can be desirable for pro...
We present a new coding mechanism, spatiotemporal coding, that allows coders to annotate points and regions in the video frame by drawing directly on the screen. Coders can not only attach labels to time intervals in the video but can specify a possibly moving region on the video screen. This opens up the spatial dimension for multi-track video cod...
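One way to picture such a spatiotemporal annotation is as a label over a time interval plus a keyframed screen region that is interpolated in between; a hypothetical Python sketch (not ANVIL's actual data format):

from dataclasses import dataclass, field

@dataclass
class SpatiotemporalAnnotation:
    """A label over a time interval plus a screen region that may move:
    the region is keyframed and linearly interpolated in between."""
    label: str
    start: float                                   # seconds
    end: float                                     # seconds
    keyframes: dict = field(default_factory=dict)  # time -> (x, y, w, h)

    def region_at(self, t):
        """Linearly interpolate the region at time t between keyframes."""
        times = sorted(self.keyframes)
        if t <= times[0]:
            return self.keyframes[times[0]]
        for t0, t1 in zip(times, times[1:]):
            if t0 <= t <= t1:
                f = (t - t0) / (t1 - t0)
                return tuple(a + f * (b - a)
                             for a, b in zip(self.keyframes[t0], self.keyframes[t1]))
        return self.keyframes[times[-1]]

ann = SpatiotemporalAnnotation("pointing target", 2.0, 4.0,
                               {2.0: (100, 80, 40, 40), 4.0: (200, 120, 40, 40)})
print(ann.region_at(3.0))  # (150.0, 100.0, 40.0, 40.0)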
We present ERIC, an affective embodied agent for realtime commentary in many domains. The underlying architecture is rule-based, generic, and lightweight, based on Java/Jess modules. Apart from reasoning about dynamically changing events, the system can produce coherent natural language and non-verbal behaviour, based on a layered model of affect (pe...
The empirical investigation of human gesture stands at the center of multiple research disciplines, and various gesture annotation schemes exist, with varying degrees of precision and required annotation effort. We present a gesture annotation scheme for the specific purpose of automatically generating and animating character-specific hand/arm gest...
Virtual humans still lack naturalness in their nonverbal behaviour. We present a data-driven solution that moves towards a more natural synthesis of hand and arm gestures by recreating gestural behaviour in the style of a human performer. Our algorithm exploits the concept of gesture units to make the produced gestures a continuous flow of move...
Since the beginning of the SAIBA effort to unify key interfaces in the multi-modal behavior generation process, the Behavior Markup Language (BML) has both gained ground as an important component in many projects worldwide, and continues to undergo further refinement. This paper reports on the progress made in the last year in further developing BM...
Providing virtual characters with natural gestures is a complex task. Even if the range of gestures is limited, deciding when to play which gesture may be considered either an engineering or an artistic task. We want to strike a balance by presenting a system where gesture selection and timing can be human authored in a script, leaving full artistic...
When using virtual characters in the human-computer interface the question arises of how useful this kind of interface is: whether the human user accepts, enjoys and profits from this form of interaction. Thorough system evaluations, however, are rarely done. We propose a post-questionnaire evaluation for a virtual character system that we appl...
Animated characters that move and gesticulate appropriately with spoken text are useful in a wide range of applications. Unfortunately, they are very difficult to generate, even more so when a unique, individual movement style is required. We present a system that is capable of producing full-body gesture animation for given input text in the style...
We present COHIBIT, an edutainment exhibit for theme parks in an ambient intelligence environment. It combines ultimate robustness and simplicity with creativity and fun. The visitors can use instrumented 3D puzzle pieces to assemble a car. The key idea of our edutainment framework is that all actions of a visitor are tracked and commented by two...
Users require more effective and efficient means of interaction with increasingly complex information and new interactive devices. This document summarizes the results of the international Dagstuhl Seminar on Coordination and Fusion in Multimodal Interaction that took place at Schloss Dagstuhl in Germany October 27 through November 2, 2001. We f...
This dissertation shows how to generate conversational gestures for an animated agent based on annotated text input. The central idea is to imitate the gestural behavior of human individuals. Using TV show recordings as empirical data, gestural key parameters are extracted for the generation of natural and individual gestures. For each of the three...
CrossTalk is a self-explaining virtual character exhibition for public spaces. This paper presents the CrossTalk system, including its authoring tool SceneMaker and the CarSales exhibit. CrossTalk extends the commonplace human-to-screen interaction to an interaction triangle. The user faces two separated screens inhabited with virtual characters an...
We present an extension of the CrossTalk system that allows emotional behaviour to be modelled on three levels: scripting, processing and expression. CrossTalk is a self-explaining virtual character exhibition for public spaces. Its SceneMaker authoring suite provides authors with a screenplay-like language for scripting character and user interactions....
In this paper, we introduce a toolkit called SceneMaker for authoring scenes for adaptive, interactive performances. These performances are based on automatically generated and pre-scripted scenes which can be authored with the SceneMaker in a two-step approach: In step one, the scene flow is defined using cascaded finite state machines. In a secon...
We first introduce CrossTalk, an interactive installation with animated presentation characters that has been designed for public spaces, such as an exhibition or a trade fair. The installation relies on what we call a meta-theater metaphor. Quite similar to professional actors, characters in CrossTalk are not always on duty. Rather, they can ste...
Embodied conversational agents provide a promising option for presenting information to users. This contribution revisits a number of past and ongoing systems with animated characters that have been developed at DFKI. While in all systems the purpose of using characters is to convey information to the user, there are significant variations in the...
We demonstrate how the Tycoon framework can be put to practice with the Anvil tool in a concrete case study. Tycoon offers a coding scheme and analysis metrics for multimodal communication scenarios. Anvil is a generic, extensible and ergonomically designed annotation tool for videos. In this paper, we describe the Anvil tool, the Tycoon scheme/met...
This paper introduces CrossTalk, an interactive installation with animated presentation agents. CrossTalk is an attempt to spatially extend the interaction experience beyond the usual single front screen. It offers two separated agent spaces (screens), where the agents "live", which form a triangle with the user's control panel. In this setting...
Anvil is a tool for the annotation of audiovisual material containing multimodal dialogue. Annotation takes place on freely definable, multiple layers (tracks) by inserting time-anchored elements that hold a number of typed attribute-value pairs. Higher-level elements (suprasegmental) consist of a sequence of elements. Attributes contain symbols...
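The track/element model described here can be sketched roughly as follows; the class names are hypothetical and not ANVIL's actual implementation:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Element:
    """A time-anchored annotation holding typed attribute-value pairs."""
    start: float                 # seconds
    end: float                   # seconds
    attributes: Dict[str, str] = field(default_factory=dict)

@dataclass
class Track:
    """A freely definable annotation layer holding ordered elements."""
    name: str
    elements: List[Element] = field(default_factory=list)

@dataclass
class GroupTrack(Track):
    """A higher-level (suprasegmental) track whose elements each span
    a sequence of elements on a primary track."""
    primary: Track = None

gesture = Track("gesture.phase")
gesture.elements.append(Element(1.2, 1.8, {"phase": "stroke"}))
unit = GroupTrack("gesture.unit", primary=gesture)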