Figure 2: 2D Plan of the 'A' Building at Fredrik Bajers Vej 7, Aalborg University. Left: Ground Floor; Right: 1st Floor.


Source publication
Conference Paper
Intelligent MultiMedia (IntelliMedia) focuses on the computer processing and understanding of signal and symbol input from at least speech, text and visual images in terms of semantic representations. We have developed a general suite of tools in the form of a software and hardware platform called "Chameleon" that can be tailored to conducting IntelliMedia in various application domains.

Context in source publication

Context 1
... size of a standard office on the printout is 3 × 4 cm, which is a feasible size for the system. The 2D plan is shown in Figure 2. ...

Citations

... Others developing general IntelliMedia platforms include CHAMELEON (Brøndsted et al. 1998, 2001), SmartKom (Reithinger 2001, Wahlster et al. 2001), Situated Artificial Communicators (Rickheit and Wachsmuth 1996), Communicative Humanoids (Thórisson 1996, 1997), AESOPWORLD (Okada 1996, 1997) and MultiModal Interfaces like INTERACT (Waibel et al. 1996). Other moves towards integration are reported in Denis and Carfantan (1993), Granström et al. (2002), Maybury (1997), Maybury and Wahlster (1998), Mc Kevitt (1994, 1995/96), Mc Kevitt et al. (2002) and Pentland (1993). ...
... Presently, there are ten software modules in CHAMELEON: blackboard, dialogue manager, domain model, gesture recogniser, laser system, microphone array, speech recogniser, speech synthesiser, natural language processor (NLP), and Topsy, as shown in Figure 1. More detail on CHAMELEON can be found in (Brøndsted et al. 1998, 2001). An initial application of CHAMELEON is the IntelliMedia WorkBench, which is a hardware and software platform as shown in Figure 2. One or more cameras and lasers can be mounted in the ceiling, a microphone array placed on the wall, and there is a table where things (objects, gadgets, people, pictures, 2D/3D models, building plans, or whatever) can be placed. ...
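The blackboard-centred module layout described in this context can be illustrated with a generic publish/subscribe sketch. The Python below is not CHAMELEON's implementation; only the module names are taken from the text, and the topic and message format are assumptions.

```python
from collections import defaultdict
from typing import Callable

class Blackboard:
    """Toy blackboard: modules subscribe to topics and post messages.

    A generic illustration of the pattern only, not CHAMELEON's code.
    """

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def post(self, topic: str, message: dict) -> None:
        # Deliver the message to every module listening on this topic.
        for handler in self._subscribers[topic]:
            handler(message)

board = Blackboard()
# Module names from the text; the example message is invented.
board.subscribe("speech", lambda m: print("dialogue manager received:", m))
board.post("speech", {"utterance": "Whose office is this?"})
```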
Article
Intelligent MultiMedia or MultiModal systems involve the computer processing, understanding and production of inputs and outputs from at least speech, text, and visual information in terms of semantic representations. One of the central questions for these systems is what form of semantic representation should be used. Here, we look at current trends in multimodal semantic representation, which are mainly XML- and frame-based, relate our experiences in the development of multimodal systems (CHAMELEON and CONFUCIUS) and conclude that producer/consumer, intention (speech acts), semantic-content, and timestamps are four important components of any multimodal semantic representation.
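A concrete frame makes the four components easier to see. The example below is hypothetical: the field names and values are assumptions for illustration and are not taken from CHAMELEON or CONFUCIUS.

```python
# Hypothetical multimodal frame showing the four components named in the
# abstract: producer/consumer, intention (speech act), semantic content,
# and a timestamp. All field names and values are illustrative.
frame = {
    "producer": "speech_recogniser",
    "consumer": "dialogue_manager",
    "intention": "query",                  # speech act
    "semantic_content": {
        "predicate": "locate",
        "object": "office",
        "referent": "room A2-105",         # hypothetical room id
    },
    "timestamp": "2003-05-11T14:03:22Z",   # ISO 8601, illustrative
}
```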
... A general suite of tools in the form of a software and hardware platform called CHAMELEON has been developed (see Figure 1). This can be tailored to conduct IntelliMedia in various application domains [41–46]. CHAMELEON has an open distributed processing architecture and includes ten agent modules: blackboard, dialogue manager, domain model, gesture recognizer, laser system, microphone array, speech recognizer, speech synthesizer, natural language processor, and a distributed Topsy learner. ...
Article
Navigation is the process by which people control their movement in virtual environments and is a core functional requirement for all virtual environment (VE) applications. Users require the ability to move, controlling orientation, direction of movement and speed, in order to achieve a particular goal within a VE. Navigation is rarely the end point in itself (which is typically interaction with the visual representations of data) but applications often place a high demand on navigation skills, which in turn means that a high level of support for navigation is required from the application. On desktop systems navigation in non-immersive systems is usually supported through the usual hardware devices of mouse and keyboard. Previous work by the authors shows that many users experience frustration when trying to perform even simple navigation tasks — users complain about getting lost, becoming disorientated and finding the interface 'difficult to use'. In this paper we report on work in progress in exploiting natural language processing (NLP) technology to support navigation in non-immersive virtual environments. A multi-modal system has been developed which supports a range of high-level (spoken) navigation commands, and indications are that spoken dialogue interaction is an effective alternative to mouse and keyboard interaction for many tasks. We conclude that multi-modal interaction, combining technologies such as NLP with mouse and keyboard, may offer the most effective interaction with VEs, and identify a number of areas where further work is necessary. ACM CCS: I.3.6 Computer Graphics Methodology and Techniques—Interaction and Techniques; I.3.7 Three-Dimensional Graphics and Realism—Virtual Reality; I.2.7 Natural Language Processing—Speech Recognition and Synthesis.
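A minimal sketch of how high-level spoken navigation commands might be dispatched to actions follows; the command vocabulary and patterns are invented for illustration and are not the cited system's grammar.

```python
import re

# Hypothetical command vocabulary for illustration only.
COMMANDS = [
    (re.compile(r"^go to (?P<place>.+)$"), "goto"),
    (re.compile(r"^turn (?P<dir>left|right)$"), "turn"),
    (re.compile(r"^(stop|halt)$"), "stop"),
]

def parse_command(utterance: str):
    """Map a recognised utterance to an (action, arguments) pair."""
    text = utterance.lower().strip()
    for pattern, action in COMMANDS:
        m = pattern.match(text)
        if m:
            return action, m.groupdict()
    return None  # unrecognised command

print(parse_command("Go to the lecture theatre"))  # ('goto', {'place': 'the lecture theatre'})
print(parse_command("turn left"))                  # ('turn', {'dir': 'left'})
```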
... The tip of the index finger is found using an infra-red camera. In [4] the desk pointed to is larger than the length of the user's arm, and a pointer is therefore used instead of the index finger. The tip of the pointer is found using background subtraction. ...
... One type of solution is presented in [7], where the thumb is used as a mouse button. Another, and more natural, is to accompany the gesture with a spoken input [4], e.g. "select that (point) object". ...
Conference Paper
This paper describes the development of a natural interface to a virtual environment. The interface is through a natural pointing gesture and replaces pointing devices which are normally used to interact with virtual environments. The pointing gesture is estimated in 3D using kinematic knowledge of the arm during pointing and monocular computer vision. The latter is used to extract the 2D position of the user's hand and map it into 3D. Off-line tests show promising results, with an average error of 8 cm when pointing at a screen 2 m away.
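The citing context above notes that the pointer tip is found using background subtraction. A minimal NumPy sketch of that idea follows; the difference threshold and the topmost-pixel heuristic are illustrative assumptions, not the cited paper's actual method.

```python
import numpy as np

def find_pointer_tip(frame: np.ndarray, background: np.ndarray,
                     threshold: float = 30.0):
    """Return (row, col) of a crude pointer-tip estimate, or None.

    frame, background: greyscale images as 2-D arrays. The tip is taken
    as the topmost foreground pixel -- an assumed heuristic for
    illustration only.
    """
    diff = np.abs(frame.astype(float) - background.astype(float))
    rows, cols = np.nonzero(diff > threshold)   # foreground pixels
    if rows.size == 0:
        return None
    i = np.argmin(rows)                         # topmost foreground pixel
    return int(rows[i]), int(cols[i])
```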
... In [4] the desk pointed to is larger than the length of the user's arm, and a pointer is therefore used instead of the index finger. The tip of the pointer is found using background subtraction. ...
... One type of solution is presented in [7], where the thumb is used as a mouse button. Another, and more natural, is to accompany the gesture with a spoken input [4], e.g. "select that (point) object". ...
Article
This paper describes the development of a natural interface to a virtual environment. The interface is through a natural pointing gesture and replaces pointing devices which are normally used to interact with virtual environments. The pointing gesture is estimated in 3D using kinematic knowledge of the arm during pointing and monocular computer vision. The latter is used to extract the 2D position of the user's hand and map it into 3D. Off-line tests of the system show promising results, with an average error of 76 mm when pointing at a screen 2 m away. The implementation of a real-time system is currently in progress and is expected to run at 25 Hz.
... In [4] the desk pointed to is larger than the length of the user's arm, and a pointer is therefore used instead of the index finger. The tip of the pointer is found using background subtraction. ...
... One type of solution is presented in [7], where the thumb is used as a mouse button. Another, and more natural, is to accompany the gesture with a spoken input [4], e.g. "select that (point) object". ...
Article
This paper proposes a new and natural human computer interface for interacting with virtual environments. The 3D pointing direction of a user in a virtual environment is estimated using monocular computer vision. The 2D position of the user's hand is extracted in the image plane and then mapped to a 3D direction using knowledge about the position of the user's head and kinematic constraints of a pointing gesture due to the human motor system. Off-line tests of the system show promising results. The implementation of a real-time system is currently in progress and is expected to run at 25 Hz.
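The mapping described here, from a 2D hand position to a 3D pointing direction anchored at the head, can be sketched geometrically. The sketch below assumes, as a simplification of the paper's kinematic constraints, that the pointing line runs from the head through the hand and intersects the screen plane; the coordinates and plane model are illustrative, not the paper's calibration.

```python
import numpy as np

def pointing_target(head: np.ndarray, hand: np.ndarray,
                    plane_point: np.ndarray, plane_normal: np.ndarray):
    """Intersect the head->hand ray with a plane (e.g. the screen).

    A simplified stand-in for the paper's kinematic model: the pointing
    direction is assumed to run from the head through the hand. Returns
    the 3-D intersection point, or None if the ray is parallel.
    """
    direction = hand - head
    denom = plane_normal @ direction
    if abs(denom) < 1e-9:
        return None
    t = plane_normal @ (plane_point - head) / denom
    return head + t * direction

# Screen 2 m in front of the user (as in the reported test); all
# coordinates are invented for illustration.
head = np.array([0.0, 1.7, 0.0])
hand = np.array([0.25, 1.4, 0.5])
print(pointing_target(head, hand, np.array([0.0, 0.0, 2.0]),
                      np.array([0.0, 0.0, 1.0])))
```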
... Numerous distributed processing tools and platforms have been developed that assist the creation of intelligent multimodal distributed systems. DACS (Distributed Applications Communication System) [4], as used in the CHAMELEON platform [5], is a tool for process synchronisation and intercommunication. DACS uses simple asynchronous message passing to enable large distributed systems to be developed. ...
... Psyclone is discussed in more detail in section 2.3. Other work in this area includes CHAMELEON [5], a blackboard-based architecture for processing multimodal data, COMIC [11], a multimodal dialogue system and XWAND [12], a user interface for intelligent spaces which also uses Bayesian decision-making. ...
Article
Research relating to the development of an intelligent multimedia distributed platform hub (MediaHub) is presented. Related research is reviewed and a new approach to decision-making based on Bayesian networks is proposed. A system architecture, including a Whiteboard, Dialogue Manager, Semantic Representation Database and Decision-Making Module is outlined. Psyclone, a platform for distributed processing, will facilitate communication within MediaHub, Bayesian networks will enact decision-making within the Decision-Making Module and the Hugin Bayesian decision-making tool will implement Bayesian reasoning within MediaHub.
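The Bayesian decision-making proposed here can be illustrated with a toy network evaluated by enumeration. The variables and probabilities below are invented for illustration; they do not reflect MediaHub's actual models or the Hugin API.

```python
# Toy Bayesian update: P(Intent | Gesture observed), by enumeration.
# Network: Intent -> Gesture. All numbers are invented for illustration.
p_intent = {"select": 0.3, "navigate": 0.7}   # prior P(Intent)
p_gesture_given = {                           # P(Gesture=point | Intent)
    "select": 0.9,
    "navigate": 0.2,
}

# Posterior P(Intent | Gesture=point) via Bayes' rule.
joint = {i: p_intent[i] * p_gesture_given[i] for i in p_intent}
evidence = sum(joint.values())
posterior = {i: joint[i] / evidence for i in joint}
print(posterior)  # {'select': ~0.66, 'navigate': ~0.34}
```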
Article
The goal of this paper is to describe the establishment of a new interdisciplinary research field and postgraduate study programme at Aalborg University. The field is Intelligent MultiMedia (IMM), in which we mainly focus on advanced processing and integration of information from a variety of modalities, most notably visual and spoken information sources. The paper illustrates the approach taken, partly by describing the establishment of a common platform (the Intellimedia WorkBench), and partly by describing a number of applications developed in student projects, utilizing the WorkBench. Image processing plays an important role in the applications, e.g. for tracking, classifying objects and for combined audio-visual speech recognition. The paper mainly focuses on presenting the overall approach and a number of different applications. The aim of the paper is to give an overview rather than to present many details.
Conference Paper
The capability to coordinate and interrelate speech and vision is a virtual prerequisite for adaptive, cooperative, and flexible interaction among people. It is therefore fair to assume that human-machine interaction, too, would benefit from intelligent interfaces for integrated speech and image processing. We first sketch an interactive system that integrates automatic speech processing with image understanding. Then, we concentrate on performance assessment which we believe is an emerging key issue in multimodal interaction. We explain the benefit of time scale analysis and usability studies and evaluate our system accordingly.