Conference Paper

Object-Oriented Video: Interaction with Real-World Objects Through Live Video


Abstract

Graphics and live video are widely employed in remotely controlled systems such as industrial plants. Interaction with live video, however, is more limited than interaction with graphics because users cannot act on the objects they are observing. Object-Oriented Video techniques are described that allow object-oriented interaction: real-world objects in live video serve as reference cues, can be manipulated directly, and can carry graphic overlays, enabling users to work in the real spatial context conveyed by the video. Users thereby understand intuitively what they are operating and see the results of their operations.


... Recent approaches for direct selection of the target device are based on hand gestures and laser pointers [1]. Another approach to device selection uses a touch screen showing a camera-captured image [2] [4]. To provide assistive guidance for a target device with complicated actions and functions, overlaying images based on an augmented reality approach is a standard technique [5]. ...
... Another promising approach uses camera captured images on a touch screen. The idea of controlling devices through a video image was introduced by Tani et al. [2] in the project Object Oriented Video. Users can manipulate digital functions through the captured video. ...
Conference Paper
Full-text available
As domestic robots and smart appliances become increasingly common, they require a simple, universal interface to control their motion. Such an interface must support simple selection of a connected device, highlight its capabilities and allow for intuitive manipulation. We propose "exTouch", an embodied spatially-aware approach to touch and control devices through an augmented reality mediated mobile interface. The "exTouch" system extends the user's touchscreen interactions into the real world by enabling spatial control over the actuated object. When users touch a device shown in live video on the screen, they can change its position and orientation through multi-touch gestures or by physically moving the screen in relation to the controlled object. We demonstrate that the system can be used for applications such as an omnidirectional vehicle, a drone, and moving furniture for a reconfigurable room.
... Moreover, in contrast to the VideoClix or iVast products [15][7] that support metadata editing of recorded video, our system works on live video and controls real-world devices. Live video as a reference cue for power-plant control has been explored in [13]. Goldberg et al. use video as a reference cue to convey "Tele-Actor," a skilled human who collects the video and performs actions during navigation [6]. ...
... To operate devices (e.g. screen, speaker, or printer) appearing in live video, we need to model these devices with either a 3-D model or a 2-D model [13]. In the case of static objects and fixed FlySPEC cameras, it is easy to locate the objects the user is operating by simple "within-region" checking on the canvas. ...
Conference Paper
Full-text available
We present a system that allows remote and local participants to control devices in a meeting environment using mouse or pen based gestures "through" video windows. Unlike state-of-the-art device control interfaces that require interaction with text commands, buttons, or other artificial symbols, our approach allows users to interact with devices through live video of the environment. This naturally extends our video supported pan/tilt/zoom (PTZ) camera control system, by allowing gestures in video windows to control not only PTZ cameras, but also other devices visible in video images. For example, an authorized meeting participant can show a presentation on a screen by dragging the file on a personal laptop and dropping it on the video image of the presentation screen. This paper presents the system architecture, implementation tradeoffs, and various meeting control scenarios.
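The "within-region" selection mentioned in the excerpt above can be pictured with a minimal sketch: a click or pen gesture on the video canvas is tested against 2-D regions registered for each visible device, and the matching device receives the command. The device names, region coordinates, and actions below are hypothetical placeholders, not taken from any of the cited systems.

```python
# Minimal sketch of "within-region" device selection on a live-video canvas.
# Device names, regions, and actions are hypothetical illustrations.
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class DeviceRegion:
    name: str
    bounds: Tuple[int, int, int, int]   # (x, y, width, height) in canvas pixels
    action: Callable[[], None]          # command issued when the region is clicked

    def contains(self, x: int, y: int) -> bool:
        bx, by, bw, bh = self.bounds
        return bx <= x < bx + bw and by <= y < by + bh


def pick_device(regions, x, y) -> Optional[DeviceRegion]:
    """Return the first registered device whose region contains the click."""
    for region in regions:
        if region.contains(x, y):
            return region
    return None


regions = [
    DeviceRegion("projector-screen", (120, 40, 300, 180), lambda: print("show slides")),
    DeviceRegion("speaker", (480, 220, 60, 90), lambda: print("toggle mute")),
]

hit = pick_device(regions, 150, 100)   # a click at canvas coordinates (150, 100)
if hit:
    hit.action()                       # -> "show slides"
```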
... In 1992, Tani et al. envisioned how users could interact with a real-world device located at a distance through live video [33]. Cameras observed industrial machinery and allowed users to manipulate mechanical switches and sliders over a distance by clicking and dragging within the live video image with a mouse. ...
... The aforementioned Hyperplant system allows users to control devices in a factory through a video image [33]. Liao et al. allowed users to annotate and drag-and-drop presentation slides between screens based on a video representation of the room [16]. ...
Conference Paper
Full-text available
In 1992, Tani et al. proposed remotely operating machines in a factory by manipulating a live video image on a computer screen. In this paper we revisit this metaphor and investigate its suitability for mobile use. We present Touch Projector, a system that enables users to interact with remote screens through a live video image on their mobile device. The handheld device tracks itself with respect to the surrounding displays. Touch on the video image is "projected" onto the target display in view, as if it had occurred there. This literal adaptation of Tani's idea, however, fails because handheld video does not offer enough stability and control to enable precise manipulation. We address this with a series of improvements, including zooming and freezing the video image. In a user study, participants selected targets and dragged targets between displays using the literal and three improved versions. We found that participants achieved highest performance with automatic zooming and temporary image freezing. Author Keywords: Mobile device, input device, interaction techniques, multi-touch, augmented reality, multi-display environments.
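As a rough illustration of the "projection" step Touch Projector describes, a touch on the handheld's live video can be mapped into target-display coordinates through a homography estimated between the camera view of the display and the display's own pixel space. The corner correspondences and coordinates below are assumptions for this sketch, not the authors' implementation.

```python
# Sketch: project a touch point from the handheld camera image onto a remote
# display using a homography estimated from the display's detected corners.
# All coordinates are illustrative.
import numpy as np


def estimate_homography(src_pts, dst_pts):
    """Direct linear transform from four or more point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def project_touch(H, touch_xy):
    """Map a touch in camera-image pixels to display pixels."""
    x, y = touch_xy
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]


# Corners of the display as seen in the camera image -> display resolution.
camera_corners = [(210, 95), (590, 110), (575, 370), (205, 350)]
display_corners = [(0, 0), (1920, 0), (1920, 1080), (0, 1080)]

H = estimate_homography(camera_corners, display_corners)
print(project_touch(H, (400, 230)))   # a touch near the middle of the camera view
```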
... By interacting with such devices, users are not encumbered with headgear or special eyeglasses. This is a non-immersive "window-on-the-world" (WoW) style of augmented reality (see Milgram, 1994; Tani et al, 1992). ...
... With Pick-and-drop, the displays are discrete units, while with Hyperdragging, the displays are modeled as a single continuous surface. Interacting with objects at a remote site through live video is a technique explored in (Tani, 1992). Applying augmented reality for moving objects among displays via drag-and-drop is a feature of the EMMIE system (Butz et al, 1999). ...
Conference Paper
Full-text available
In a meeting room environment with multiple public wall displays and personal notebook computers, it is possible to design a highly interactive experience for manipulating and annotating slides. For the public displays, we present the ModSlideShow system with a discrete modular model for linking the displays into groups, along with a gestural interface for manipulating the flow of slides within a display group. For the applications on personal devices, an augmented reality widget with panoramic video supports interaction among the various displays. This widget is integrated into our NoteLook 3.0 application for annotating, capturing and beaming slides on pen-based notebook computers.
... An early attempt to teleoperate physical objects through live video is the work of Tani et al. in 1992 [12]. They used a common monitor-mouse-keyboard interface to manipulate the live video of real-world objects for remotely controlling an electric power plant, including clicking button images for controlling and dragging the 2D or 3D model of a physical object for positioning. ...
Article
Full-text available
This paper presents a framework for telepresence operation by touching live video on a touchscreen. Our goal is to enable users to use a smartpad to teleoperate everyday objects by touching the live video of the objects they are watching. To this end, we coined the term teleinteractive device to describe such an object with an identity, an actuator, and a communication network. We developed a touchable live video image-based user interface (TIUI) that empowers users to teleoperate any teleinteractive device by touching its live video with touchscreen gestures. The TIUI contains four modules (touch, control, recognition, and knowledge) to perform live video understanding, communication, and control for telepresence operation. We implemented a telepresence operation system that consists of a telepresence robot and teleinteractive devices at a local site, a smartpad with the TIUI at a remote site, and communication networks connecting the two sites. We demonstrated potential applications of the system in remotely controlling telepresence robots, opening doors with access control panels, and pushing power wheelchairs. We conducted user studies to show the effectiveness of the proposed framework.
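A minimal sketch of the four-module decomposition named in this abstract (touch, control, recognition, and knowledge), showing how a touch gesture on live video might be resolved to a teleinteractive device and turned into a command. The module interfaces, device identifiers, gestures, and commands are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of a TIUI-style pipeline: a touch gesture on live video is
# resolved to a teleinteractive device and translated into a control command.
# All names, regions, and commands are illustrative assumptions.

class RecognitionModule:
    """Maps an image-space touch to the identity of a visible device."""
    def __init__(self, known_regions):
        self.known_regions = known_regions  # {device_id: (x, y, w, h)}

    def identify(self, x, y):
        for device_id, (rx, ry, rw, rh) in self.known_regions.items():
            if rx <= x < rx + rw and ry <= y < ry + rh:
                return device_id
        return None


class KnowledgeModule:
    """Stores which gestures map to which actions for each device."""
    GESTURE_ACTIONS = {("door-panel", "swipe_right"): "unlock",
                       ("wheelchair", "drag"): "move"}

    def action_for(self, device_id, gesture):
        return self.GESTURE_ACTIONS.get((device_id, gesture))


class ControlModule:
    """Sends the resolved command over the (stubbed) network."""
    def send(self, device_id, action):
        print(f"-> {device_id}: {action}")


class TouchModule:
    """Entry point: receives raw touch events from the touchscreen."""
    def __init__(self, recognition, knowledge, control):
        self.recognition, self.knowledge, self.control = recognition, knowledge, control

    def on_gesture(self, x, y, gesture):
        device = self.recognition.identify(x, y)
        action = self.knowledge.action_for(device, gesture) if device else None
        if action:
            self.control.send(device, action)


ui = TouchModule(RecognitionModule({"door-panel": (50, 60, 120, 200)}),
                 KnowledgeModule(), ControlModule())
ui.on_gesture(90, 150, "swipe_right")   # prints "-> door-panel: unlock"
```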
... This system, named Aspen Movie Map, allowed virtual, interactive exploration of the city of Aspen, Colorado, by means of geographic maps, videos of the city (streets, buildings, etc.), and hypertext links. This project inspired the development of the HyperPlant system at Hitachi Research Laboratory in 1992 [92] for the remote control of power-plant machines through video captured in real time. All objects in the camera view can be identified and manipulated, and virtual controls can be applied to them. ...
Thesis
Full-text available
The work presented here concerns hypervideo, a particular type of hypermedia document that combines the richness of audiovisual material with the capabilities of multimedia and the interactivity of hypermedia. The lack of an appropriate formalism for describing this type of complex document is at the heart of the problem addressed. Indeed, beyond the technological challenges that video in all its forms raises, the lack of formalization has greatly prevented hypervideos from being examined and used on a larger scale. Since the conceptualization of hypervideo has not been widely examined beyond general hypermedia models, no sufficiently generic and minimally restrictive model has laid the theoretical foundations that would allow hypervideo to emerge as a field and as a basis for applications. Starting from these observations, the objective of this work is to address the formal definition of hypervideo by proposing a formal model for describing and representing hypervideos, a model to be validated by a set of documentary tools for hypervideos on the Web.
... In addition to simplified end-user programming tools, prior solutions introduced new interaction techniques to manage the complexity of IoT devices. Previous projects enabled users to manipulate smart environments through Brain-Computer Interfaces [10,17], eye trackers [5], projectors [20,54], large interactive displays [29,55], live videos on a computer [62], and in-air gestures [11,31]. However, most of these methods require heavy instrumentation of environments. ...
Conference Paper
Full-text available
Knocking is a way of interacting with everyday objects. We introduce BeatIt, a novel technique that allows users to use passive, everyday objects to control a smart environment by recognizing the sounds generated from knocking on the objects. BeatIt uses a BeatSet, a series of percussive sound samples, to represent the sound signature of knocking on an object. A user associates a BeatSet with an event. For example, a user can associate the BeatSet of knocking on a door with the event of turning on the lights. Decoder, a signal-processing module, classifies the sound signals into one of the recorded BeatSets, and then triggers the associated event. Unlike prior work, BeatIt can be implemented on microphone-enabled commodity devices. Our user studies with 12 participants showed that our proof-of-concept implementation based on a smartwatch could accurately classify eight BeatSets using a user-independent classifier.
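One way to picture the BeatSet idea is a nearest-neighbour match of an incoming knock against stored sound signatures, with the winning signature triggering its associated event. The feature choice (a normalized log-magnitude spectrum) and the event names below are assumptions for this sketch, not the paper's classifier.

```python
# Sketch: nearest-neighbour matching of a knock sound against stored BeatSets.
# Feature extraction and event names are illustrative assumptions.
import numpy as np


def knock_features(samples, n_bins=64):
    """Reduce a short knock recording to a normalized log-magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(samples, n=2 * n_bins))[:n_bins]
    feats = np.log1p(spectrum)
    return feats / (np.linalg.norm(feats) + 1e-9)


class BeatSetClassifier:
    def __init__(self):
        self.signatures = {}   # event name -> averaged feature vector

    def register(self, event, recordings):
        self.signatures[event] = np.mean([knock_features(r) for r in recordings], axis=0)

    def classify(self, samples):
        feats = knock_features(samples)
        return min(self.signatures, key=lambda e: np.linalg.norm(self.signatures[e] - feats))


# Toy data: a "door" knock is low-pitched, a "table" knock higher-pitched.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2048)
door_knock = np.sin(2 * np.pi * 80 * t) * np.exp(-8 * t)
table_knock = np.sin(2 * np.pi * 400 * t) * np.exp(-8 * t)

clf = BeatSetClassifier()
clf.register("lights_on", [door_knock + 0.01 * rng.standard_normal(t.size)])
clf.register("music_play", [table_knock + 0.01 * rng.standard_normal(t.size)])
print(clf.classify(door_knock))   # -> "lights_on"
```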
... AR is also well-suited for visual support and feedback during control, manipulation and actuation of devices and objects. Tani et al. [199] describe a user interface for manipulating physical controls on remote machinery through an augmented video interface. TouchMe [64] applies direct-manipulation techniques for remote robot control using video see-through AR and a touch-screen interface. ...
Thesis
Full-text available
The vision to interact with computers through our whole body - to not only visually perceive information, but to engage with it through multiple senses - has inspired human computer interaction (HCI) research for decades. Shape displays address this challenge by rendering dynamic physical shapes through computer controlled, actuated surfaces that users can view from different angles and touch with their hands to experience digital models, express their ideas and collaborate with each other. Similar to kinetic sculptures, shape displays do not just occupy the physical space around them; rather, they redefine it. By dynamically transforming their surface geometry, they directly push against hands and objects, yet they also form a perceptual connection with the user's gestures and body movements at a distance. Based on this principle of spatial continuity, this thesis introduces a set of interaction techniques that move between touching the interface surface, to interacting with tangible objects on top, and to engaging through gestures in relation to it. These techniques are implemented on custom-built shape display systems that integrate physical rendering, synchronized visual display, shape sensing, and spatial tracking. On top of this hardware platform, applications for computer-aided design, urban planning, and volumetric data exploration allow users to manipulate data at different scales and modalities. To support remote collaboration, shared telepresence workspaces capture and remotely render the physical shapes of people and objects. Users can modify shared models, and handle remote objects, while augmenting their capabilities through altered remote body representations. The insights gained from building these prototype workspaces and from gathering user feedback point towards a future in which computationally transforming materials will enable new types of bodily, spatial interaction with computers.
... Several works have focused on pushing and pulling content using touch-surfaces as proxies to public displays. Touch Projector [5] demonstrated improvements to work by Tani et al. that enabled the control of remote machinery through live video feeds while maintaining spatial context [23]. Boring's work made use of a phone camera feed to project touches on to public displays to manipulate and move objects. ...
Conference Paper
Previous work has validated the eyes and mobile input as a viable approach for pointing at, and selecting out of reach objects. This work presents Eye Pull, Eye Push, a novel interaction concept for content transfer between public and personal devices using gaze and touch. We present three techniques that enable this interaction: Eye Cut & Paste, Eye Drag & Drop, and Eye Summon & Cast. We outline and discuss several scenarios in which these techniques can be used. In a user study we found that participants responded well to the visual feedback provided by Eye Drag & Drop during object movement. In contrast, we found that although Eye Summon & Cast significantly improved performance, participants had difficulty coordinating their hands and eyes during interaction. © 2013 IFIP International Federation for Information Processing.
... However unlike these, SuperVision does not need a tagged environment. Other interesting approaches include object control from a video on a computer screen [13] or a table top display [5]. However, these approaches tend to divide the user's attention from the controlled object to the controlling device. ...
Conference Paper
Full-text available
In this paper, we propose SuperVision, a new interaction technique for distant control of objects in a smart home. This technique aims at enabling users to point towards an object, visualize its current state and select a desired functionality as well. To achieve this: 1) we present a new remote control that contains a pico-projector and a slider; 2) we introduce a visualization technique that allows users to locate and control objects kept in adjacent rooms, by using their spatial memories. We further present a few example applications that convey the possibilities of this technique.
... An early attempt to use live videos for interacting with a real object located at a distance stems from Tani et al. [39] in 1992. They presented interactive video techniques to implement a system for monitoring and controlling an electric power plant. ...
Article
This paper presents a telepresence interaction framework and system based on touch screen and telepresence robot technologies. The system is composed of a telepresence robot and tele-interactive devices in a remote environment (presence space), the touching live video image user interface (TIUI) used by an operator (user) in an operation space, and a wireless network connecting the two spaces. A tele-interactive device refers to a real object with its own identification, actuator, and wireless communication. A telepresence robot is used as the embodiment of an operator, moving around the presence space to actively capture live video. The TIUI is our new user interface, which allows an operator to use a pad anywhere to access the system and not only remotely operate the telepresence robot but also interact with a tele-interactive device, simply by touching its live video image as if doing so in person. The preliminary evaluation and demonstration show the efficacy and promise of our framework and system.
... Tani et al. presented interactive video techniques that allow interaction with objects in live video on the screen, by having models of the objects monitored by cameras [1]. They explored two strategies for modeling objects imaged by cameras in 2D and 3D. ...
Article
A general remote controlled robot is manipulated by a joystick and a gamepad. However, these methods are difficult for inexperienced users because the mapping between the user input and resulting robot motion is not always intuitive (e.g. tilt a joystick to the right to rotate the robot to the left). To solve this problem, we propose a touch-based interface for remotely controlling a robot from a third-person view, which is called "TouchMe". This system allows the user to manipulate each part of the robot by directly touching it on a view of the world as seen by a camera looking at the robot from a third-person view. Our system provides intuitive operation, and the user can use our system with minimal user training. In this paper we describe the TouchMe interaction and its prototype implementation. We also introduce three scheduling methods for controlling the robot in response to user interaction and report on the results of empirical comparisons of these methods.
... AR is also well-suited for visual support and feedback during control, manipulation and actuation of devices and objects. Tani et al. [31] describe a user interface for manipulating physical controls on remote machinery through an augmented video interface. TouchMe [10] applies direct-manipulation techniques for remote robot control using video see-through AR and a touch-screen interface. ...
Article
Full-text available
Recent research in 3D user interfaces pushes towards immersive graphics and actuated shape displays. Our work explores the hybrid of these directions, and we introduce sublimation and deposition, as metaphors for the transitions between physical and virtual states. We discuss how digital models, handles and controls can be interacted with as virtual 3D graphics or dynamic physical shapes, and how user interfaces can rapidly and fluidly switch between those representations. To explore this space, we developed two systems that integrate actuated shape displays and augmented reality (AR) for co-located physical shapes and 3D graphics. Our spatial optical see-through display provides a single user with head-tracked stereoscopic augmentation, whereas our handheld devices enable multi-user interaction through video see-through AR. We describe interaction techniques and applications that explore 3D interaction for these new modalities. We conclude by discussing the results from a user study that show how freehand interaction with physical shape displays and co-located graphics can outperform wand-based interaction with virtual 3D graphics.
... "The device projects the touch input onto the target display, which acts as if it had occurred on itself" [3]. The touch projector system is based on the vision that users could remotely interact with devices via live video [25]. The basic functionality of the touch projector enables screens without touch capability to receive touch capability through the smartphone (see figure 3). ...
... Back in the 1980s, an early approach to interactive video used videotape recorders, audio track tones to control them, and light flashes as a feedback mechanism, introducing a basic interactive experience for learning activities [4]. In 1992, Tani et al. [5] presented an interactive video technique that let users interact with a graphics layer overlaid on physical objects shown in live video, allowing them to remotely control the movement of those objects. ...
Conference Paper
Full-text available
This paper presents an interactive video system that enables users to change the flow of video playback by interacting with hotspots that were predefined throughout the video streams. These hotspots are synchronized with the underlying video streams and the interactions result in smooth transitions between the preloaded targets. This approach allows the dynamic visualization of content by interacting with the hotspots and producing the consequent changes in the flow of the story. The system includes web-based and mobile video players specifically developed to deal with the interactive features, as well as a configuration tool that allows content managers to choose which pre-produced interaction possibilities will be used for a specific target audience. The interactive video solution presented herein has potential to be used as a powerful communication tool, in commercial, e-learning, accessibility and entertainment contexts.
... It explores ways in which computing can be utilized to manage this complex data, e.g. through summarization [6,10,24] and browsing techniques [11,18]. There is also a parallel concern with live video and streaming video, in techniques for mediated talk [21,27], interaction with objects in the physical world through live video [25], remote video collaboration [16], and in workplace training and remote education [3]. On top of the research concerns, we see a growing interest in user generated video on the internet, which is now moving beyond sharing video files [5] to live streaming or live broadcasts from mobile devices (e.g. ...
Article
Full-text available
In this paper we explore the production of streaming media that involves live and recorded content. To examine this, we report on how the production practices and process are conducted through an empirical study of the production of live television, involving the use of live and non-live media under highly time critical conditions. In explaining how this process is managed both as an individual and collective activity, we develop the concept of temporal hybridity to explain the properties of these kinds of production system and show how temporally separated media are used, understood and coordinated. Our analysis is examined in the light of recent developments in computing technology and we present some design implications to support amateur video production.
... By interacting with an image, users are not encumbered with headgear or special eyeglasses. This is a non-immersive "window-on-the-world" (WoW) style of augmented reality [1]. Figure 1 shows some of the interaction possibilities available with RTS (though we have not fully implemented all of them). ...
Article
Full-text available
For some years, our group at FX Palo Alto Laboratory has been developing technologies to support meeting recording, collaboration, and videoconferencing. This paper presents a few of our more interesting research directions. Many of our systems use a video image as an interface, allowing devices and information to be accessed "through the screen." For example, SPEC enables hybrid collaborative and automatic camera control through an active video window. The NoteLook system allows a user to grab an image from a computer display, annotate it with digital ink, then drag it to that or a different display, while automatically generating timestamps for later video review. The ePIC system allows natural use and control of multi-display and multi-device presentation spaces, and the iLight system allows remote users to "draw" with light on a local object. All our systems serve as platforms for researching more sophisticated algorithms that will hopefully support additional advanced functions and ease of use.
... Live video as a reference cue for power-plant control has been explored in [10]. Goldberg et al. use video as a reference cue to convey "Tele-Actor," a skilled human who collects the video and performs actions during navigation [20]. ...
Article
Full-text available
This paper summarizes our environment-image/video-supported collaboration technologies developed in the past several years. These technologies use environment images and videos as active interfaces and use visual cues in these images and videos to orient device controls, annotations and other information access. By using visual cues in various interfaces, we expect to make the control interface more intuitive than button-based control interfaces and command-based interfaces. These technologies can be used to facilitate high-quality audio/video capture with limited cameras and microphones. They can also facilitate multi-screen presentation authoring and playback, tele-interaction, environment manipulation with cell phones, and environment manipulation with digital pens. Keywords: collaboration support; control through image; control through video; control with a cell phone; control with a digital pen; remote control; collaborative and automatic camera control; tele-interaction; presentation authoring; device control; gesture based camera control; video production; video communication; video conferencing; webcams; collaborative device control; distance learning; interactive image/video.
... "window-on-the-world" (WoW) displays -upon which computer generated images are electronically or digitally overlaid (e.g. Metzger, 1993;Milgram et al, 1991;Rosenberg, 1993;Tani et al, 1992). Although the technology for accomplishing such combinations has been around for some time, most notably by means of chroma-keying, practical considerations compel us to be interested particularly in systems in which this is done stereoscopically (e.g. ...
Article
Full-text available
Mixed Reality (MR) visual displays, a particular subset of Virtual Reality (VR) related technologies, involve the merging of real and virtual worlds somewhere along the 'virtuality continuum' which connects completely real environments to completely virtual ones. Augmented Reality (AR), probably the best known of these, refers to all cases in which the display of an otherwise real environment is augmented by means of virtual (computer graphic) objects. The converse case on the virtuality continuum is therefore Augmented Virtuality (AV). Six classes of hybrid MR display environments are identified. However quite different groupings are possible and this demonstrates the need for an efficient taxonomy, or classification framework, according to which essential differences can be identified. An approximately three-dimensional taxonomy is proposed comprising the following dimensions: extent of world knowledge, reproduction fidelity, and extent of presence metaphor.
... To control the devices in the living room, users can directly manipulate them by touching the corresponding video-image (cf. [2]). Depending on the controlled device different types of input are possible. ...
Conference Paper
Full-text available
The amount of digital appliances and media found in domestic environments has risen drastically over the last decade, for example, digital TVs, DVD and Blu-ray players, digital picture frames, digital gaming systems, electronically moveable window blinds, and robotic vacuum cleaners. As these devices become more compatible with the Internet and wireless networking (e.g. Internet-ready TVs, streaming digital picture frames, and WiFi gaming systems, such as Nintendo's Wii and Sony's Playstation) and as networking and WiFi home infrastructures become more prevalent, new opportunities arise for developing centralized control of these myriad devices and media through so-called "universal remote controls". However, many remote controls lack intuitive interfaces for mapping control functions to the device being controlled. This often results in trial and error button pressing, or experimentation with graphical user interface (GUI) controls, before a user achieves their intended action.
... We use the term monitor-based (non-immersive), or "window-on-the-world" (WoW), AR to refer to display systems where computer generated images are either analogically or digitally overlaid onto live or stored video images [12,13,14]. Although the technology for achieving this has been well-known for some time, most notably by means of chroma-keying, a large number of useful applications present themselves when this concept is implemented stereoscopically [15,16,17]. In our own laboratory this class of monitor-based AR displays has been under development for some years, as part of the ARGOS (Augmented Reality through Graphic Overlays on Stereovideo) project. ...
Article
Full-text available
In this paper we discuss Augmented Reality (AR) displays in a general sense, within the context of a Reality-Virtuality (RV) continuum, encompassing a large class of "Mixed Reality" (MR) displays, which also includes Augmented Virtuality (AV). MR displays are defined by means of seven examples of existing display concepts in which real objects and virtual objects are juxtaposed. Essential factors which distinguish different Mixed Reality display systems from each other are presented, first by means of a table in which the nature of the underlying scene, how it is viewed, and the observer's reference to it are compared, and then by means of a three dimensional taxonomic framework, comprising: Extent of World Knowledge (EWK), Reproduction Fidelity (RF) and Extent of Presence Metaphor (EPM). A principal objective of the taxonomy is to clarify terminology issues and to provide a framework for classifying research across different disciplines.
... Live video can be used to instruct a task to a robot. A user can manipulate a robot interactively in video by operating controls seen on the robot [17]. A user can specify where and what a robot should do in the environment using video from a ceiling-mounted camera [18][19]. ...
Conference Paper
We present a method of instructing a sequential task to a household robot using a hand-held augmented reality device. The user decomposes a high-level goal such as “prepare a drink” into steps such as delivering a mug under a kettle and pouring hot water into the mug. The user takes a photograph of each step using the device and annotates it with necessary information via touch operation. The resulting sequence of annotated photographs serves as a reference for review and reuse at a later time. We created a working prototype system with various types of robots and appliances.
... The concept of controlling devices through a smaller image was first presented in the world-in-a-miniature interface [11]. The idea of controlling devices through a video image was introduced by Tani et al. [12] in the project Hyperplant. They explored different ways of controlling devices in a factory through a video image. ...
Conference Paper
Full-text available
While most homes are inherently social places, existing devices designed to control consumer electronics typically only support single user interaction. Further, as the number of consumer electronics in modern homes increases, people are often forced to switch between many controllers to interact with these devices. To simplify interaction with these devices and to enable more collaborative forms of device control, we propose an integrated remote control system, called CRISTAL (Control of Remotely Interfaced Systems using Touch-based Actions in Living spaces). CRISTAL enables people to control a wide variety of digital devices from a centralized, interactive tabletop system that provides an intuitive, gesture-based interface that enables multiple users to control home media devices through a virtually augmented video image of the surrounding environment. A preliminary user study of the CRISTAL system is presented, along with a discussion of future research directions.
... Parts of our work were inspired by Rukzios [4] and Rohs [3], the latter of which also wrote the visual marker recognition we used. Using an augmented video stream to observe and control objects was introduced by Tani et al. [6]. This also inspired a very recent system called Touch Projector by Boring et al. [1]. ...
Conference Paper
In this work we present a method to intuitively issue control over devices in smart environments, to display data that smart objects and sensors provide, and to create and manipulate flows of information in smart environments. This makes it easy to customize smart environments by linking arbitrary data sources to various display modalities on the fly. Touchscreen smartphones - as readily available multi-purpose devices - are used to overlay real objects with virtual controls. We evaluated this system with a first qualitative user study.
Conference Paper
Full-text available
We propose WaddleWalls, a room-scale interactive partitioning system using a swarm of robotic partitions that allows occupants to interactively reconfigure workspace partitions to satisfy their privacy and interaction needs. The system can automatically arrange the partitions’ layout designed by the user on demand. The user specifies the target partition’s position, orientation, and height using the controller’s 3D manipulations. In this work, we discuss the design considerations of the interactive partition system and implement WaddleWalls’ proof-of-concept prototype assembled with off-the-shelf materials. We demonstrate the functionalities of WaddleWalls through several application scenarios in an open-planned office environment. We also conduct an initial user evaluation that compares WaddleWalls with conventional wheeled partitions, finding that WaddleWalls allows effective workspace partitioning and mitigates the physical and temporal efforts needed to fulfill ad hoc social and privacy requirements. Finally, we clarify the feasibility, potential, and future challenges of WaddleWalls through an interview with experts.
Article
The amount of data presented to operators is increasing, but the amount of information operators actually obtain is not, because conventional human interfaces focus only on foreground awareness. Users usually obtain information through both background and foreground awareness. This paper proposes a new human interface style (awareness-oriented human interface) that exploits human background awareness. The interface promotes the operator's abilities of recognition by 1) providing information for both foreground and background awareness simultaneously, 2) navigating information between foreground and background awareness, 3) awareness-oriented information processing, and 4) supporting public awareness for multiple operators.
Conference Paper
We present a Capture the Flag based game that investigates the possible engagements in a multi-device game. The distinction between a publicly used space and a player's private space is made and utilized to display different information to players. The tablet and the Augmented Reality component are used to see how players can be drawn to a certain physical space, to create a social and engaging game. The demo allows the users to experience a different setup for a multi-device game that attempts to engage the users with the space and each other.
Conference Paper
In this paper, we propose a way to enable users to preview a modified version of objects in the real world on a mobile device's screen using augmented reality techniques with live video. We applied the methodology to develop a prototype system and interface that lets users modify the fonts of a poster design placed in an actual environment and preview the result, reducing the problem referred to as "impression inconsistency." From another point of view, this system uses an "interaction through video" metaphor. Tani et al. devised a technique to remotely operate machines by manipulating a live video image on a computer screen. Boring et al. applied it to distant large displays and mobile devices. Our system provides interaction with static, unintelligent targets such as posters and signs through live video.
Conference Paper
Remote controls facilitate interactions at-a-distance with appliances. However, the complexity, diversity, and increasing number of digital appliances in ubiquitous computing ecologies make it increasingly difficult to: (1) discover which appliances are controllable; (2) select a particular appliance from the large number available; (3) view information about its status; and (4) control the appliance in a pertinent manner. To mitigate these problems we contribute proxemic-aware controls, which exploit the spatial relationships between a person's handheld device and all surrounding appliances to create a dynamic appliance control interface. Specifically, a person can discover and select an appliance by the way one orients a mobile device around the room, and then progressively view the appliance's status and control its features in increasing detail by simply moving towards it. We illustrate proxemic-aware controls of assorted appliances through various scenarios. We then provide a generalized conceptual framework that informs future designs of proxemic-aware controls.
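The orientation-plus-distance behaviour described in this abstract can be approximated with simple geometry: select the appliance whose bearing best matches where the handheld points, then increase the amount of detail shown as the user approaches it. The room layout, angular threshold, and detail bands below are illustrative assumptions, not the paper's implementation.

```python
# Sketch: proxemic appliance selection from handheld position and orientation.
# Appliance layout, angular threshold, and detail bands are illustrative.
import math

APPLIANCES = {"tv": (4.0, 0.5), "lamp": (1.0, 3.0), "thermostat": (3.5, 3.5)}  # room (x, y) in metres


def select_appliance(device_pos, heading_rad, max_angle_deg=20):
    """Return the appliance the device points at most directly, if any."""
    best, best_angle = None, math.radians(max_angle_deg)
    for name, (ax, ay) in APPLIANCES.items():
        bearing = math.atan2(ay - device_pos[1], ax - device_pos[0])
        # wrap the angular difference into [0, pi]
        angle = abs(math.atan2(math.sin(bearing - heading_rad), math.cos(bearing - heading_rad)))
        if angle < best_angle:
            best, best_angle = name, angle
    return best


def detail_level(device_pos, appliance):
    """More detail as the user moves closer: status only, then basic, then full controls."""
    ax, ay = APPLIANCES[appliance]
    d = math.hypot(ax - device_pos[0], ay - device_pos[1])
    return "full-controls" if d < 1.0 else "basic-controls" if d < 2.5 else "status-only"


pos = (2.0, 1.0)
target = select_appliance(pos, heading_rad=math.atan2(2.5, 1.5))  # pointing toward the thermostat
if target:
    print(target, detail_level(pos, target))   # -> "thermostat status-only"
```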
Chapter
Today's smartphones provide the technical means to serve as interfaces for public displays in various ways. Even though recent research has identified several approaches for mobile-display interaction, inter-technique comparisons of respective methods are scarce. In this chapter, the authors present an experimental user study on four currently relevant mobile-display interaction techniques ('Touchpad', 'Pointer', 'Mini Video', and 'Smart Lens'). The results indicate that mobile-display interactions based on a traditional touchpad metaphor are time-consuming but highly accurate in standard target acquisition tasks. The direct interaction techniques Mini Video and Smart Lens had comparably good completion times, and especially Mini Video appeared to be best suited for complex visual manipulation tasks like drawing. Smartphone-based pointing turned out to be generally inferior to the other alternatives. Finally, the authors introduce state-of-the-art browser-based remote controls as one promising way towards more serendipitous mobile interactions and outline future research directions.
Article
General remote-control robots are manipulated by joysticks or game pads. These are difficult for inexperienced users, however, because the relationship between user input and the resulting robot movement may not be intuitive, e.g., tilting the joystick to the right to rotate the robot left. To solve this problem, we propose a touch-based interface called TouchMe for controlling a robot remotely from a third-person point of view. This interface allows the user to directly manipulate individual parts of a robot by touching it as seen by a camera. Our system provides intuitive operation allowing the user to use it with minimal training. In this paper, we describe TouchMe interaction and prototype implementation. We also introduce three types of movement for controlling the robot in response to user interaction and report on results of an empirical comparison of these methods.
Article
The huge influx of mobile display devices is transforming computing into multi-device interaction, demanding a fluid mechanism for using multiple devices in synergy. In this paper, we present a novel interaction system that allows a collocated large display and a small handheld device to work together. The smartphone acts as a physical interface for near-surface interactions on a computer screen. Our system enables accurate position tracking of a smartphone placed on or over any screen by displaying a 2D color pattern that is captured using the smartphone's back-facing camera. As a result, the smartphone can directly interact with data displayed on the host computer, with precisely aligned visual feedback from both devices. The possible interactions are described and classified in a framework, which we exemplify on the basis of several implemented applications. Finally, we present a technical evaluation and describe how our system is unique compared to other existing near-surface interaction systems. The proposed technique can be implemented on existing devices without the need for additional hardware, promising immediate integration into existing systems.
Article
This thesis systematizes the previously ad hoc specification of targets and motion commands for visual deictic control of mobile robots, which enables control of mobile robots in the real world without requiring foreknowledge of the environment. Whereas ad hoc (or generic) visual targets may be scattered densely enough in the world to support deictically controlled navigation, and whereas environment-independent motion commands may suffice in open enough spaces, real environments in which robots would be useful cannot in general be relied upon to be sufficiently serendipitous. Instead, this thesis abstracts from the structure of the world a well-defined set of canonical targets and motion commands relative to those targets which together support general-purpose navigation. Canonical targets are a superset of those identified as critical by a visibility graph analysis. Targets and commands are also defined to position a robot at its goal location for the performance of whatever task it ...
Article
Today’s smartphones provide the technical means to serve as interfaces for public displays in various ways. Even though recent research has identified several new approaches for mobile-display interaction, inter-technique comparisons of respective methods are scarce. The authors conducted an experimental user study on four currently relevant mobile-display interaction techniques (‘Touchpad’, ‘Pointer’, ‘Mini Video’, and ‘Smart Lens’) and learned that their suitability strongly depends on the task and use case at hand. The study results indicate that mobile-display interactions based on a traditional touchpad metaphor are time-consuming but highly accurate in standard target acquisition tasks. The direct interaction techniques Mini Video and Smart Lens had comparably good completion times, and especially Mini Video appeared to be best suited for complex visual manipulation tasks like drawing. Smartphone-based pointing turned out to be generally inferior to the other alternatives. Examples for the application of these differentiated results to real-world use cases are provided.
Conference Paper
The Smarter Objects system explores a new method for interaction with everyday objects. The system associates a virtual object with every physical object to support an easy means of modifying the interface and the behavior of that physical object as well as its interactions with other "smarter objects". As a user points a smart phone or tablet at a physical object, an augmented reality (AR) application recognizes the object and offers an intuitive graphical interface to program the object's behavior and interactions with other objects. Once reprogrammed, the Smarter Object can then be operated with a simple tangible interface (such as knobs, buttons, etc). As such Smarter Objects combine the adaptability of digital objects with the simple tangible interface of a physical object. We have implemented several Smarter Objects and usage scenarios demonstrating the potential of this approach.
Conference Paper
Graphical User Interfaces (GUIs) offer a very flexible interface but require the user's complete visual attention, whereas Tangible User Interfaces (TUIs) can be operated with minimal visual attention. To prevent visual overload and provide a flexible yet intuitive user interface, Smarter Objects combines the best of both styles by using a GUI for understanding and programming and a TUI for day-to-day operation. Smarter Objects uses Augmented Reality technology to provide a flexible GUI for objects when that is needed.
Conference Paper
Remote control robots are being found in an increasing number of application domains, including search and rescue, exploration, and reconnaissance. There is a large body of HRI research that investigates interface design for remote navigation, control, and sensor monitoring, while aiming for interface enhancements that benefit the remote operator such as improving ease of use, reducing operator mental load, and maximizing awareness of a robot's state and remote environment. Even though many remote control robots have multi-degree-of-freedom robotic manipulator arms for interacting with the environment, there is only limited research into easy-to-use remote control interfaces for such manipulators, and many commercial robotic products are still using simplistic interface technologies such as keypads or gamepads with arbitrary mappings to arm morphology. In this paper, we present an original interface for the remote control of a multi-degree of freedom robotic arm. We conducted a controlled experiment to compare our interface to an existing commercial keypad interface and detail our results that indicate our interface was easier to use, required less cognitive task load, and enabled people to complete tasks more quickly.
Article
Some of the most challenging multimedia applications have involved real- time conferencing, using audio and video to support interpersonal communication. Here we re-examine assumptions about the role, importance and implementation of video information in such systems. Rather than focussing on novel technologies, we present evaluation data relevant to both the classes of real-time multimedia applications we should develop and their design and implementation. Evaluations of videoconferencing systems show that previous work has overestimated the importance of video at the expense of audio. This has strong implications for the implementation of bandwidth allocation and synchronization. Furthermore our recent studies of workplace interaction show that prior work has neglected another potentially vital function of visual information: in assessing the communication availability of others. In this new class of application, rather than providing a supplement to audio information, visual information is used to promote the opportunistic communications that are prevalent in face-to-face settings. We discuss early experiments with such connection applications and identify outstanding design and implementation issues. Finally we examine a different class of application 'video-as-data', where the video image is used to transmit information about the work objects themselves, rather than information about interactants.
Article
Describes the development of a computer-based walkthrough system produced from video-still images. The navigable simulator, Archiwalk, is designed to familiarize students in built environment disciplines with a range of building types and examples, providing an experiential learning environment. Users are able to manoeuvre around using on-screen controls, encouraging an exploratory form of interaction. Reports on various stages of the system’s development, and includes some of the design issues which were addressed. Also describes a formal method for capturing and compiling images to enable teachers to compile their own walkthrough case studies; this is facilitated by the use of database creator software.
Article
In hyperlinked video, objects are selectable—resulting in an associated action—akin to linked words or graphics on Web pages. Possible venues for hyperlinked video include broadcast television, streaming video (e.g., video on the Internet or other forms of video-on-demand systems), and published media such as DVD (digital versatile disc). Hyperlinked video offers new interaction possibilities with streaming media and the opportunity to move electronic commerce from the personal computer to the television receiver while making it more dynamic and engaging. In this paper we examine some opportunities and challenges of hyperlinked video, describe an object tracking and identification algorithm and an authoring tool we have developed, and discuss two prototype hyperlinked television programs—one an augmented broadcast and one a video-on-demand application.
Article
The increasing use of multimedia systems has led to a need for retrieval mechanisms not just for static media (such as text and graphics) but also for the ‘new’ dynamic media (time-variant media such as audio and video). All these media are indexable by physical attributes, but as yet generally only text and graphics are indexable to a further level of granularity by content. This paper explores the issues associated with attaching application-dependent semantics to objects which occur visually (or otherwise) within video sequences. The principles discussed here make particular reference to video, but they should be general enough to be applicable to other dynamic media such as audio, or even smell and tactility. Thus, they describe an extension of the anchor concept common in hypermedia models towards a more general mechanism, applicable to a wide range of media. The paper first describes a general concept for accessing the contents of dynamic media by considering two points of view—those of hypermedia and information retrieval. The paper then introduces the general concept of Sensitive Regions (or ‘hot-spots’) by reverse engineering techniques from established technologies such as computer graphics and traditional cinematic animation. In the final sections, the paper describes three applications being developed at the Computer Graphics Center (ZGDV) which explore a variety of aspects associated with Sensitive Regions: the HyperPicture-System focuses on the management of such data, MOVie experiments with the creation and editing of Sensitive Regions in a cinematically oriented context, while ShareME explores some of the issues associated with the use of Sensitive Regions in the interface to multimedia applications.
Conference Paper
Full-text available
The increasing number of media facades in urban spaces offers great potential for new forms of interaction especially for collaborative multi-user scenarios. In this paper, we present a way to directly interact with them through live video on mobile devices. We extend the Touch Projector interface to accommodate multiple users by showing individual content on the mobile display that would otherwise clutter the facade's canvas or distract other users. To demonstrate our concept, we built two collaborative multi-user applications: (1) painting on the facade and (2) solving a 15-puzzle. We gathered informal feedback during the ARS Electronica Festival in Linz, Austria and found that our interaction technique is (1) considered easy-to-learn, but (2) may leave users unaware of the actions of others.
Conference Paper
Full-text available
Studies of video as a support for collaborative work have provided little hard evidence of its utility for either task performance or fostering telepresence, i.e. the conveyance of a face-to-face like social presence for remotely located participants. To date, most research on the value of video has concentrated on “talking heads” video in which the video images are of remote participants conferring or performing some task together. In contrast to talking heads video, we studied video-as-data in which video images of the workspace and work objects are the focus of interest, and convey critical information about the work. The use of video-as-data is intended to enhance task performance, rather than to provide telepresence. We studied the use of video during neurosurgery within the operating room and at remote locations away from the operating room. The workspace shown in the video is the surgical field (brain or spine) that the surgeon is operating on. We discuss our findings on the use of live and recorded video, and suggest extensions to video-as-data including its integration with computerized time-based information sources to educate and co-ordinate complex actions among distributed workgroups.
Conference Paper
Our living and work spaces are becoming ever more enriched with all kinds of electronic devices. Many of these are too small to provide the possibility to control or monitor them. Ambient intelligence is integrating many such devices in what are called smart environments to form a network of interweaved sensors, data displays and everyday devices. We present a method to intuitively issue control over smart objects in such an environment, to display data that smart objects provide and to manage the flow of information between objects in a smart environment. This is achieved by using touch-enabled mobile phones as readily available multi-purpose devices which are used to overlay real objects with virtual controls. We evaluated the system with a first qualitative user study.
Conference Paper
This paper proposes an interaction for controlling multimedia contents of remote devices using a mobile device based on AR technology. Using a real-time object recognition method, home devices detected by the camera of a mobile device are displayed on the camera preview screen along with thumbnails of their own multimedia contents around the recognized positions. A user may drag a multimedia content which he or she wants to play, and drop it onto another target home device which he or she wants to play the content through. The user study suggested that the proposed interaction offers higher usability: once a home device has been registered with its device name and its image as seen by the mobile camera for object recognition, no further matching step is needed when the user later controls the device through the mobile device.
Article
A description is given of a human factors analysis carried out on the man-machine interface (MMI) needs of energy management systems (EMSs), with a focus on the applications of full-graphics display technology for these systems. The need for and purpose of human-factors analysis are discussed. Conclusions are presented regarding switching operations, screen management, automatic generation control, unit commitment, and security analysis. The results of the study demonstrate that advanced features such as world map, pan, zoom, and declutter applied to SCADA, network diagrams, and tabular application displays enhance operator effectiveness. While some examples are drawn from an actual implementation, this is not a description of a specific MMI or a design description.