HaptoClone (Haptic-Optical Clone)
for Mutual Tele-Environment by Real-time 3D Image
Transfer with Midair Force Feedback
Yasutoshi Makino, Yoshikazu Furuyama, Seki Inoue and Hiroyuki Shinoda
The University of Tokyo
5-1-5, Kashiwano-ha, Kashiwa-shi, Chiba, 277-8561, Japan
{yasutoshi_makino, hiroyuki_shinoda}@k.u-tokyo.ac.jp, {furuyama, inoue}@hapis.k.u-tokyo.ac.jp
Figure 1. (a) Two people can communicate by using cloned images with tactile feedback.
(b) Each side can see and touch the other side’s volumetric image in real-time. (c) A human can interact with a remote object.
ABSTRACT
In this paper, we propose a novel interactive system that
mutually copies adjacent 3D environments optically and
physically. The system realizes mutual user interaction
through haptics without the users wearing any devices. A realistic
volumetric image is displayed using a pair of micro-mirror
array plates (MMAPs). The MMAP transmissively reflects
the rays from an object, and a pair of them reconstructs the
floating aerial image of the object. Our system can optically
copy adjacent environments based on this technology. Haptic
feedback is also given by using an airborne ultrasound tactile
display (AUTD). Converged ultrasound can give force
feedback in midair. Based on the optical characteristics of
the MMAPs, the cloned image and the user share an identical
coordinate system. When a user touches the transferred clone
image, the system gives force feedback so that the user can
feel the mechanical contact and reality of the floating image.
Author Keywords
3D interaction; Tactile display; Telexistence; Mutual
communication
ACM Classification Keywords
H.5.1. Information interfaces and presentation (e.g., HCI):
Multimedia Information Systems: Artificial, augmented, and
virtual realities.
INTRODUCTION
In this paper, we propose a new interaction system named
“HaptoClone (Haptic-Optical Clone)” that enables real-time
interaction with floating volumetric images with haptic
feedback. Two users sitting side by side can interact with
each other’s cloned image through physical force. This type
of interaction has been proposed as “telexistence” or
“telepresence” in many previous studies. Our system has the
following two advantages compared with previous systems:
1) Users can mutually interact with each other through
tactile feedback without wearing any equipment in 3D
space.
2) Not only people but also objects can physically interact
with the remote side.
In general, telexistence, especially with haptics, is realized
by a pair of “master” and “slave” devices [6]. A user at the
master side wears equipment, and his or her motion is
transferred to a remotely connected slave device. The slave
device measures forces in the interaction as well as vision
and sound to send them back to the master side. With this
configuration, the telexistence system is not mutual. The
master side can feel remote environmental information,
while people at the slave side cannot know what is happening
at the master side. There is another telepresence system
named Physical Telepresence [24]. People can interact with
the remote object through a system that uses many moving
pins to realize physical interaction. The user’s body on one
side can control the remote object through actuators. In this
system, a remote object can be displayed in a 2D workspace.
Our system uses ultrasound to produce an invisible high-
pressure area in the air to realize physical interaction
superimposed on visual clones in 3D space. Although the
force produced by the ultrasound is limited, users can interact
with each other without wearing any devices. In addition,
objects can physically interact with each other. Previously
proposed “telexistence” or “telepresence” systems focused
on how to transfer the existence of people to the remote
environment. By contrast, our system can clone not only
people but also their surrounding 3D environment. We
named this the “mutual tele-environment” system. This is the
most significant difference between our system and
many similar previous systems.
Figure 1 (a) shows the prototype system and how to use it
with two people sitting adjacent to each other. A pair of
micro-mirror array plates (MMAPs) optically clones each
user’s image to the other side, as shown in Figure 1 (b). The
girl and her hand in the picture are an optically cloned image
floating in the air. Since the system uses a mirror array to
produce the floating image, the image moves in real-time; it is
transferred at the speed of light. Based on the characteristics
of the MMAP, whose principle is given in the PROTOTYPE
section, the real field and the cloned image share an
identical coordinate system. When a user at side A touches
the cloned floating image of side B, then at the same time,
the cloned floating image of side A that appears at side B also
touches the real object at side B. Moreover, when two people
face each other through the system, they can make eye
contact because we use a mirror system. The system enables
users to communicate in natural ways with floating 3D images.
Haptic feedback is achieved using phased arrays of
ultrasound. Since our system can add contact information to
the intangible floating image at any point in the 3D
workspace without the users wearing any devices, the
interaction is free from the tactile bias caused by a worn
device.
The current system has two major limitations. One is the
distance of the two workspaces. Because we use a mirror
system, the workspaces cannot be separated by a long
distance. The second limitation is force. The practical
maximum displayed force is approximately 10² mN/cm² (= 10
gf/cm²).
Although the system has these constraints, it can be used as
a test bench for investigating the nature of tactile perception. In previous
typical wearable haptic devices, haptic stimulation was
limited to particular points under the worn device. The
proposed system, by contrast, can give haptic feedback at
any point inside the workspace with high controllability
without interfering with visual information. The spatial and
temporal profile of the pressure on the skin can be controlled
by the signal phases of the ultrasound transmitters. The
system can easily create dynamic and interactive 3D clone
images of various objects, including humans. We can
systematically clarify the required specification for the
haptic feedback for enriching the experience, manipulating
3D objects, and evoking emotional effects. This provides
basic knowledge for designing general interactive systems.
With regard to practical applications, there are some
situations where direct touch is undesirable. A patient in
a bioclean room in a hospital mentally needs contact with his
or her family, but sometimes this is impossible. In a zoo, we
sometimes want to touch dangerous animals. In such special
situations, the system is usable even in its current form.
In this paper, we show a prototype that achieves real-time
interaction with a 3D floating image with accompanying
haptic feedback. We also show the basic characteristics of
the system and its performance. Using the prototype, we
examined the effects of the refresh rate and delay of the haptic
feedback on a dynamic floating image. Although a user can
notice the difference between delays of 40 ms and 100 ms, our
experiment showed that a 100-ms delay is acceptable for
achieving real-time interaction.
PREVIOUS STUDIES
Remote physical interaction
There have been many studies that achieved “telexistence”
or “telepresence”[3, 5, 13]. Tachi et al. proposed Telesar
systems. In the Telesar V system [6], users can control a remote
robot as if they were inside it. They can see the remote area in 3D
through cameras mounted on the robot. In addition, users can
feel a remote object by using pairs of tactile sensors and
displays, which are attached to the robot’s fingers and on the
user’s glove, respectively. This system allows users to pour
water from a bottle into a cup in a remote area, for example.
Leithinger et al. proposed the Physical Telepresence system
for remote physical interaction [24]. Arrayed pin actuators
allow users to interact with physical balls remotely and to
manipulate remote objects intuitively. Nakanishi et al.
proposed a system that realizes remote handshaking [29]:
realistic robot hands combined with a video communication
system let users shake hands remotely.
These previous “telexistence” or “telepresence” systems
basically created a copy of a human in a remote place. People
must wear devices to feel the remote environment, as with
Telesar V, or the environment needs to have actuators to
virtually copy the existence of the people. Our system
mutually clones not only people but also the surrounding 3D
environment. This bidirectionality is a unique feature
compared with those related previous studies.
Ishii et al. proposed the concept of “radical atoms” [14]: the idea
that physical materials should dynamically change their shape so
that people can use a system intuitively. Physical Telepresence,
as mentioned above, is an implementation of the concept of
radical atoms. Our system seeks a 3D and intuitive interface
by combining 3D optical images and haptic stimulation.
Volumetric display
There have been many research studies on displaying 3D
images. The methods can be roughly classified into two
categories: those that require the user to wear a device and those that do not.
When users wear a device, which is usually a pair of
functional glasses or a head-mounted display (HMD), two
different images are displayed, one to each eye. The
images differ based on parallax. When the display
screen is near the eyes, the device is called an HMD. When the
display is far from the user, the worn device appropriately
filters a separate image for each eye from the superimposed images by using color
filters, polarization of light, or electrically controlled shutters.
When we use these systems for 3D communications, it is
difficult to see the natural facial appearance of the users
hidden by the devices. In particular, it is difficult to use an HMD
for this purpose.
There are three main types of methods to display floating
images without wearing any special devices. The first one is
to display two different 2D images to each eye with parallax.
Observers feel the depth of the image based on the parallax.
Some of them have used special optics on the display
surface, such as lenticular lenses, parallax barriers [22], and
Fresnel lenses [36]. Other methods used many projectors and
special screens to display multi-angle views of the target [28].
In these methods, the appropriate viewing angle is tightly
limited, which can be a problem in practical use.
The second method is to create a light diffuser pattern in
midair. Some of these methods projected images onto small floating
media, such as fog screens [7, 10] or small
particles trapped by ultrasound [32]. Others utilized the
persistence of vision. A rotating LED array or a rotating
mirror produced 3D volumetric images [16, 17, 18, 23].
These displayed images are unsuitable for direct touch:
rotating objects would collide with the fingers, and a fog screen can
easily deform when touched. Laser-induced plasma is another
method to form a light pattern in midair [20]. Ochiai et al.
recently realized touchable plasma images with a
femtosecond laser [33]. These are potentially suitable methods for
touchable 3D images, but their safety should be carefully
examined before practical use.
The third method is to reconstruct a light field. A well-known
example is the hologram, which records and reconstructs the
phase information of light waves as well as their intensity [9,
35]. Realizing real-time color 3D images with holography
remains challenging. Another method for reconstructing a
light field is to use a mirror. A half-transparent mirror can
superimpose aerial images onto real objects [11], which is
well known as “Pepper’s ghost.” In this case, the aerial image
is constructed behind the half mirror, which prevents the user
from reaching it directly. A concave mirror can also be used
to reproduce a light field in the air. Some research has
proposed a system that enables users to interact with a
floating image with their own hand [4].
Recently, passive micro-mirror array devices have been
proposed [2]. An Aerial Imaging Plate produced by a
company is composed of tiny mirrors and reproduces an
object image in midair (details are given in the
PROTOTYPE SYSTEM section). Maekawa et al. proposed
a micro-mirror device based on a similar principle using a
different fabrication process [26]. There is also a similar light
field reconstruction method based on retro-reflective
material [37]. A pair of MMAPs produces the light field of a
real object as an optical clone image. We call this image
the “light field clone (LFC).”
Midair tactile feedback
Tactile feedback is one of the important research fields in
human-computer interaction. Recently, several researchers have
proposed methods that produce midair tactile feedback.
Iwamoto et al. proposed the first prototype of an aerial tactile
display by using the radiation pressure of ultrasound: the
airborne ultrasound tactile display (AUTD) [15]. Their group
has expanded AUTD technology to touchable holography,
floating touch panels (HaptoMime) [27], and touchable 3D
tactile images (HORN) [12]. Alexander et al. also proposed
an ultrasound-based tactile feedback method named
Ultrahaptics [1]. They showed that users could identify the
shape of designed touchable objects [25]. Sodhi et al.
proposed an airflow-based tactile display [34]. Recently,
Kim et al. [19] and Ochiai et al. [33] showed that lasers can give
tactile sensations in the air.
In terms of communication and HCI design opportunities,
Obrist et al. showed that the tactile sensation generated by
Ultrahaptics technology can be verbalized [30], which
implies that noncontact tactile sensation can convey certain
mental images. In addition, their group extended this work
to show that emotions can be communicated between
groups via midair tactile sensation by controlling the location
on the palm, direction of the movements, frequency,
intensity, and duration of the ultrasonic focal point [31].
In this research, we use AUTD technology for haptic
feedback. The hardware is similar to that in some of the
previous studies; however, this work is the first study that
achieved “real-time 3D interaction with an aerial 3D image.”
There are two other previous works that used an AUTD and
an MMAP to achieve an aerial image with tactile feedback.
“HaptoMime” [27] enabled users to interact with a 2D
floating image by using one finger. This was interactive but
two-dimensional. By contrast, “HORN” [12] reconstructed a
3D tactile image but it was not fully interactive. In the demos
of HORN, the displayed haptic object was a static energy-
concentrated distribution. Users could feel a 3D shape by
their own hand motions around the static distribution. The
displayed visual image was 2D. Subramanian et al. also
achieved interactive 2D-sliced volumetric
tactile images with ultrasound [25]. However, they did not
combine them with 3D optical images.
PROTOTYPE SYSTEM
Figure 2 shows our system, which combines two technologies. One
is a pair of MMAPs (Opposed MMAPs) that creates an LFC
image. The other technology is an AUTD that gives force
feedback in midair.
Opposed Micro-Mirror Array Plate (MMAP)
We used a micro-mirror array plate (Aerial Imaging Plate,
Asukanet Co., Ltd.), which is composed of a tiny mirror
array [2]. A similar principle was proposed by Maekawa [26]
and Yamamoto [37]. Figure 3 (a) shows the details of the
MMAP. There are two layers of mirrors. Based on the
reflections, the image behind the MMAP can be seen in front
of it. One important characteristic of the MMAP is that it
inverts the image depth. As shown in Figure 3 (b), the real
object behind the MMAP is reconstructed in front of it with
its depth inverted.
Figure 4 shows how to avoid this inversion by using two
MMAPs. When we set two MMAPs as shown in the figure,
the object at A is cloned at B by MMAP1 with inversion, and
the inverted image at B is cloned at C by MMAP2. Since the
inversion occurs twice, the reconstructed image at C is the
same as the real object at A. Figure 5 shows real objects (left)
and their reconstructed images (right). With this
configuration, adjoining users can see each other’s LFCs
through the MMAPs.
Figure 2 Proposed system
Figure 3 Basic principle of the MMAP
Figure 4 A two-MMAP system clones a noninverted image.
Figure 5 (left) Original objects, (right) reconstructed images
Geometrical Optical Characteristic of MMAP
Here we show that the coordinate system shared by the real field
and the LFC on one side is identical to that of the other side's
real field and LFC. Thus, when a real object at A touches the
LFC of an object at C, as shown in Figure 6, the real
object at C is seen to be touched by the LFC of the object at
A.
Let the distance between the two MMAPs be D, as shown
in Figure 6. As the MMAP copies one side to the other
symmetrically, point X0 on the object at A is cloned as X1
through MMAP1. In this case, the distance L1 between X0 and
MMAP1 is equal to the distance between X1 and MMAP1.
The cloned image of X1 is copied to C through MMAP2. In
the same manner, the distance L2 between X1 and MMAP2 is
equal to the distance between X2 and MMAP2.
Now consider the situation where the object at C
touches the cloned finger image at X2. The contact point at
X2 is cloned at X1 through MMAP2, and is also cloned to X0
through MMAP1, which is identical to the position of the real
finger at A. As a result, a contact event occurs on both sides
A and C in the same manner.
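The geometric argument above can be checked numerically by modeling each MMAP, as stated, as a plane that forms the real image of a point at the plane-symmetric (depth-inverted) position. The short Python sketch below is only an illustration under that assumption; the plate positions, tilts, and test point are placeholder values rather than the prototype's measured layout.

import numpy as np

def mmap_image(point, plane_point, normal):
    # Ideal MMAP model: the plate forms the real image of a point at the
    # mirror-symmetric position across the plate plane (depth inverted).
    n = normal / np.linalg.norm(normal)
    return point - 2.0 * np.dot(point - plane_point, n) * n

def dist_to_plate(point, plane_point, normal):
    n = normal / np.linalg.norm(normal)
    return abs(np.dot(point - plane_point, n))

# Placeholder layout (not the prototype's measured geometry): two plates
# tilted 45 degrees, meeting D/2 = 0.325 m above the workspace origin.
p1, n1 = np.array([0.0, 0.0, 0.325]), np.array([1.0, 0.0, 1.0])
p2, n2 = np.array([0.0, 0.0, 0.325]), np.array([-1.0, 0.0, 1.0])

x0 = np.array([-0.20, 0.05, 0.00])   # a fingertip in workspace A (metres)
x1 = mmap_image(x0, p1, n1)          # image formed by MMAP1
x2 = mmap_image(x1, p2, n2)          # image formed by MMAP2, seen at side C

# Distances to each plate are preserved on both of its sides (L1 and L2 above).
assert np.isclose(dist_to_plate(x0, p1, n1), dist_to_plate(x1, p1, n1))
assert np.isclose(dist_to_plate(x1, p2, n2), dist_to_plate(x2, p2, n2))

# Mapping a C-side contact point back through MMAP2 and then MMAP1 recovers the
# original A-side point, so a contact event registers identically on both sides.
assert np.allclose(mmap_image(mmap_image(x2, p2, n2), p1, n1), x0)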
The workspace center can be defined as the point D/2 apart
from the MMAP center, where the LFC of the point is seen
at the same distance from the other MMAP. There is a
tradeoff between the acceptable viewing angle and the
workspace depth represented by D/2. When we increase the
distance D between the two MMAPs, the LFC image appears
closer to the users, which provides a larger workspace.
However, in this case, the LFC image can be seen only from a
narrow area because the MMAP has a limitation in its
viewing angle. The distance D should be chosen
appropriately. We empirically chose it as 650 mm.
When we use a pair of MMAPs with D = 650 mm, the
viewing angles can be calculated from geometric
optics. Theoretically, they are 43.3° horizontally and 53.1°
vertically when the point light source is located at the
workspace center. In reality, users can see objects at
approximately 40° horizontally and 45° vertically with the
current setting of D.
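The theoretical viewing angles can be roughly reproduced by treating them as the angle that the plate aperture subtends from a point source at the workspace center, D/2 away from the plate. The Python sketch below only illustrates this estimate; the aperture widths are hypothetical values chosen to match the quoted angles, since the plate dimensions are not stated here.

import math

def viewing_angle_deg(aperture_mm, distance_mm):
    # Full angle subtended by a plate aperture of the given width, as seen
    # from a point source at the given distance (simple geometric estimate).
    return math.degrees(2.0 * math.atan((aperture_mm / 2.0) / distance_mm))

D = 650.0   # chosen MMAP separation (mm); the source sits D/2 from each plate
print(viewing_angle_deg(258.0, D / 2.0))   # about 43.3 deg for a hypothetical 258-mm aperture
print(viewing_angle_deg(325.0, D / 2.0))   # about 53.1 deg for a hypothetical 325-mm aperture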
Figure 6 One side is cloned to the other side by using the same
coordinate system.
Airborne Ultrasound Tactile Display (AUTD)
Converged ultrasound can give force to an object based on
acoustic radiation pressure. In this paper, we use the same
principle as proposed by Iwamoto et al. Figure 7 shows the
AUTD equipment we used in this paper. A total of 249 small
ultrasound transducers with a resonant frequency of 40 kHz
are arrayed so that the emitted waves can be focused by
controlling their phases. The maximum radiation pressure
from one unit of the AUTD system is approximately 1.6
gf/cm², which evokes a slight tactile sensation on the hand.
Previous researchers showed two ways to intensify the
perception. One is amplitude modulation (AM) of the
ultrasound at around 100-200 Hz, which is the frequency
range to which humans are most sensitive. The AM method
is an easy strategy to achieve stronger perception, although
it is accompanied by audible noise.
The other method uses a standing wave formed by surrounding
ultrasound transmitters. When the four AUTD modules are
set face-to-face and emit ultrasound, a standing wave
emerges between them. In this case, we can minimize the
energy-concentrated region and produce a larger skin
deformation. The acoustic flows accompanied by the
radiation pressure are also minimized. This method can
reduce acoustic noise compared with the AM method. We
used the standing-wave method in this paper. Figure 8 shows
our configuration for the standing-wave method with four
AUTDs.
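As background, the basic single-focus step can be sketched as textbook phase-conjugation (delay-and-sum) focusing: each transducer's phase is chosen so that its 40-kHz wave arrives at the focal point in phase with all the others. The Python sketch below is a generic illustration of that principle, not the authors' firmware; the array layout and sound speed are placeholder values.

import numpy as np

SOUND_SPEED = 340.0                       # m/s in air (approximate)
FREQ = 40e3                               # AUTD transducer resonance (Hz)
K = 2.0 * np.pi * FREQ / SOUND_SPEED      # wavenumber

def focus_phases(transducer_xyz, focus_xyz):
    # Phase offset per transducer so that all emissions arrive in phase at
    # the focus (standard phase-conjugation focusing for a single point).
    d = np.linalg.norm(transducer_xyz - focus_xyz, axis=1)   # path lengths (m)
    return (-K * d) % (2.0 * np.pi)

# Placeholder planar grid standing in for one 249-transducer AUTD unit.
xs, ys = np.meshgrid(np.arange(18) * 0.01, np.arange(14) * 0.01)
array_xyz = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)])

# Focus 20 cm in front of the array; the AM strategy mentioned above would
# additionally modulate the drive amplitude at roughly 100-200 Hz over time.
phases = focus_phases(array_xyz, np.array([0.09, 0.07, 0.20]))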
Figure 7 Arrayed ultrasound transducers
Figure 8 Configuration of four AUTD units
Depth measurement
Figure 2 shows our proposed system. Two MMAPs are set
behind the pair of four AUTD units. As we explained,
interactions between real objects and LFCs occur around the
workspace center. A Kinect2 sensor is set for each side to
obtain depth information of objects around the workspace
center.
The system detects objects at each side and integrates them
to calculate their intersections for displaying tactile feedback.
Figure 9 (a) shows an example of the depth maps of the two sides.
The green depth image is obtained from one side, and the red
one is from the other. The system integrates both images to
detect intersections, which are shown in the white squares in
Figure 9 (b) and (c). This information is used to determine the
phase pattern of each ultrasound transducer.
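A minimal sketch of this intersection test is given below, assuming both depth maps have already been back-projected into the shared workspace coordinates; the voxel size and helper names are illustrative choices, not taken from the actual implementation.

import numpy as np

VOXEL = 0.01   # illustrative 1-cm voxel size

def occupancy(points_xyz):
    # Quantize one side's 3D points (from its depth map, already expressed in
    # workspace coordinates) into a set of occupied voxel indices.
    return {tuple(v) for v in np.floor(np.asarray(points_xyz) / VOXEL).astype(int)}

def contact_voxels(real_points, clone_points):
    # Voxels where a real object overlaps the other side's cloned image;
    # these overlap regions are where the ultrasonic foci are placed.
    return occupancy(real_points) & occupancy(clone_points)

def focus_targets(contacts, max_foci=4):
    # Pick up to a few representative focal points (voxel centers, metres).
    return [(np.array(v) + 0.5) * VOXEL for v in sorted(contacts)][:max_foci]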
Tactile feedback
For tactile feedback, we change the phases of the
transducers so that the AUTD units form foci at the desired
positions. The algorithm for calculating phase patterns is
similar to the one proposed by Inoue et al. [12]. There are no
limitations on the number of foci in the algorithm, which can
make arbitrary smooth pressure patterns in 3D. Compared
with the method proposed by Long et al. [25], which also
synthesizes ultrasonic holographic images in the air, our
algorithm takes the volumetric arrangement of the phased
array into account. Our algorithm searches only the solution
space that the phased array can generate for an optimized phase
distribution of the ultrasonic field. Since the prototype's
phased arrays surround the workspace, this synthesis
algorithm is more suitable.
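The solver of [12] is beyond a short example, but a common naive baseline conveys how several foci can be driven at once: superpose the conjugated point-source fields of all target foci at each transducer and keep only the phase of the sum. The snippet below is that heuristic stand-in, not the solution-space search described above.

import numpy as np

K = 2.0 * np.pi * 40e3 / 340.0   # wavenumber of 40-kHz ultrasound in air

def multi_focus_phases(transducer_xyz, foci_xyz):
    # Naive multi-focus heuristic: sum the conjugated spherical waves from
    # every target focus at each transducer and keep only the resulting phase.
    field = np.zeros(len(transducer_xyz), dtype=complex)
    for focus in foci_xyz:
        d = np.linalg.norm(transducer_xyz - focus, axis=1)
        field += np.exp(-1j * K * d) / d
    return np.angle(field) % (2.0 * np.pi)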
The algorithm requires approximately 35 ms to calculate the
phases of all transducers. The calculated phase information
is transferred to all 996 transducers in 7 ms. This process can
be done in parallel with the position sensing of the Kinect
sensor (30 ms). Since the travel time of the ultrasound
from the transducers to a focal point 20 cm away is
approximately 0.5 ms, the highest refresh rate of the current
system is approximately 25 Hz with a delay within 40 ms.
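The quoted refresh rate and delay follow directly from these stage times, with the depth sensing overlapped with the phase pipeline; a small check:

# Stage times quoted above (ms).
phase_solve  = 35.0   # phase calculation for all transducers
transfer     = 7.0    # sending the phases to the 996 transducers
sound_travel = 0.5    # ultrasound flight time over roughly 20 cm
kinect       = 30.0   # depth sensing, run in parallel with the stages above

serial_path = phase_solve + transfer + sound_travel   # about 42 ms per update
refresh_hz  = 1000.0 / max(serial_path, kinect)       # about 24 Hz, i.e. roughly 25 Hz
print(refresh_hz, serial_path)                        # added haptic delay stays around 40 ms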
In the following experiments, we evaluated how the
differences in the refresh rates and delays affect the
perception of objects.
CALIBRATION
Before using the system, it must be calibrated to align the
coordinate systems of the two adjacent workspaces. We used
a cross-shaped bar and placed it in one workspace, as shown in
Figure 10. The center of the bar coincides with that of the
workspace, which is set as the origin. This center is aligned
with the coordinate system of the Kinect sensor. With this setup,
we calibrated the coordinate system of each workspace. The AUTD
coordinates were also adjusted so that they share an identical
origin.
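A minimal version of this alignment step is sketched below, assuming the cross-bar center has already been detected in the sensor's raw coordinates; the helper name and example values are illustrative only.

import numpy as np

def workspace_transform(cross_center_in_sensor, sensor_rotation=np.eye(3)):
    # Return a function mapping raw Kinect points into workspace coordinates,
    # with the detected cross-bar center as the shared origin. Only a translation
    # (plus an optional fixed rotation) is modeled in this sketch.
    def to_workspace(points_xyz):
        return (np.asarray(points_xyz) - cross_center_in_sensor) @ sensor_rotation.T
    return to_workspace

# Example: cross-bar center detected at (0.03, -0.41, 1.12) m in raw sensor coordinates.
to_ws = workspace_transform(np.array([0.03, -0.41, 1.12]))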
Figure 9 Depth maps and integrated images for detecting
contact points of objects: (a) depth maps of the two sides,
(b) one-point contact, (c) two-point contact
Figure 10 Coordinate system calibration
Figure 11 Measured sound pressure distributions: (a) XY-
plane at z = 0 (each side has AUTD units), and (b) YZ-plane at
x = 0
Figure 12 Measured sound pressure distributions with two foci
5 cm apart on the YZ-plane at x = 0
SPATIAL DISTRIBUTION OF ULTRASOUND
We measured the spatial pressure distribution of the
converged ultrasound. A microphone mounted on a rod
attached to an automatic stage was moved in 2-mm steps and
measured the sound pressure at each point.
Figure 11 (a) shows the measured pressure pattern on the
XY-plane at z = 0, and (b) shows that on the YZ-plane at x =
0, whose coordinate system is given in Figure 10. The origin
is the center of the workspace. In Figure 11 (a), each side has
an AUTD unit. A single focal point was produced at the
origin (0, 0, 0). The size of the effective high-pressure area,
which is shown in red, is approximately 10 mm in diameter.
Figure 12 shows the measured pressure pattern on the YZ-
plane with two foci that converge at (0, 0, 25) and (0, 0, -25).
The two focal points are produced 50 mm apart, and each is
also approximately 10 mm in size. In a previous work,
Inoue et al. showed that an arbitrary smooth pressure pattern
can be produced with this algorithm [12], though they used a
different arrangement of AUTD units.
PRELIMINARY EXPERIMENT ON HUMAN FACTOR
The required refresh rate and acceptable delay are the most
fundamental parameters in a real-time interactive system. In
our system, an optical image is transferred with virtually no
delay because we use a mirror-based system. Thus, we focus
on the required specifications for tactile feedback.
The haptics literature states that 1 kHz is a sufficient frequency
bandwidth for haptic feedback [8]. However, we are still at
a preliminary stage before using the full bandwidth. A simple
haptic rendering strategy currently available is to apply full
power to the cross-sections between the real and clone
objects. The first study was planned to quantify the minimum
refresh rate that keeps the haptic quality under this simple
control algorithm during typical interactions.
The acceptable delay is the other crucial parameter in an
interactive system. The physically faithful transmission of
contact between hard objects requires feedback delays
as short as 1 ms [21], which is beyond the capability of
current public communication lines. In our system, however,
the maximum force is small, and only a gentle and soft touch
is produced. The following experiment clarifies the
practically acceptable delay in typical contact situations of
the system.
Preliminary Experiment 1: Effect of refresh rate
For the first experiment, we changed the refresh rate of the
tactile focal point. The authors’ impression is that a high
refresh rate sometimes causes unnecessary vibratory
sensations, which degrades the reality. This is because the
current system permits an impossible situation where the real
object and the clone have deep cross-sections, and the
displayed contact force is synthesized with no physical force
measurement. The currently available algorithm applies full
power to the cross-sections, which is the best approach
without assuming an a priori physical model of the elasticity
and dynamics of objects. This preliminary study clarifies the
practical optimum refresh rate under this rendering algorithm.
We prepared three refresh rate conditions: 25, 10, and 4 Hz.
As we described in the TACTILE FEEDBACK section, the
maximum refresh rate technologically available is
approximately 25 Hz in our system. For example, in the 4-
Hz condition, focal points were refreshed every 250 ms.
Even when users moved their hands faster, the motion was
reflected up to 250 ms later in the worst case. We asked subjects to touch
the virtual cloned object through our system. We prepared
two different objects to be touched: a lightweight paper ball,
which is shown in Figure 1 (c); and an open human hand,
which is placed at the center of the workspace parallel to the
YZ-plane. The former, which was a familiar object for all
subjects, could move when touched. By contrast, the latter
did not move at all.
When the ball fell out of the workspace, the experimenter put
it back in the workspace, and the subjects could touch it
repeatedly. After 10 s of exploration, the experimenter asked
the subjects to evaluate how real the objects felt. We asked
only about the similarity of the texture of the object, not the
synchronicity with the image. Subjects scored from 1
(completely different) to 7 (the same perception as the real
object). A score of 4 is neutral. Six conditions (three refresh
rates and two objects) were given five times each in random
order to 12 male student subjects.
Results of Preliminary Experiment 1
Figure 13 shows the results of the perceived reality of texture
when the refresh rate is changed. The reality of the tactile
feedback was not sufficiently high for either object. Based on
the t-test between all pairs in the same contact condition,
there are no significant differences (p > 0.05).
Refresh rate conditions did not affect the perceived reality of
the texture. This was independent of the weight of the objects and
of whether the objects moved. An interpretation of the result is
discussed later in the DISCUSSION section.
Figure 13 Perceived reality of objects
Preliminary Experiment 2: Effect of perceived delay
For the second experiment, we asked subjects about the
perceived delay of tactile feedback from the 3D optical
image. Since preliminary experiment 1 showed that the
perceived reality of contact had no significant difference
among the refresh rates, we fixed it at the highest one (25
Hz) to keep the texture perception the same. Then we added
an artificial delay of 40, 100, and 250 ms to clarify the effect
on the perception of the synchronicity.
We prepared three different objects to be touched: a
lightweight paper ball, a static human hand, and a moving
hand. The first two objects were the same ones used in
preliminary experiment 1. The moving hand was added to
this experiment because we thought that the dynamics of
the object could affect the sense of synchronicity. The hand
was the easiest object to move freely in the workspace. The
experimenter moved his hand left and right, keeping his hand
vertical. Subjects could touch the moving hand as they liked.
After 10 s of exploration, we asked the subjects to evaluate
how much delay they felt. They chose a number from 1
(obvious delay) to 5 (no delay). Nine conditions (three delays
and three objects) were given five times each in random
order to the same 12 male subjects.
Results of Preliminary Experiment 2
Figure 14 shows the results for the perceived delay. Based
on the t-test, every pair in the same contact object condition
had significant differences (p < 0.05). Subjects could notice
the differences in delay independently of the contact object
conditions.
Figure 14 Perceived delay of objects
EVALUATION
In this section, we describe a larger user study. We
demonstrated this system in a demo session during an
international conference (SIGGRAPH 2015, Emerging
Technologies). More than 1000 people experienced the
system, and we asked some of them to evaluate the system
through questionnaires. In the demonstration, we chose a 10-
Hz refresh rate condition with a delay of 100 ms because the
authors felt this setting provided the most natural sensation,
especially in hand-to-hand contact, which was mainly
experienced in the demonstration. The 25-Hz condition
evoked a stronger vibration that felt unnatural.
Method
In the demonstration, we asked 59 people (46 male and 13
female) to answer the following questionnaire. It was the
first time these participants had experienced the system.
Almost all of them felt the instructor’s static and moving
hand and a paper ball through the system.
We asked the following questions:
(Q1) Was the floating image seen as 3D?
(Q2) Was the haptic feedback felt as 3D?
(Q3) Were the visual and haptic reactions felt in real-time?
(Q4) Did the haptic feedback enrich the reality of the
floating image?
(Q5) Was the haptic feedback realistic?
Participants could score from 1 to 7 as an integer. A score of
1 means “definitely no,” 4 means “neutral,” and 7 means
“definitely yes.”
The purpose of questions (Q1) and (Q2) is to determine the
quality of our proposed system. Ideally, both the visual
image and the aerial tactile feedback should be felt as 3D.
Question (Q3) evaluates the delay of tactile feedback. Since
the floating image is produced based on a mirror system,
there is no delay in the image. By contrast, the tactile
feedback has a 100-ms refreshing delay. We evaluated how
the delay affects the experiences. Questions (Q4) and (Q5)
evaluate the effects of the tactile feedback. Since the tactile
sensation produced by an AUTD is too weak to represent a
hard object, the sensation itself is not realistic, as shown in
our preliminary experiment. Thus, the score for (Q5) will not
be high. On the other hand, even when the stimulus is not
real, the tactile sensation synchronized with the floating
image can enrich the quality of the images. We asked about
the importance of the tactile feedback through (Q4) and (Q5).
Figure 15 Results of five questionnaires. Q1–Q4 are
significantly higher than NEUTRAL (score: 4), while Q5 has
no significant difference from NEUTRAL
Results
Figure 15 shows the results of the questions. The vertical axis
shows the score: 7 represents “definitely yes,” 1 represents
“definitely no,” and 4 means “neutral.” By using this t-test,
we compared these scores to the uniform distribution model,
whose average was 4 and variance was 2, which represented
a perfectly random answer. Almost all of the questions are
significantly different (p < 0.01) from neutral except for
condition (Q5).
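One plausible reading of this comparison is a per-question one-sample test of the scores against the neutral value of 4; a SciPy sketch under that reading follows, with the score array left as a placeholder rather than the study's actual responses.

import numpy as np
from scipy import stats

def neutral_test(scores, neutral=4.0):
    # One-sample t-test of questionnaire scores against the neutral midpoint.
    # This only approximates the comparison described above, which references a
    # uniform-response model rather than a single point value.
    return stats.ttest_1samp(np.asarray(scores, dtype=float), neutral)

# scores_q1 would hold the 59 participants' 1-7 answers to Q1, e.g.:
# result = neutral_test(scores_q1); print(result.statistic, result.pvalue)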
The results of (Q1) and (Q2) show that both the images and
tactile patterns were perceived as 3D. The result of (Q3)
shows that the tactile feedback was felt in real-time even
though the system had a 100-ms delay. Interesting results are
seen in questions (Q4) and (Q5). From the result of (Q4), the
tactile feedback by an AUTD enriched the reality of the
visual floating images. However, from the result of (Q5), the
tactile feedback itself is not sufficiently realistic, which
corresponds to our preliminary experiment shown in Figure
13.
These results show the limitations of the simple word “realistic.”
In some situations, for example, when one hand gently
stroked the other, the sensation felt realistic in a sense.
However, sometimes the real hand penetrated the clone, and
the shear force and small vibrations were not reproduced,
which caused an “unrealistic” feeling. These results reveal
that haptic cues of a contact event are effective for
comprehending the interactions even if the displayed
sensation is not faithful to the actual contact.
DISCUSSION
Through the two preliminary experiments, we found that
1) refresh rate conditions did not affect the perceived reality
of textures of objects, though the statistical powers were not
sufficiently high; and 2) people noticed the differences
between delays of 40, 100, and 250 ms with the floating
image. Detailed statistical results of experiments are given in
the supplementary material.
Although the results in preliminary experiment 1 showed no
significant differences, there were different tendencies
between the two contact object conditions. When
users touched a paper ball, a higher refresh rate felt more
realistic. This is because the contact with the moving ball
is instantaneous, and a higher refresh rate could reproduce
higher-frequency components of the impact on the paper ball.
By contrast, for touching the soft skin tissue of the hand, the
10-Hz condition scored the highest among the three
conditions. Some subjects commented that “the 25-Hz
(highest refresh rate) condition produced a stronger vibratory
sensation than the other two conditions, which was different
from the smooth skin surface.”
Based on the results of preliminary experiment 2, users could
feel the differences among the delays. It was easier to notice
the delay when touching a moving object than a static
one. At the same time, even when the differences were
noticeable, the experience remained acceptable to users in some cases.
The result for Q3 shows that a 10-Hz refresh rate (i.e., a 100-
ms delay) was felt as real-time by users. Although users could
notice the delay difference between 40 ms and 100 ms, a
delay of 100 ms seemed acceptable in the current setting. This
result can be used to determine the parameters for a
telecommunication system.
LIMITATIONS
The current system has four main limitations. One is
feedback force control. Our prototype system detects
geometric overlap regions of two objects (one is real and the
other is cloned) based on the depth images. Currently, the
system gives force feedback at the maximum output
amplitude of the ultrasound, with no estimation of the
contact force amplitude. Therefore, it is still difficult to
manipulate objects precisely in this system. The speed of
contact and prior information on the objects can be used to
estimate a collision force to simulate the situation
realistically. This will be part of our future work.
The second limitation is collision detection on the occluded
sides of hands and objects. Because we used the Kinect
sensor, the blind side of the object surface cannot be
estimated. Thus when a user attempts to interact with a
cloned paper ball, he or she can only touch the front side. It
is impossible to pull the cloned paper ball toward the user by
touching its blind side. The simple reason for this
problem is that there is no space in which to put the Kinect
sensor. We can add another depth sensor to measure the other
side to solve this issue.
The third limitation, which is essentially unavoidable, comes
from the nature of the AUTD: the weakness of the displayed
maximum force. The force direction is also limited to the
direction normal to the surface. By analogy with the visual
and auditory systems, spatial and temporal patterns can transmit a variety
of useful information even if the amplitude is limited. Our
future work will clarify the possibilities of haptic
stimulation with excellent controllability under small
maximum forces.
Since the system uses mirror-based optics, the current setting
cannot be distributed. We have to discuss how this
configuration can be extended to a real distributed
telepresence system. This is the fourth limitation. Below we
explain this from both the haptic and optical perspectives.
From the haptic perspective, the current setting can be
distributed as is, at least physically. The haptic systems of the
two workspaces are already connected by the TCP/IP protocol.
The delay in the communication path can be a problem
during practical use, which is a general problem with haptic
communication. In the system, however, the requirement for
a delay is not strict, since the transmitted information is
limited to soft and gentle contact, as the
experimental results show.
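Because the haptic link is already a TCP/IP connection, distributing the system over a network mainly means carrying the same contact data over a longer path. A minimal sketch with a hypothetical message layout (a timestamp plus up to four focal points) follows; it is not the prototype's actual protocol.

import socket
import struct
import time

# Hypothetical packet: a double timestamp followed by four (x, y, z) foci in metres.
PACKET = struct.Struct("<d12f")

def send_contacts(sock, foci):
    # Send up to four contact/focus points to the remote workspace.
    flat = [c for f in foci[:4] for c in f]
    flat += [0.0] * (12 - len(flat))              # pad unused focus slots
    sock.sendall(PACKET.pack(time.time(), *flat))

def recv_contacts(sock):
    # Blocking receive of one packet; returns (timestamp, list of four foci).
    data = sock.recv(PACKET.size, socket.MSG_WAITALL)
    ts, *flat = PACKET.unpack(data)
    return ts, [tuple(flat[i:i + 3]) for i in range(0, 12, 3)]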
From the optical perspective, full replacement of the device
is required. For that purpose, a high-quality 3D recording and
replay system is needed. Currently, many groups are
developing high-quality light-field cameras and displays.
These technologies could substitute for our mirror-based
system.
CONCLUSIONS
In this paper, we proposed a new mutual tele-environment
system named “HaptoClone.” The system transferred light
field images to an adjacent workspace based on MMAPs.
The MMAP is composed of a tiny mirror array that
reflects the image symmetrically to the other side. With this
light field technology, users could see the cloned adjacent
environment in real-time. They could directly reach the
cloned image without wearing any glasses. In order to enrich
the experience, we also implemented a tactile feedback
system based on an Airborne Ultrasound Tactile Display
(AUTD). Converged ultrasound can give force feedback
without requiring any devices to be installed on the users.
Moreover, the objects in workspaces can physically interact
with each other. Our system is the first to examine
the importance of haptic feedback in real-time interaction
with 3D images.
We developed a prototype system and conducted some user
studies on the effects of tactile feedback. The results
indicated that a delay of 100 ms is acceptable in the current
system. Although the tactile feedback by the AUTD system
was not an accurate representation of real contact, it enriched
the reality of interaction with floating images.
ACKNOWLEDGMENTS
This work was supported in part by the JSPS Grant-in-Aid
for Scientific Research (A) 25240032 and the JSPS Grant-
in-Aid for Young Scientists (A) 15H05315.
REFERENCES
1. Jason Alexander, Mark T. Marshall, and Sriram
Subramanian. 2011. Adding haptic feedback to mobile
TV. In CHI ’11 Extended Abstracts on Human Factors
in Computing Systems (CHI EA ’11).
2. Asukanet Co. Ltd: AI Plate http://aerialimaging.tv/.
3. Hrvoje Benko, Ricardo Jota, and Andrew Wilson.
2012. MirageTable: freehand interaction on a projected
augmented reality tabletop. CHI ’12, 199–208.
4. Alex Butler, Otmar Hilliges, Shahram Izadi, Steve
Hodges, David Molyneaux, David Kim, and Danny
Kong. 2011. Vermeer: direct interaction with a 360°
viewable 3D display. UIST ’11, 569–576.
5. Maayan Cohen, Kody R. Dillman, Haley MacLeod,
Seth Hunter, and Anthony Tang. 2014. OneSpace:
shared visual scenes for active freeplay. CHI ’14,
2177–2180.
6. Charith L. Fernando, Masahiro Furukawa, Tadatoshi
Kurogi, Sho Kamuro, Katsunari Sato, Kouta
Minamizawa, and Susumu Tachi. 2012. Design of
TELESAR V for transferring bodily consciousness in
Telexistence. IROS2012, 5112–5118.
7. Fogscreen: http://www.fogscreen.com/.
8. George A. Gescheider, Stanley J. Bolanowski, and
Kathleen R. Hardick. 2001. The frequency selectivity
of information-processing channels in the tactile
sensory system. Somatosensory & Motor Research,
18(3), 191–201.
9. Nobuyuki Hashimoto, Shigeru Morokawa, and Kohei
Kitamura. 1991. Real-time holography using the high-
resolution LCTV-SLM, Proc. SPIE 1461, Practical
Holography V, 291.
10. Heliodisplay: http://www.io2technology.com/.
11. Otmar Hilliges, David Kim, Shahram Izadi, Malte
Weiss, and Andrew Wilson. 2012. HoloDesk: direct
3D interactions with a situated see-through display.
CHI ’12, 2421–2430.
12. Seki Inoue, Yasutoshi Makino, and Hiroyuki Shinoda.
2015. Active touch perception produced by airborne
ultrasonic haptic hologram. IEEE World Haptics
Conference, 362–367.
13. Hiroshi Ishii and Minoru Kobayashi. 1992.
ClearBoard: a seamless medium for shared drawing
and conversation with eye contact. CHI ’92, 525–532.
14. Hiroshi Ishii, Dávid Lakatos, Leonardo Bonanni, and
Jean-Baptiste Labrune. 2012. Radical atoms: beyond
tangible bits, toward transformable materials.
Interactions 19, 1 (January 2012), 38–51.
15. Takayuki Iwamoto, Mari Tatezono, and Hiroyuki
Shinoda. 2008. Non-contact method for producing
tactile sensation using airborne ultrasound. Eurohaptics
2008, 504–513.
16. David G. Jansson, and Richard P. Kosowsky. 1984.
Display of moving volumetric images. Proc. SPIE
0507, Processing and Display of Three-Dimensional
Data II, 82.
17. Andrew Jones, Ian McDowall, Hideshi Yamada, Mark
Bolas, and Paul Debevec. 2007. Rendering for an
interactive 360° light field display. SIGGRAPH ’07,
Article 40.
18. Ken-ichi Kameyama, Koichi Ohtomi, and Yukio
Fukui. 1993. Interactive 3-D volume scanning display
with optical relay system and multi-dimensional input
device. Proc. SPIE 1915, Stereoscopic Displays and
Applications IV, 12
19. Hyung-Sik Kim, Ji-Sun Kim, Gu-In Jung, Jae-Hoon
Jun, Jong-Rak Park, Sung-Phil Kim, Seungmoon Choi,
Sung-Jun Park, Mi-Hyun Choi, and Soon-Cheol
Chung. 2015. Evaluation of the possibility and
response characteristics of laser-induced tactile
sensation, Neuroscience Letters. Vol. 602, pp. 68–72.
20. Hideki Kimura, Taro Uchiyama, and Hiroyuki
Yoshikawa. 2006. Laser produced 3D display in the
air. In ACM SIGGRAPH 2006 Emerging
Technologies.
21. Katherine J. Kuchenbecker, Jonathan Fiene, and
Gunter Niemeyer. 2006. Improving contact realism
through event-based haptic feedback. IEEE
Transactions on Visualization and Computer Graphics,
12 (2), 219–230.
22. Yutaka Kunita, Naoko Ogawa, Atsushi Sakuma,
Masahiko Inami, Taro Maeda, and Susumu Tachi.
2001. Immersive Autostereoscopic Display for Mutual
Telexistence: TWISTER I (Telexistence Wide-angle
Immersive STEReoscope model I). IEEE VR 2001,
31–36.
23. Knut Langhans, Detlef Bahr, Daniel Bezecny, Dennis
Homann, Klaas Oltmann, Krischan Oltmann, Christian
Guill, Elisabeth Rieper, and Goets Ardey. 2002. FELIX
3D display: an interactive tool for volumetric imaging.
Proc. SPIE 4660, Stereoscopic Displays and Virtual
Reality Systems IX, 176.
24. Daniel Leithinger, Sean Follmer, Alex Olwal, and
Hiroshi Ishii. 2014. Physical telepresence: shape
capture and display for embodied, computer-mediated
remote collaboration. UIST '14, 461–470.
25. Benjamin Long, Sue Ann Seah, Tom Carter, and
Sriram Subramanian. 2014. Rendering volumetric
haptic shapes in mid-air using ultrasound. ACM Trans.
Graph. 33, 6, Article 181 (November 2014), 10 pages.
26. Satoshi Maekawa, Kouichi Nitta, and Osamu Matoba.
2006. Transmissive optical imaging device with
micromirror array. Proc. SPIE 6392, 63920E.
27. Yasuaki Monnai, Keisuke Hasegawa, Masahiro
Fujiwara, Kazuma Yoshino, Seki Inoue, and Hiroyuki
Shinoda. 2014. HaptoMime: mid-air haptic interaction
with a floating virtual screen. UIST ’14, 663–667.
28. Koki Nagano, Andrew Jones, Jing Liu, Jay Busch,
Xueming Yu, Mark Bolas, and Paul Debevec. 2013.
An autostereoscopic projector array optimized for 3D
facial display. In ACM SIGGRAPH 2013 Emerging
Technologies.
29. Hideyuki Nakanishi, Kazuaki Tanaka, and Yuya Wada.
2014. Remote handshaking: touch enhances video-
mediated social telepresence. CHI ’14, 2143–2152.
30. Marianna Obrist, Sue Ann Seah, and Sriram
Subramanian. 2013. Talking about tactile experiences.
CHI ’13, 1659–1668.
31. Marianna Obrist, Sriram Subramanian, Elia Gatti,
Benjamin Long, and Thomas Carter. 2015. Emotions
mediated through mid-air haptics. CHI ’15, 2053–
2062.
32. Yoichi Ochiai, Takayuki Hoshi, and Jun Rekimoto.
2014. Pixie dust: graphics generated by levitated and
animated objects in computational acoustic-potential
field. ACM Trans. Graph. 33, 4, Article 85 (July 2014),
13 pages.
33. Yoichi Ochiai, Kota Kumagai, Takayuki Hoshi, Jun
Rekimoto, Satoshi Hasegawa, and Yoshio Hayasaki.
2015. Fairy lights in femtoseconds: aerial and
volumetric graphics rendered by focused femtosecond
laser combined with computational holographic fields.
In ACM SIGGRAPH 2015 Emerging Technologies.
34. Rajinder Sodhi, Ivan Poupyrev, Matthew Glisson, and
Ali Israr. 2013. AIREAL: interactive tactile
experiences in free air. ACM Trans. Graph. 32, 4,
Article 134 (July 2013).
35. Pierre St-Hilaire, Stephen A. Benton, Mark Lucente,
Mary L. Jepsen, Joel Kollin, Hiroshi Yoshikawa, and
John Underkoffler. 1990. Electronic display system for
computational holography. Proc. SPIE 1212, Practical
Holography IV, 174.
36. Yuta Ueda, Karin Iwazaki, Mina Shibasaki, Yusuke
Mizushina, Masahiro Furukawa, Hideaki Nii, Kouta
Minamizawa, and Susumu Tachi. 2014.
HaptoMIRAGE: mid-air autostereoscopic display for
seamless interaction with mixed reality environments.
In ACM SIGGRAPH 2014 Emerging Technologies.
37. Hirotsugu Yamamoto, and Shiro Suyama. 2013. Aerial
Imaging by Retro-Reflection (AIRR). SID Symposium
Digest of Technical Papers. Vol. 44, No. 1, 895–897.