Görmeyi Uzaysal İşitme Duyusu ile İkame Eden Bir Sistemin Değerlendirilmesi
Assessment of a Visual to Spatial-Audio Sensory Substitution System
Ahmad Mhaish, Torkan Gholamalizadeh, Gökhan İnce, Damien Jade Duff
Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
mhaish@itu.edu.tr, djduff@itu.edu.tr
Özetçe—Duyuların ikamesi, görme gibi tek bir kipteki duyusal bilginin işitme gibi bir başka kipteki bilgi ile bir birey tarafından özümsenmesi tekniğidir. Bu bildiri derinlik sensörü kullanılarak uzaklık verisinden oluşan bir diziyi konumsal ses bilgisine gerçek zamanlı olarak çevirerek, bir konumsal-ses duyu ikamesi sistemi yaratmaktadır. Deneylerde, katılımcılar duyu ikamesi sistemine ait sensörü gözlerinin önünde tutarak bir masaya oturmuş pozisyonda önce sistem ile eğitilmişler, daha sonra da gözleri bağlı olarak test edilmişlerdir. Deneylerde katılımcılardan hedef nesnenin masanın neresine konulduğunu iki adımda bulmaları istenmiştir: birinci olarak yönü, ikinci olarak da mesafeyi kestirerek. Katılımcılar önerilen sistemi kullanarak nesnenin yönünü bulmada yüksek bir doğruluk oranı (%90) ve mesafesini bulmada da %56'lık bir başarım göstermiştir.
Anahtar Kelimeler—Yardımcı uygulamalar, binoral işaretler, nokta bulutları, insan bilgisayar etkileşimi, ses sentezi.
Abstract—Sensory substitution is a technique whereby sensory information in one modality, such as vision, can be assimilated by an individual in another modality, such as hearing. This paper makes use of a depth sensor to provide a spatial-auditory sensory substitution system, which converts an array of range data to spatial auditory information in real time. In experiments, participants were trained with the system, then blindfolded and seated behind a table, equipped with the sensory substitution system and holding the sensor in front of their eyes. Participants had to localise a target on the table by reporting its direction and its distance. Results showed that, using the proposed system, participants achieved a high accuracy rate (90%) in detecting the direction of the object, and a performance of 56% in determining the object's distance.
Keywords—Assistive applications, binaural cues, point clouds, human-computer interaction, sound synthesis.
I. INTRODUCTION
For many people, music can evoke a sense of space, of
landscapes or objects. For some synaesthetic people, who
experience one modality (such as sound) as if it were another
(such as vision), this is even more true [1]. “Soundscapes”
are musicians’ attempt to invoke such a sense through engi-
neering of layers of sound. The auditory sense is inherently
spatial, which is why stereo and surround sound are important
components of modern audio engineering. In our project we go
a step further than soundscapes by attempting to use sound to
invoke the real space, landscapes and objects that are captured
by a depth camera attached to a user.
Such systems, which use one sense to present information extracted from another modality, are called "sensory substitution" systems, and have been used to transmit information across many different modalities. In particular,
sensory substitution systems that present visual or spatial
information to users promise to provide perceptual support
for blind people, giving them a sense of their immediate
environment, objects that they might manipulate, or enabling
them to navigate the space between obstacles. To date, this ambition has not been properly realised, so our work can be considered a continuation of the exploration of the space of possible systems, inspired by the following principles, which also constitute the novelty of our proposal:

• Rather than transmitting visual information to the user, spatial information such as surfaces is transmitted. We hypothesise that such information is more easily dealt with by the auditory modality and can be encoded using less sensory bandwidth.

• The human perceptual system is advanced and adaptive. Over-processing the information presented to the user, as is done in some systems that tell the user to head "right" or "straight", is avoided. Instead, we prefer to present the user with information that they can then make use of, trusting the adaptability of the user to different kinds of information. At the same time, the information needs to be presented in an intuitive form (analogous to the existing perceptual experience of users) so that users can quickly and more deeply master the system.

• Information is delivered to the user in real time so that they can explore their physical environment intuitively. We plan in later iterations of our system to give the user transparent control over what information is delivered to them, so that they can overcome the limits of the auditory modality, whose bandwidth is relatively limited.
An additional use of sensory substitution systems is ex-
ploration of human cross-modal perception. When consid-
ering cross-modal perception, such questions arise as: How
independent of modality are human perceptual processes, or
how potentially independent of modality could they be given
plasticity or training? Are there characteristics of different
modalities that can be illuminated by sensory substitution
systems that use them? Could it be that the auditory pathway is more adaptable to spatial information about spatially located surfaces, since such information is more directly analogous to information that the auditory modality processes normally, for example the sound generated by the spatially located excitation of surfaces? As the proposed system is designed to generate sound from surfaces in a way roughly analogous to the physical synthesis of sound, our work is an early step in the exploration of this hypothesis.
In the present paper we restrict ourselves to tabletop scenarios with only one object at a time. This allows us to explore the capabilities of our proof-of-concept system on the task of object localisation.
II. LITERATURE REVIEW
Sensory substitution as a subject of earnest investigation
can be traced back to at least 1957 with the beginning of
work on the use of active touch effectors as a means of
communication, and the late 1960s with attempts to provide
dense visual information to users (including sight-impaired
users) in the form of vibrating tactile arrays [2].
Early results were impressive: many users of touch and audio sensory substitution systems report spatial rather than simply sensory experiences, and are able to somewhat master the spatial world, recognizing objects, locating and tracking objects, detecting obstacles, and even catching falling objects [3,4]. Despite these early advances, such systems have not seen much use. They are usable in everyday life only in specific situations, require the user to put a lot of effort into understanding the object or scene being examined, and are often considered novelties or curiosities. The main reasons for this are twofold:
• There is an acuity trade-off between temporal, spatial and amplitude discrimination, meaning that the user often cannot access the full richness of perceptual information achievable by vision.

• The signal received by the user is made up of a lot of irrelevant visual information, such that, in real-world situations, the signal of interest is crowded out.
Although the first of these drawbacks might be considered a
fundamental limit on the bandwidth of information transfer, we
believe that control by the user over the sensory substitution
system can be used to somewhat mitigate this limit. We aim to make our system as transparent as possible to the user; ultimately we plan to allow users to control which part of a visual scene is processed and how. The second drawback is addressed in our system by focusing on surface rather than visual information, providing spatial information directly to a largely spatial modality, namely audition.
One of the most well-known sensory substitution systems is the vOICe [4], which transmits a full grey-scale image by scanning it from left to right. The amplitudes of pixels high in the image are converted to frequencies in the high part of the audio spectrum, and the amplitudes of pixels low in the image to frequencies in the low part of the spectrum; no 3D spatial audio is used. The system has been shown to enable the recognition and localisation of objects in high-contrast environments; however, the scanning process causes a delay of approximately one second, which makes it easy for users to become disoriented and reduces the sense of spatial embeddedness.
Among tactile sensory substitution systems, Bach-y-Rita's tongue display presents visual information, typically in the form of grey-scale images, as tactile or electrical stimulation of the tongue [3]. Responsive sensory substitution systems of this kind, producing tactile stimulation, allow users to respond to external stimuli in real time, for example catching a ball, but only under appropriately controlled high-contrast conditions.
The previous work most similar to the proposed project is that of Dunai et al. [5], who use stereo image processing to extract the horizontal location and height of a moving object in a scene and then render that location and height as a point in the auditory field (distance from the user being represented as amplitude), with object speed represented by pitch. The aim of the project proposed here is similar, except that it is based on attempting to communicate a dense field of range information to the user in real time, rather than a single object. In its current iteration our system is tested only on single objects, but it is still distinguished by providing some object discrimination capability and finer location information. The use of time-of-flight technology and point cloud processing is a novelty that will also ultimately allow exploration of the characteristics of arbitrary surface sections, such as orientation and curvature rather than full objects, giving the user information more focused on the potential navigation and manipulation of objects.
III. SYSTEM DESCRIPTION
A. Hardware
At present we make use of the following hardware (see Figure 1 for an image of the system during training). We developed and tested the system using the SoftKinetic DepthSense 325 [6], as it has a smaller minimum range than competing off-the-shelf depth sensors. As a portable processing station we used a laptop with a Core i5 processor, 8 GB of RAM and an SSD, running Kubuntu 14.04, the Point Cloud Library [7], the relevant sensor drivers, and the OpenAL library [8] as the base framework for generating sounds. Headphones were used to play the generated sounds to the participants.
B. Software
Figure 2 shows an overview of the proposed software
architecture. Data is acquired in the form of point clouds and
segmented according to surface normal direction (obtaining
contiguous and similar surfaces). Salient surface segments (in
practice only one) are extracted and tracked, and the surface
is presented as an audio signal located spatially at the centroid of the points in the surface. We manage to do this at 30 Hz, as fast as data arrives from the sensor. Each component is discussed in more detail below.
Figure 1. Participant interacting with the system.
Figure 2. Software-level process flow diagram: DepthSense point cloud → preprocessing → segmentation → selection & tracking → feature extraction → sound synthesis & interpolation → sound playing.
1) Preprocessing: The most expensive part of the pro-
cessing pipeline is surface normal extraction, needed for
curvature-based segmentation. For real-time performance, the
2D “organised” property of point clouds from range cameras
is exploited using the integral image algorithm supplied by
Holzer et al. [9]. We used a maximum depth change factor of 0.5 to ensure sensible surfaces from which to calculate normals, and a normal smoothing size of 70 points. For our
tabletop experiments we also throw away horizontally oriented
surface points so that we can ignore the tabletop on which our
target object is supported, improving subsequent segmentation
results.
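As an illustration of this step, the following sketch shows how the stated parameters map onto PCL's integral-image normal estimation; the choice of the COVARIANCE_MATRIX estimation method, the point types and the function name are assumptions, since the paper does not specify its exact code.

#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/features/integral_image_normal.h>

// Sketch of surface normal estimation from an organized point cloud using the
// integral-image method of Holzer et al. [9], with the parameters given above.
// The estimation method (COVARIANCE_MATRIX) and point types are assumptions.
pcl::PointCloud<pcl::Normal>::Ptr
estimateNormals(const pcl::PointCloud<pcl::PointXYZ>::ConstPtr& cloud)
{
    pcl::IntegralImageNormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
    ne.setNormalEstimationMethod(ne.COVARIANCE_MATRIX);
    ne.setMaxDepthChangeFactor(0.5f);   // maximum depth change factor (from the text)
    ne.setNormalSmoothingSize(70.0f);   // normal smoothing size of 70 points (from the text)
    ne.setInputCloud(cloud);

    pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
    ne.compute(*normals);
    return normals;
}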
2) Segmentation: Segmentation proceeds according to the
method of Trevor et al. [10], exploiting the fact that the depth
data is stored in a dense 2D array. Each point’s neighbours are
accessible in constant time. In two passes a set of segments
is found such that neighbouring points within each segment
possess the property that their depth ratio is under some
threshold (0.08 in our case) and the cosine of the angle between
their surface normals is above a separate threshold (0.997).
By adding the latter criterion, surfaces from a tabletop can be
effectively segmented without needing to explicitly calculate
and extract the tabletop. The algorithm is linear in the number of points, which allows our system to work in real time, yet it remains configurable enough for future extension, as long as we can express our requirements for segmentation as pairwise constraints. Horizontally oriented segments are discarded by taking the average y-component of their normals and comparing it with a threshold (0.6 in our case). Also, to accommodate noise, any small segments are removed using a threshold on segment size (50 points in our case). An example segmentation can be seen in Figure 3.
Figure 3. Screenshots from the recorded video (left panel) and the segmentation process (right panel). The segmentation output also shows a selection of surface normals.
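To make the pairwise constraints concrete, the following self-contained sketch re-implements the segmentation rule described above as a flood fill over the organized grid. The paper itself uses the PCL connected-component implementation of Trevor et al. [10]; the exact definition of the depth ratio below is our interpretation of the criterion in the text.

#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <queue>
#include <vector>

// Illustrative pairwise-constraint segmentation over an organized cloud.
std::vector<std::vector<int>> segmentOrganizedCloud(
        const pcl::PointCloud<pcl::PointXYZ>& cloud,
        const pcl::PointCloud<pcl::Normal>& normals,
        float depth_ratio_thresh = 0.08f,     // depth-ratio threshold (from the text)
        float normal_cos_thresh = 0.997f,     // normal-angle cosine threshold (from the text)
        std::size_t min_segment_size = 50)    // noise-rejection size threshold (from the text)
{
    const int w = static_cast<int>(cloud.width);
    const int h = static_cast<int>(cloud.height);
    std::vector<int> visited(static_cast<std::size_t>(w) * h, 0);
    std::vector<std::vector<int>> segments;

    // Pairwise constraint: neighbouring points join the same segment when their
    // depth ratio is small and their normals are nearly parallel.
    auto similar = [&](int a, int b) {
        float za = cloud.points[a].z, zb = cloud.points[b].z;
        if (!std::isfinite(za) || !std::isfinite(zb)) return false;
        float ratio = std::fabs(za - zb) / std::max(za, zb);
        const pcl::Normal& na = normals.points[a];
        const pcl::Normal& nb = normals.points[b];
        float cosang = na.normal_x * nb.normal_x + na.normal_y * nb.normal_y +
                       na.normal_z * nb.normal_z;
        return ratio < depth_ratio_thresh && cosang > normal_cos_thresh;
    };

    for (int start = 0; start < w * h; ++start) {
        if (visited[start] || !std::isfinite(cloud.points[start].z)) continue;
        std::vector<int> seg;
        std::queue<int> q;
        q.push(start);
        visited[start] = 1;
        while (!q.empty()) {                  // flood fill over the 4-neighbourhood
            int p = q.front(); q.pop();
            seg.push_back(p);
            int x = p % w, y = p / w;
            const int nbr[4] = {p - 1, p + 1, p - w, p + w};
            const bool ok[4] = {x > 0, x < w - 1, y > 0, y < h - 1};
            for (int k = 0; k < 4; ++k)
                if (ok[k] && !visited[nbr[k]] && similar(p, nbr[k])) {
                    visited[nbr[k]] = 1;
                    q.push(nbr[k]);
                }
        }
        if (seg.size() >= min_segment_size)   // discard small segments as noise
            segments.push_back(std::move(seg));
    }
    return segments;
}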
3) Selection & Tracking: Next the centroid of each segment
obtained from the preceding step and the distance to it are
calculated, segments are associated over time and segments
chosen to be further processed - at this stage we would like to
give the user more control over the segments to be processed,
enable multiple segments, and use head-tracking capabilities
to add stability; but the existing system uses just the single
nearest large segment.
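A minimal sketch of this nearest-large-segment rule is given below; the Segment structure is a hypothetical stand-in, and the 50-point size threshold is simply reused from the segmentation step for illustration.

#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a tracked segment: its centroid (camera frame,
// metres) and the number of points it contains.
struct Segment {
    std::array<float, 3> centroid;
    std::size_t num_points;
};

// Pick the nearest sufficiently large segment, mirroring the "single nearest
// large segment" rule described above. Returns -1 if no segment qualifies.
int selectNearestLargeSegment(const std::vector<Segment>& segments,
                              std::size_t min_points = 50) {
    int best = -1;
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < segments.size(); ++i) {
        if (segments[i].num_points < min_points) continue;   // reject small/noisy segments
        const std::array<float, 3>& c = segments[i].centroid;
        float dist = std::sqrt(c[0] * c[0] + c[1] * c[1] + c[2] * c[2]);
        if (dist < best_dist) {
            best_dist = dist;
            best = static_cast<int>(i);
        }
    }
    return best;
}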
4) Feature extraction: After selecting the target segment we have its point cloud, from which we could extract many spatial properties, such as size and shape, but at present we make use only of the distance to the segment. This value is smoothed with a low-pass filter for stability.
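For concreteness, a simple one-pole (exponential) low-pass filter of the kind that could be used here is sketched below; the smoothing constant is a hypothetical value, as the paper does not report the one actually used.

// One-pole low-pass (exponential smoothing) filter applied to the distance feature.
class LowPassFilter {
public:
    explicit LowPassFilter(float alpha) : alpha_(alpha) {}

    float update(float x) {
        if (!initialized_) {
            y_ = x;                 // seed the filter with the first measurement
            initialized_ = true;
        } else {
            y_ = alpha_ * x + (1.0f - alpha_) * y_;
        }
        return y_;
    }

private:
    float alpha_;
    float y_ = 0.0f;
    bool initialized_ = false;
};

// Example usage: LowPassFilter distance_filter(0.3f);
//                float smoothed = distance_filter.update(raw_distance);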
5) Sound Synthesis and Interpolation: The distance to the target segment determines the frequency of a synthesized tone, which is placed at the centroid of the segment using spatial audio. The system keeps an updated estimate of the time between frames and constructs a sound on the basis of the currently extracted segment features; the sound is constructed so that it will last until approximately the next frame. A circular buffer of samples is filled, and envelopes are applied so that sounds transition smoothly from frame to frame. The sound synthesis component acts as a producer for the sound-player consumer.
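The following sketch illustrates such a frame-length synthesis step with attack/release ramps to avoid clicks at frame boundaries; the particular distance-to-frequency mapping, envelope shape and sample rate are hypothetical, as the paper does not specify them.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative synthesis of one frame of audio: the tone frequency is derived
// from the (smoothed) distance to the target segment and the buffer length is
// matched to the estimated inter-frame interval.
std::vector<int16_t> synthesizeFrame(float distance_m,
                                     float frame_duration_s,
                                     int sample_rate = 44100) {
    // Example mapping (assumed): nearer objects give a higher pitch, clamped to [200, 2000] Hz.
    float d = std::min(std::max(distance_m, 0.0f), 1.0f);
    float freq = 2000.0f - 1800.0f * d;

    const float pi = 3.14159265f;
    const std::size_t n = static_cast<std::size_t>(frame_duration_s * sample_rate);
    const std::size_t ramp = std::max<std::size_t>(n / 8, 1);   // envelope length at each end

    std::vector<int16_t> samples(n);
    for (std::size_t i = 0; i < n; ++i) {
        float t = static_cast<float>(i) / sample_rate;
        float s = std::sin(2.0f * pi * freq * t);
        float env = 1.0f;                                        // attack/release envelope
        if (i < ramp)
            env = 0.5f * (1.0f - std::cos(pi * i / ramp));
        else if (i + ramp >= n)
            env = 0.5f * (1.0f - std::cos(pi * (n - 1 - i) / ramp));
        samples[i] = static_cast<int16_t>(32767.0f * 0.5f * s * env);
    }
    return samples;
}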
6) Sound player: The sound-playing component acts as a consumer of the information placed in the circular sound buffer, reading as many samples as necessary and using OpenAL to play spatially located sound.
The OpenAL audio system [8] is a playback library capable of taking sample arrays and playing them spatially for the user using either basic binaural cues or Head-Related Transfer Functions (HRTFs). It is this out-of-the-box functionality that attracts us to OpenAL, but its disadvantage is that fine temporal control is poor: we cannot, for example, learn OpenAL's progress in playing a sample at a resolution finer than about 40 Hz, and it does not keep an accurate record, at a finer resolution, of which of the buffers we have provided have or have not been played. Since our system functions at 30 Hz, this is insufficient. Instead, we estimate the inter-frame duration and maintain a small lag in order to prevent clicks when the inter-frame duration is unexpectedly long. We also incorporate whatever progress information OpenAL does provide into the resulting control system.
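A minimal sketch of such spatially located buffer queueing with OpenAL follows; the class structure, mono 16-bit format and sample rate are assumptions for illustration, and error handling and device/context teardown are omitted.

#include <AL/al.h>
#include <AL/alc.h>
#include <cstdint>
#include <vector>

// Sketch of spatially located streaming playback with OpenAL, assuming mono
// 16-bit buffers produced by the synthesis step.
class SpatialPlayer {
public:
    SpatialPlayer() {
        device_ = alcOpenDevice(nullptr);               // default output device
        context_ = alcCreateContext(device_, nullptr);
        alcMakeContextCurrent(context_);
        alGenSources(1, &source_);
    }

    // Queue one frame of samples, positioned at the segment centroid (x, y, z).
    void play(const std::vector<int16_t>& samples, float x, float y, float z,
              int sample_rate = 44100) {
        // Recycle buffers OpenAL reports as finished.
        ALint done = 0;
        alGetSourcei(source_, AL_BUFFERS_PROCESSED, &done);
        while (done-- > 0) {
            ALuint finished = 0;
            alSourceUnqueueBuffers(source_, 1, &finished);
            alDeleteBuffers(1, &finished);
        }

        ALuint buffer = 0;
        alGenBuffers(1, &buffer);
        alBufferData(buffer, AL_FORMAT_MONO16, samples.data(),
                     static_cast<ALsizei>(samples.size() * sizeof(int16_t)),
                     sample_rate);
        alSource3f(source_, AL_POSITION, x, y, z);      // spatial cue for the listener
        alSourceQueueBuffers(source_, 1, &buffer);

        ALint state = 0;
        alGetSourcei(source_, AL_SOURCE_STATE, &state);
        if (state != AL_PLAYING)
            alSourcePlay(source_);
    }

private:
    ALCdevice* device_ = nullptr;
    ALCcontext* context_ = nullptr;
    ALuint source_ = 0;
};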
IV. TRAINING AND EXPERIMENT
The performance of the proposed system was measured with a tabletop localisation experiment. Before conducting the experiments, a training session took place. During the
training session participants were seated behind a labelled table
(see Figure 4 for the layout) and were given a short verbal
explanation about how the system works and the meaning of
the table labelling. The experimenter explained the relation
between the location of the object and the frequency and
apparent spatial location of sounds produced by the system.
During this session participants were told about the rules of the
experiment and they were given the opportunity to familiarise
themselves with the system for 5 minutes while their eyes were
open. Both in training and trial sessions participants held the
camera in front of their eyes so that the camera touched their
forehead; they were not allowed to move the camera freely, but
they could observe the object by moving their head in different
directions.
Figure 4. The labelling of the target sections: four angular sections (right, front right, front left, left, separated at 45°, 90° and 135°) and three ranges (1 nearest, 3 farthest).
In the trials, a box (15×10×7 cm) was put on the table at three different distances from the user: 80 cm (range 3), 60 cm (range 2) and 40 cm (range 1), and in four different angular sections: 0°–45° (right), 45°–90° (front right), 90°–135° (front left) and 135°–180° (left), so that we could measure both range and angular error. We put the object in all twelve possible regions in a random order; participants were allowed to interact with the object using our sensory substitution system for a maximum of 20 seconds and were then asked to report the object's direction and distance. Participants' answers were recorded in confusion matrices.
V. RESULTS
The localisation experiment was done on 9 non-disabled
participants with an average age of 25 years. The confusion
matrices for angle and range can be found in Table I and Table
II respectively.
Participants were able to choose the correct direction
(angle) in 89.8% of runs as shown in Table I. The factor
causing most of the mistakes (seeing the object at left front
or right front when it was ahead) occurred because the system
would sometimes detect the edge of the table as an object.
Range errors (Table II) were much more common. Partici-
pants only chose the right distance 56.5% of the time. The most
confusion between related distances occurred as the difference
in frequency was hard to detect, and because users had trouble
Table I. DIRECTION CONFUSION MATRIX
Real vs. Estimated
Direction Right Front Right Front Left Left
Right 27 0 0 0
Front Right 125 1 0
Front Left 2 0 25 0
Left 2 2 3 20
Table II. RANGE CONFUSION MATRIX.
Real vs. Estimated
Range Range 1 Range 2 Range 3
Range 1 20 14 12
Range 2 12 17 7
Range 3 4824
orienting themselves with respect to the table, particularly
when the object was at the front of the table since then the
object often became closer than the minimum effective range
of the sensor (15cm).
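As a consistency check (assuming each of the 9 participants completed all twelve placements, i.e. 108 trials in total), the reported accuracies correspond to the diagonals of the confusion matrices:

\[
\frac{27+25+25+20}{108} = \frac{97}{108} \approx 89.8\%, \qquad \frac{20+17+24}{108} = \frac{61}{108} \approx 56.5\%.
\]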
VI. CONCLUSION
We presented a spatial-audio sensory substitution system, which is unique in that it segments objects or surfaces from point clouds and presents them in real time using spatial audio. The architecture of the system allows for real-time performance and for the tracking of perceptual objects by users. The proposed system showed sufficient performance for locating an object, but has so far been tested only in a tabletop scenario.
We plan to extend the system to other scenarios, deal with
multiple surface primitives, encode more information about
surfaces in a scene, and increase stability and user-system
interaction by giving the user more control over selection,
including the ability to focus on specific parts of a scene.
REFERENCE LIST
[1] S. Day, “Synaesthesia and Synaesthetic Metaphors,” Psyche, vol. 2,
no. 32, 1996.
[2] B. W. White, F. A. Saunders, L. Scadden, P. Bach-Y-Rita, and C. C.
Collins, “Seeing with the skin,” Perception & Psychophysics, vol. 7,
no. 1, pp. 23–27, Jan. 1970.
[3] P. Bach-y-Rita and S. W. Kercel, "Sensory substitution and the human–machine interface," Trends in Cognitive Sciences, vol. 7, no. 12, pp. 541–546, 2003.
[4] M. Auvray, S. Hanneton, and J. K. O’Regan, “Learning to perceive
with a visuo-auditory substitution system: Localisation and object
recognition with ’The vOICe’,” Perception, vol. 36, no. 3, pp. 416–
430, 2007.
[5] L. Dunai, G. Fajarnes, V. Praderas, B. Garcia, and I. Lengua, “Real-
time assistance prototype - A new navigation aid for blind people,” in
IECON, Glendale, Arizona, USA, Nov. 2010, pp. 1173–1178.
[6] “DepthSense Cameras.” [Online]. Available: http://www.softkinetic.
com/en-us/products/depthsensecameras.aspx
[7] R. B. Rusu and S. Cousins, “3D is here: Point cloud library (PCL),” in
ICRA, Shanghai, China, 2011, pp. 1–4.
[8] “OpenAL library.” [Online]. Available: https://www.openal.org/
[9] S. Holzer, R. B. Rusu, M. Dixon, S. Gedikli, and N. Navab, “Adaptive
neighborhood selection for real-time surface normal estimation from
organized point cloud data using integral images,” in IROS, Vilamoura,
Algarve, 2012.
[10] A. J. Trevor, S. Gedikli, R. B. Rusu, and H. I. Christensen, “Efficient
organized point cloud segmentation with connected components,” in
ICRA Workshop on Semantic Perception (SPME), Karlsruhe, Germany,
2013, pp. 1363–1370.
This work was supported by TÜBİTAK (The Scientific Research Council of Turkey) Project No. 114E443.
Article
Recent advances in the instrumentation technology of sensory substitution have presented new opportunities to develop systems for compensation of sensory loss. In sensory substitution (e.g. of sight or vestibular function), information from an artificial receptor is coupled to the brain via a human-machine interface. The brain is able to use this information in place of that usually transmitted from an intact sense organ. Both auditory and tactile systems show promise for practical sensory substitution interface sites. This research provides experimental tools for examining brain plasticity and has implications for perceptual and cognition studies more generally.