EdgeSonic:
Image Feature Sonification for the Visually Impaired
Tsubasa Yoshida
UEC Tokyo
Tokyo, Japan
tsubasa@vogue.is.uec.ac.jp
Kris M. Kitani
UEC Tokyo
Tokyo, Japan
kitani@is.uec.ac.jp
Hideki Koike
UEC Tokyo
Tokyo, Japan
koike@is.uec.ac.jp
Serge Belongie
UCSD
San Diego, CA, USA
sjb@cs.ucsd.edu
Kevin Schlei
UW-Milwaukee
Milwaukee, WI, USA
kevinschlei@gmail.com
ABSTRACT
We propose a framework to aid a visually impaired user to
recognize objects in an image by sonifying image edge fea-
tures and distance-to-edge maps. Visually impaired people
usually touch objects to recognize their shape. However, it
is difficult to recognize objects printed on flat surfaces or ob-
jects that can only be viewed from a distance, solely with our
haptic senses. Our ultimate goal is to aid a visually impaired
user to recognize basic object shapes, by transposing them
to aural information. Our proposed method provides two
types of image sonification: (1) local edge gradient sonifica-
tion and (2) sonification of the distance to the closest image
edge. Our method was implemented on a touch-panel mo-
bile device, which allows the user to aurally explore image
context by sliding his finger across the image on the touch
screen. Preliminary experiments show that the combination
of local edge gradient sonification and distance-to-edge soni-
fication are effective for understanding basic line drawings.
Furthermore, our tests show a significant improvement in
image understanding with the introduction of proper user
training.
Categories and Subject Descriptors
H.5.2 [Information interfaces and presentation]: User
Interfaces-Auditory (non-speech) feedback
General Terms
Human Factors, Design, Experimentation
Keywords
Image sonification, sensory substitution, visually impaired,
edge detection
Figure 1: Mapping from image features to sound
1. INTRODUCTION
The visually impaired leverage auditory and haptic senses
(among other senses) to recognize the world around them.
However, objects displayed on flat surfaces (e.g. posters, dig-
ital displays, labels) and objects at a distance (e.g. buildings,
landscape, billboards) are harder to perceive. How, then, can we use technology to translate these types of flat and distant visual information so that the visually impaired can perceive them?
Devices such as the Optacon scanner, which displays printed documents on a haptic display, 2D pin arrays, and device-specific sonification (e.g., a digital temperature reader for ovens) have been developed for the visually impaired, but they can be very costly (e.g., several thousand dollars). Therefore, we aim to develop a framework that is more accessible and affordable to more people.
With the evolution and widespread use of mobile devices, it is fair to say that many people now have access to
a lightweight camera, sufficient computing power and audio
playback. In this work, we explore the use of image capture,
basic computer vision algorithms and audio feedback sup-
ported by existing mobile platforms to aid visually impaired
users in accessing visual information.
In our proposed framework we leverage basic image pro-
cessing techniques to extract salient features from an image
and then transpose them into sound. In particular, we ex-
tract image edges (regions of high visual contrast) and map
them to a combination of timed frequency oscillators (Figure
1). Our prototype system allows a user to take a picture of
the visual world and explore the static image aurally via a
mobile touch-screen device. Our preliminary tests show that
with proper training, our system can be used to understand
basic shapes and patterns in under 90 seconds.
2. RELATED WORK
Previous work on image sonification can be roughly divided into two categories. In high-level (sym-
bolic) sonification, visual information is translated into nat-
ural language. In contrast, low-level sonification transposes
visual information into an abstract audio signal. Our pro-
posed approach falls into the latter category of low-level im-
age sonification.
2.1 High-level sonification
The majority of work on sonification for the blind has fo-
cused on high-level (symbolic) sonification. Text-to-speech
(TTS) is the most well known sonification system, where
such software as the VoiceOver function on Apple products
and JAWS (Freedom Scientific, Inc.) can sonify text char-
acters and objects displayed by a computer. The advantage
of such systems is that they map visual information to the
information-rich realm of natural language. The obvious limitation of high-level sonification is that it only applies to objects that have clear semantic representations. For
example, it is not clear how to sonify complex shapes, color
variations and detailed textures.
LookTel [1] goes beyond TTS and uses computer vision algorithms to automatically recognize object categories and aurally return the name of an object. Although the classifiers require a potentially extensive training process, virtually no effort is required of the users to utilize the system.
The VizWiz [3] mobile phone application allows a visually
impaired user to tap into the power of crowdsourcing to
obtain answers to visual queries. The system combines the
power of TTS and a pool of remote sighted guides to aid the
visually impaired user. The main advantage of the system
is that it leverages the brain power of humans and therefore
can deal with a large range of complex queries. The lag time between queries and answers, the running cost of queries, and the availability of remote sighted users are still open issues.
2.2 Low-level sonification
While high-level sonification eases the burden of recog-
nition, it is also limited by the lexicon of the system. In
contrast, a mapping to an abstract audio space (low-level
sonification) has the advantage of dealing with a wider range
of objects without being constrained by a lexicon. Low-level
sonification can still work with hard-to-label (untrained) objects and can operate in real time without relying on remote guides.
The vOICe system [4] sonifies the global luminance of an
image and maps luminance values to a mixture of frequency
oscillators. Specifically, the image brightness is mapped
to amplitude and location is mapped to a frequency. The
vOICe system scans the entire field of view of a head mounted
camera with a vertical bar from left to right and transposes
the luminance over the vertical bar to sound. One of the
advantages of the vOICe system is that it sonifies an entire
image to convey the global content of any type of scene, and the system does not require any type of prior training or lexicon.

Figure 2: (a) source image, (b) edge image, (c) distance image
The Timbremap [5] system sonifies local visual information based on the location of a user's finger on a map. Timbremap helps the user navigate a map by sonifying distances to lines (streets) on the map. When the user places a finger on the map, the position of the finger with respect to the nearest line is transposed into an audio signal. The system uses stereo panning to convey whether the finger is to the right or left of the line. Although Timbremap was designed for maps, the concept of binaural feedback can be applied to any type of line drawing.
Kopeček and Ošlejšek [2] presented a sonification method for mapping color information to a frequency oscillator, where color was mapped to the wave envelope, waveform, and frequency.
A common attribute of low-level image sonification is that
it requires the user to learn the mapping between visual fea-
tures and audio feedback. While it does place a greater
burden on the user, it also taps into the human potential for
sensory substitution and uses a diverse set of cognitive skills.
We hypothesize that with the proper training, low-level soni-
fication techniques offer users more depth and breadth in
analyzing audio-visual information.
3. OUR APPROACH
The use of our fingers is an intuitive way to explore local
texture and shape. For example, we can feel the grains on
a plank of wood or follow a crease in a piece of paper. For
a visually impaired person trained to read Braille, the fin-
gertips are used as a type of local area sensor to understand
Braille dot patterns or raised figures. In a similar manner,
we aim to extend the analogy of the finger as a local area
sensor to provide an intuitive mode of obtaining local spa-
tial information. When a finger touches an edge, local area
sonification occurs. From initial experiments, we found that
local area sonification was only effective when a finger was
located near an edge in the image. When no edge existed in the vicinity of the finger, the user would wander across the image and lose his sense of position relative to an edge. To aid
the user in areas with no edge features, we incorporated a
secondary sonification mode, distance-to-edge sonification,
that conveys the distance to the nearest edge. We give a
detailed explanation of these two modes of sonification in
the following sections.
3.1 Sonifying the Edge Image
As mentioned above, we begin our exploration of image
sonification with the use of image edge features. In visual perception, it has been shown that people often use image edges (object outlines) to recognize objects [6, 7]. Therefore, in this work, we explore the use of image edge sonification. To extract edge features from the image, we use the Canny edge detector [8] to obtain contours. The
Canny edge detector is relatively robust to noise and two
thresholding parameters can be adjusted to extract only the
dominant edges in an image (see Figure 2(b)). Although we
have utilized the Canny detector for simplicity, we believe
that more sophisticated contour detectors such as gPb [10]
will improve the quality of the extracted edge map. This
edge image is used to generate the audio feedback for local
area sonification.
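As a concrete illustration, this step can be sketched in a few lines of Python with OpenCV; the threshold values below are illustrative assumptions rather than the exact parameters of our implementation.

import cv2

# Minimal sketch of the edge-extraction step; gray_image is an 8-bit grayscale image.
# The two thresholds control which gradients are kept as dominant edges.
def edge_map(gray_image, low_thresh=100, high_thresh=200):
    """Return a binary Canny edge image: 255 on edge pixels, 0 elsewhere."""
    return cv2.Canny(gray_image, low_thresh, high_thresh)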
When a finger is placed at a pixel location i such that it is part of an edge (i ∈ E), a small vertical bar scans the
image directly under the finger (see Figure 1). Each element
of the vertical bar is associated to a frequency oscillator (a
simple sine wave for our experiments), where each element
is mapped on an exponential scale over a range between
f_max = 2527 Hz (top element) and f_min = 440 Hz (bottom
element). As the vertical bar scans the local area in the
binary edge image, a certain frequency oscillator is turned on
if the pixel that it scans is an edge, otherwise the oscillator
is turned off. A horizontal line in the edge image yields a long-lasting, single-frequency sine wave. A vertical line yields a
single bleep sound, where all frequency oscillators are turned
on simultaneously for a short duration.
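A minimal sketch of this scanning scheme is given below, assuming a binary edge patch under the finger and simple additive sine synthesis; the sample rate and per-column duration are illustrative assumptions, while the 440 Hz and 2527 Hz bounds are the values stated above.

import numpy as np

F_MIN, F_MAX = 440.0, 2527.0   # bottom and top oscillator frequencies (as above)
SAMPLE_RATE = 44100            # assumed audio sample rate

def row_frequencies(n_rows):
    """Exponentially spaced oscillator frequencies; the top row is the highest."""
    return np.geomspace(F_MAX, F_MIN, n_rows)

def sonify_patch(edge_patch, col_duration=0.03):
    """Scan a binary patch left to right; each edge pixel gates its row's oscillator."""
    n_rows, n_cols = edge_patch.shape
    freqs = row_frequencies(n_rows)
    t = np.arange(int(col_duration * SAMPLE_RATE)) / SAMPLE_RATE
    columns = []
    for c in range(n_cols):
        active = freqs[edge_patch[:, c] > 0]          # oscillators turned on
        if len(active) == 0:
            columns.append(np.zeros_like(t))          # silence when no edge pixels
        else:
            columns.append(sum(np.sin(2 * np.pi * f * t) for f in active))
    return np.concatenate(columns)

Under this mapping, a horizontal edge keeps one oscillator on across columns (a sustained tone), while a vertical edge turns all oscillators on for a single column (a short bleep), matching the behavior described above.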
We implemented our image sonification system on a mo-
bile touch-screen display device, the Apple iPhone 3G. The
size of the square area scanned by the local area sonification
mode is 30 × 30 pixels, roughly the size of a fingertip.
Since edges are usually very thin and very hard to localize
with the fingertip, the edge image was dilated and smoothed
to increase the width of the edges.
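The preprocessing and patch lookup can be sketched as follows; the dilation kernel size and blur settings are assumptions, while the 30 × 30 patch size is the value used in our implementation.

import cv2
import numpy as np

def thicken_edges(edge_img, ksize=5):
    """Dilate and smooth the binary edge image so edges are wide enough to touch."""
    kernel = np.ones((ksize, ksize), np.uint8)
    dilated = cv2.dilate(edge_img, kernel)
    return cv2.GaussianBlur(dilated, (ksize, ksize), 0)   # ksize must be odd

def patch_under_finger(edge_img, x, y, size=30):
    """Crop the size x size region centered on the finger position (x, y)."""
    h, w = edge_img.shape
    x0 = min(max(0, x - size // 2), w - size)
    y0 = min(max(0, y - size // 2), h - size)
    return edge_img[y0:y0 + size, x0:x0 + size]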
3.2 Sonifying the Distance-to-Edge Map
In addition to the edge image, we calculated the shortest
distance to the nearest edge for use with distance-to-edge
sonification. Each element j in the distance-to-edge map contains the Euclidean distance to the nearest edge pixel in the image. The resulting distance-to-edge
map generated using the Felzenszwalb algorithm [9] is shown
in Figure 2(c). The distance map is used to generate a pulse
train to convey to the user the distance d(j) to the nearest
edge, where the mapping from distance to frequency is given
as below.
f(j) = (f_H - f_L) * ((255 - d(j)) / 255)^2 + f_L    (1)
where f_H is the highest frequency of the pulse train and f_L is
the lowest output frequency of the pulse train. We normalize
by 255 because the maximum value of the distance image has
been scaled to 255.
As the user slides his finger closer to an edge, the fre-
quency of the pulse train increases and reaches a maximum
when the position is 10 pixels away from an edge. The pulse
train ceases to play once the user’s finger is within 10 pixels
of an edge.
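A sketch of this mode is given below. SciPy's Euclidean distance transform stands in for the Felzenszwalb algorithm [9], and the pulse-rate bounds f_H and f_L are illustrative assumptions (their exact values are not specified above); the scaling to 255 and the 10-pixel cutoff follow the description above.

from scipy.ndimage import distance_transform_edt

F_H, F_L = 30.0, 2.0      # highest / lowest pulse-train frequency (assumed values)
EDGE_CUTOFF = 10          # pixels: the pulse train stops this close to an edge

def build_distance_maps(edge_img):
    """Return (raw, scaled) distance-to-edge maps; the scaled map has maximum 255."""
    raw = distance_transform_edt(edge_img == 0)           # distance to nearest edge pixel
    return raw, 255.0 * raw / raw.max()

def pulse_frequency(raw, scaled, x, y):
    """Pulse-train frequency at finger position (x, y) per Eq. (1), or None near an edge."""
    if raw[y, x] < EDGE_CUTOFF:
        return None                                       # within 10 pixels of an edge: silent
    return (F_H - F_L) * ((255.0 - scaled[y, x]) / 255.0) ** 2 + F_L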
4. LOCAL AREA SONIFICATION
To understand the effect of local area sonification on image
understanding, we performed an experiment where a user is
asked to reproduce three line drawings shown in Figure 3. In
the first part of the experiment, the local area sonification
is turned off and the user relies solely on the distance-to-edge sonification to reproduce the line drawing. Then, in the second part of the experiment, the user reproduces the line drawings using both the local area sonification and distance-to-edge sonification. The users were only given a simple verbal explanation of the sonification, and no training was administered to the participants. Each participant was given 60 seconds to reproduce the line drawing.

Figure 3: Ground truth for localized patterns: (a) straight line, (b) saw wave shape, (c) sine wave shape

Figure 4: User results for Participants 1-6: local area sonification OFF (left) and local area sonification ON (right)
In Figure 4 we observe that four out of the six participants
were able to identify the locally periodic patterns generated
by the sine wave. Notice that the gradients of the line drawings of all participants changed after local area sonification was included.
Figure 5: User results for Participants 1-4, shown against the ground truth: before training (left) and after training (right)
5. USER TRAINING
In this second experiment, we tested the effect of train-
ing. The line drawings produced by the participants before and after training are given in Figure 5. Participants were
only given 90 seconds to reproduce the line drawings for
both tests. Before training, none of the participants were
able to reproduce any of the line drawings. For the training
stage, each user was given roughly 20 minutes to explore
various basic shapes (with prior knowledge of the line draw-
ing). The moderator also asked participants to localize slopes, corners, and T-junctions. After training, we observe
a significant increase in performance. Two of the four par-
ticipants were able to correctly reproduce all line drawings.
All participants were able to reproduce the triangle. We
also note that the degree to which the reproductions differed from the ground truth line drawings was significantly reduced after training. Although some of the reproductions
are incomplete, none of the participants generated lines with
gradients that contradicted the ground truth line drawings.
6. DISCUSSION AND CONCLUSION
Even with the use of distance-to-edge sonification, many
participants commented that it was difficult to track ab-
solute position in the current framework. We would expect
that a hybrid sonification scheme using both local and global
sonification may help to alleviate this problem.
Figure 6: Test participant using the EdgeSonic system

Although all user tests were performed with blindfolded sighted participants, we are planning to evaluate our system
with people with congenital blindness and those who have
lost their sight later in life. We noticed during our experi-
ments that participants were highly influenced by their prior
knowledge of the visual world and rudimentary shapes (e.g.
perceiving a triangle as a circle). It will be interesting to ob-
serve how this phenomenon comes into play for the visually
impaired.
In this paper, we presented a sonification methodology
based on edge gradients and distance-to-edge maps. Prelim-
inary experiments with blindfolded sighted persons showed
that local area sonification enabled participants to be more
sensitive to changes in the local gradients in images. In
addition, experiments showed significant improvements in
the participant’s ability to reproduce the line gradients in
line drawings after a period of training. Future work will
focus on better training techniques and hybrid sonification
schemes to increase recognition speed.
7. REFERENCES
[1] J. Sudol, O. Dialameh, C. Blanchard and T. Dorcey. LookTel: A
Comprehensive Platform for Computer-Aided Visual Assistance.
In Proceedings of the Workshop on Computer Vision
Applications for the Visually Impaired, 2008.
[2] I. Kopeček and R. Ošlejšek. Hybrid Approach to Sonification of Color
Images. In Proceedings of the International Conference on
Convergence and Hybrid Information Technology, 2008.
[3] J.P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R.C. Miller,
A. Tatarowicz, B. White, S. White and T. Yeh. VizWiz: Nearly
Real-time Answers to Visual Questions. In Proceedings of the
Symposium on User Interface Software and Technology, 2010.
[4] P.B.L. Meijer. An Experimental System for Auditory Image
Representations. In IEEE Transactions on Biomedical
Engineering, 1993.
[5] J. Su, A. Rosenzweig, A. Goel, E. de Lara and K.N. Truong.
Enabling the Visually-Impaired to Use Maps on Touch-Enabled
Devices. In Proceedings of MobileHCI, 2010.
[6] R. L. Gregory. Cognitive Contours. In Nature, vol 238,
pp.51-52, 1972.
[7] I. Rock and R. Anson. Illusory Contours as the Solution to a
Problem. In Perception, vol.8, pp.665-681, 1979.
[8] J. Canny. A Computational Approach to Edge Detection. In
IEEE Transactions on Pattern Analysis and Machine
Intelligence., 1986.
[9] P.F. Felzenszwalb and D.P. Huttenlocher. Distance Transforms
of Sampled Functions. In Cornell Computing and Information
Science Technical Report, 2004.
[10] M. Maire, P. Arbeláez, C. Fowlkes and J. Malik. Using
Contours to Detect and Localize Junctions in Natural Images.
In Proceedings of the Conference on Computer Vision and
Pattern Recognition, 2008.