WebSound: a generic Web sonification tool, and its application to
an auditory Web browser for blind and visually impaired users
Lori Stefano Petrucci1, Eric Harth1, Patrick Roth1, André Assimacopoulos2, Thierry Pun1
(1) Computer Science Dept., CUI, University of Geneva, CH-1211 Geneva 4, Switzerland, +41 22 705 76 32, Lori.Petrucci@cui.unige.ch
(2) UCBA, Swiss Central Union of and for the Blind, Schützengasse 4, CH-9000 St. Gallen, Switzerland
ABSTRACT
The inherent visual nature of Internet browsers makes the Web inaccessible to the visually impaired. Although several non-visual browsers have been developed, they usually transform the visual content of HTML documents into textual information only, which can then be rendered by a text-to-speech converter or a Braille device. The loss of spatial layout and of textual attributes should be avoided, since these often carry important visual information. Moreover, typical non-visual Internet browsers do not allow visually impaired and sighted individuals to easily work together in the same environment. This paper presents WebSound, a new generic Web sonification tool, and its application to a 3D audio augmented Internet browser (Internet Explorer 5.0) developed by the Computer Vision Group at the University of Geneva.
Keywords
Audio-haptic interface, non-speech sound, 3D virtual audio environment, earcons, blindness and visual impairment.
INTRODUCTION
In the past, computer interfaces mostly consisted of simple raw-text displays of information. Providing access solutions to blind and visually impaired computer users was then a straightforward task, since a text-to-speech output device was sufficient to render the displayed information.
However, the generalization of Graphical User Interfaces (GUIs), and of the World Wide Web in particular, which heavily rely on the visual presentation of information, has led to new accessibility problems for users with visual deficiencies. More and more digital information becomes available on the net every day and can be accessed by everyone, everywhere. Hypertext documents – HTML in particular, and probably XML in the future – have become the most widely used medium for presenting such information. The generalization of visual documents has, however, changed the way information should be conveyed to blind users. In addition to textual content, it is also important to take into account textual attributes such as boldface, italic, underline, color, and size, as well as the spatial layout that relates the elements appearing on the screen. These new problems call for alternative non-visual display techniques that allow blind users to take advantage of new technologies.
A preliminary study done at the University of Geneva has shown that the use of non-speech sound in graphical interfaces, particularly in an Internet browser, can considerably increase the “bandwidth” of computer output [1,2]. This is also shown in the work of Stephen A. Brewster [3,4], who provides guidelines for incorporating earcons into the design of human-computer interfaces, as well as in the studies of Frankie James [5]. In a first project, named AB-Web, we developed a prototype of an augmented audio Web browser which transforms the two-dimensional visual representation of an HTML document into a 3D immersive virtual audio navigable environment, or soundscape [6].
The need for a generic Web sonification tool that makes it easy to create, test, and validate new sonification models has become evident. For this reason, we have developed such a tool, WebSound, which combines the haptic sense with audio output. Our approach aims to validate the hypothesis that a 3D immersive virtual sound navigable environment, combined with haptic manipulation of the audio environment, can enable blind users to construct a mental representation of the document's spatial layout.
USING AN AUDIO-HAPTIC INTERFACE TO CONVEY VISUAL AND SPATIAL INFORMATION
The visual channel has a tremendous capacity for information transfer. As a result, it is heavily relied upon in typical human-computer interfaces, and Internet browsers (for example Internet Explorer or Netscape Navigator) are no exception. Most of the information they present, however, is not visual in nature and could easily be presented in an alternate verbal form. This consideration is certainly the basis for non-visual Internet browsers (for example IBM Home Page Reader, Lynx, pwWebSpeak, etc.) that typically transform the two-dimensional visual representation of HTML documents into a one-dimensional representation, that is, simple raw text which can be rendered by a text-to-speech converter.
The need expressed by visually impaired people to use standard visual Web browsers such as Internet Explorer has led us to develop new alternative display techniques to convey the spatially and visually related information present in an HTML document. Since the challenge is to present the spatial relationships between HTML elements (tags) and their visual attributes, it immediately suggests the use of techniques and senses commonly used by blind and visually impaired people to find locations and to perceive their environment. These include both auditory and tactile approaches. A possible technique is to associate a different auditory attribute (an earcon or auditory icon) with each different tag. Moreover, the use of a 3D immersive audio environment, in which a sound can be made to appear from a given position, may give blind users a sense of object location.
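To make this concrete, the following minimal sketch maps a few HTML tags to earcons and derives an azimuth from an element's horizontal screen position. The tag-to-sound table, the screen-to-azimuth formula, and the play_earcon_3d stub are illustrative assumptions on our part, not the actual WebSound sound set or audio engine.

```python
# Hypothetical tag-to-earcon table; these file names are illustrative only,
# not the sound set actually used by WebSound.
EARCONS = {
    "h1": "header_major.wav",
    "a": "link.wav",
    "img": "image.wav",
    "p": "paragraph.wav",
}

SCREEN_WIDTH = 1024.0  # assumed virtual screen width in pixels


def azimuth_for_x(x: float) -> float:
    """Map a horizontal screen position to an azimuth in degrees,
    from -60 (far left) to +60 (far right)."""
    return (x / SCREEN_WIDTH - 0.5) * 120.0


def play_earcon_3d(sound_file: str, azimuth_deg: float) -> None:
    """Stub standing in for a 3D audio engine; a real system would hand
    the sound and its position to a spatialization API."""
    print(f"playing {sound_file} at azimuth {azimuth_deg:+.1f} deg")


def sonify_element(tag: str, x: float) -> None:
    """Play the earcon associated with a tag at the element's position."""
    sound = EARCONS.get(tag)
    if sound is not None:
        play_earcon_3d(sound, azimuth_for_x(x))


sonify_element("a", 900.0)  # a link near the right edge of the page
```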
To enhance the mental representation of the virtual sonic environment, we propose to complement hearing with the haptic modality, that is, the sense of where one's hand or arm is in space. Using a haptic tablet or a touch-sensitive screen allows an individual to move his/her finger about while keeping track of his/her position on the device. The system then responds with auditory feedback while the user moves his/her finger (the device pointer) around the virtual screen. This approach, as shown by M. Lumbreras et al. [7] and in our own work [2], supports the hypothesis that a 3D immersive virtual sound environment combined with haptic manipulation of the audio environment can enable blind users to construct a mental representation of the spatial environment (in our case, the spatial layout of HTML documents).
WEBSOUND: ARCHITECTURE, USER INTERFACE, AND IMPLEMENTATION
One of the goals of WebSound is to provide HCI researchers working on non-visual Web interfaces with a tool that lets them easily add new accessibility functionalities to a standard visual Web browser. In this way, blind and sighted computer users can browse the WWW in the same environment. To reach this goal, the design and implementation of the generic Web sonification tool (WebSound) has been based on two fundamental ideas:
- use of the same augmented visual browser, based on the standard Internet Explorer 5.0, by “Internet surfers” (visually impaired and sighted users) as well as by programmers of new behavioral models;
- creation of an add-on tool, the Workspace, allowing new access modalities to be created dynamically and visually.
Architecture
The WebSound application has been developed for two categories of users: programmers of behavioral models and “Internet surfers”. This consideration has led us to develop not only a new augmented visual browser for blind and sighted users, but also a generic Web sonification tool usable by HCI researchers.
Figure 1. WebSound global architecture.
The WebSound components shown in figure 1 have been developed to furnish a set of functionalities that permits programmers to easily add new behavioral models to the application; a minimal sketch of how these components fit together follows the list. This set of functionalities primarily consists of:
- the Internet Explorer ActiveX component, responsible for Internet browsing;
- a component responsible for representing the internal hierarchical structure of the HTML document;
- a component, called the Workspace, allowing the definition and creation of new behavioral models;
- a component responsible for rendering the audio output in a 3D audio environment;
- a component responsible for the textual output using a text-to-speech converter.
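The sketch below shows one way these five components could be wired together: pointer events flow from the browser through the document model to the Workspace, which drives the audio and speech outputs. The class and method names are our own assumptions, not the actual WebSound API.

```python
# Minimal sketch of the component wiring described above (names assumed).

class AudioOutput:
    """Stands in for the 3D audio rendering component."""
    def play(self, sound, position):
        print(f"3D audio: {sound} at {position}")

class SpeechOutput:
    """Stands in for the text-to-speech component."""
    def speak(self, text):
        print(f"TTS: {text}")

class Workspace:
    """Connects document messages to output components via behavioral models."""
    def __init__(self, audio, speech):
        self.audio, self.speech = audio, speech
    def handle_message(self, kind, tag, position):
        # A behavioral model might both sonify and announce the element.
        self.audio.play(f"{tag}.wav", position)
        self.speech.speak(f"{tag} element")

class DocumentModel:
    """Mirrors the hierarchical structure of the HTML document."""
    def __init__(self, workspace):
        self.workspace = workspace
    def dispatch_pointer(self, x, y):
        # A real implementation would hit-test the HtmlNode hierarchy here;
        # we hard-code an image element for illustration.
        self.workspace.handle_message("MouseMove", tag="img", position=(x, y))

class Browser:
    """Stands in for the Internet Explorer ActiveX component."""
    def __init__(self, document_model):
        self.document_model = document_model
    def pointer_moved(self, x, y):
        # Forward raw pointer events to the document model.
        self.document_model.dispatch_pointer(x, y)

workspace = Workspace(AudioOutput(), SpeechOutput())
browser = Browser(DocumentModel(workspace))
browser.pointer_moved(320, 240)
```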
User interface
The user interface (Fig. 2a, 2b) has been divided into six sub-windows, each displaying a different type of information.
Figure 2a. WebSound interface displaying Internet Explorer.
Figure 2b. WebSound interface displaying the Workspace.
Views (1,2,3) are common to the two categories of users mentioned earlier, since they support Internet browsing. If the user is an Internet surfer, windows (4,5,6) are hidden; in this case, the WebSound application looks like the standard Internet Explorer, except that it offers new access modalities. The views (4,5,6) are, however, very useful for developers of new behavioral modalities, since they display technical information such as the hierarchical structure of the HTML document (Fig. 2a, view 4) or the internal content of each HTML element (Fig. 2a, view 5).
Implementation
The implementation of a generic Web sonification tool compelled us to address some difficult issues. The first problem was to find a way to obtain all the internal events that occur in the Internet Explorer ActiveX component, for example in order to determine the device pointer's position while it moves over the browser. The second problem was to determine which HTML tag the device pointer points at. Finally, we needed to create a component, the Workspace (Fig. 2b), responsible for managing the creation of new behavioral models.
Figure 3. An HTML document and its hierarchical representation.
The first problem was solved using mechanisms that Microsoft provides for being notified of the internal events that occur in Windows-based software or in an ActiveX application such as Internet Explorer; the technical details [8] of this solution are omitted here. To solve the second problem, we need access to the hierarchical structure of the displayed HTML document (Fig. 3; Fig. 2a, view 3). This can be obtained through the Document Object Model (DOM) of the HTML document. Finally, an off-screen structure that can receive and send messages, called HtmlNode, is created for each HTML element.
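As an illustration, the following sketch builds such an off-screen HtmlNode mirror hierarchy. It uses Python's standard html.parser in place of Internet Explorer's DOM traversal, and the attribute names are our assumptions rather than the actual WebSound data structure.

```python
from html.parser import HTMLParser

class HtmlNode:
    """Off-screen mirror of one HTML element (attribute names assumed)."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []
        if parent is not None:
            parent.children.append(self)

class TreeBuilder(HTMLParser):
    """Builds an HtmlNode hierarchy while parsing, standing in for a
    traversal of the browser's DOM."""
    def __init__(self):
        super().__init__()
        self.root = HtmlNode("document")
        self.current = self.root
    def handle_starttag(self, tag, attrs):
        # Descend into the newly opened element.
        self.current = HtmlNode(tag, self.current)
    def handle_endtag(self, tag):
        # Climb back to the parent when the element closes.
        if self.current.parent is not None:
            self.current = self.current.parent

def dump(node, depth=0):
    """Print the hierarchy, one indented line per node."""
    print("  " * depth + node.tag)
    for child in node.children:
        dump(child, depth + 1)

builder = TreeBuilder()
builder.feed("<html><body><h1>Title</h1>"
             "<p>Text <a href='#'>link</a></p></body></html>")
dump(builder.root)
```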
In order to create and test new behavioral models, all the events produced during Internet browsing need to be detected and processed by the WebSound application. For example, if the user moves his/her finger across an image present in the HTML document, the HtmlNode associated with the image tag sends messages of type MouseIn, MouseMove (repeated while the device is inside the image border) and MouseOut to the WebSound application. In the same way, if the user moves around a header, the HtmlNode associated with this element is responsible for sending the corresponding messages.
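The sketch below illustrates this MouseIn / MouseMove / MouseOut protocol with a simple rectangle-based hit test; the bounding-box representation and the message signature are our own assumptions.

```python
class HtmlNode:
    """Tracks the pointer against one element's screen rectangle and
    emits the three message types described above."""
    def __init__(self, tag, bounds):
        self.tag = tag
        self.bounds = bounds          # (left, top, right, bottom) on screen
        self.pointer_inside = False

    def contains(self, x, y):
        left, top, right, bottom = self.bounds
        return left <= x <= right and top <= y <= bottom

    def track_pointer(self, x, y, send):
        inside = self.contains(x, y)
        if inside and not self.pointer_inside:
            send("MouseIn", self.tag)
        elif inside:
            send("MouseMove", self.tag)   # repeated while inside the border
        elif self.pointer_inside:
            send("MouseOut", self.tag)
        self.pointer_inside = inside

image = HtmlNode("img", (100, 100, 300, 200))
for x, y in [(50, 50), (150, 150), (200, 160), (400, 300)]:
    image.track_pointer(x, y, lambda kind, tag: print(kind, tag))
# Prints: MouseIn img, MouseMove img, MouseOut img
```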
Figure 4. An example of a dynamic sonic model.
These messages have no behavior until they are connected, via the Workspace, to entities called Services. A Service is a piece of code that performs a particular task depending on the type of message it receives. For example, if the programmer wants a sound to be played when the device pointer moves over an image, he/she creates a Service (MicroSound in figure 4) which plays a given sound whenever it receives a MouseMove message from the HtmlNode associated with the image. This mechanism can be extended to any accessibility modality, such as reading the HTML document using the keyboard, navigating inside HTML tables, filling in HTML forms, etc.
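A minimal sketch of this message-to-Service binding follows; the MicroSound name comes from figure 4, while the connect/deliver registration API is our own assumption.

```python
class MicroSound:
    """Service that plays a short sound on every MouseMove it receives."""
    def __init__(self, sound_file):
        self.sound_file = sound_file
    def receive(self, message, tag):
        if message == "MouseMove":
            print(f"playing {self.sound_file} for <{tag}>")

class Workspace:
    """Routes (tag, message) pairs to the Services bound to them."""
    def __init__(self):
        self.bindings = {}
    def connect(self, tag, message, service):
        # Bind a Service to a given message type on a given tag.
        self.bindings.setdefault((tag, message), []).append(service)
    def deliver(self, tag, message):
        # Called when an HtmlNode emits a message.
        for service in self.bindings.get((tag, message), []):
            service.receive(message, tag)

workspace = Workspace()
workspace.connect("img", "MouseMove", MicroSound("image_loop.wav"))
workspace.deliver("img", "MouseMove")   # pointer moving over an image
```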
CONCLUSION
In this article we have presented a new application, WebSound, that allows sighted and visually impaired “Internet surfers” to navigate with the same visual browser. We have also described a new concept that allows HCI researchers to dynamically create alternative access modalities on top of a standard visual Web browser (Internet Explorer 5.0). Finally, we have proposed a possible way to let visually impaired individuals explore spatial information by way of an audio-haptic interface.
FUTURE WORK
We are currently working on a speech recognition module that will allow blind as well as motor-impaired users to operate the WebSound browser using voice as an input device. Moreover, we are integrating a new haptic device (the WingMan Force Feedback mouse from Logitech) to enhance the perceptual feedback loop. This additional sense will allow blind users not only to listen to the content of each HTML element but also to feel those elements. For example, a blind user will be able to sense the contour of an image, or to feel a texture associated with an element instead of its associated sound.
ACKNOWLEDGMENTS
This project is financed by the Swiss Priority Program in Information and Communication Structure and by the Swiss Central Union of and for the Blind. The authors are grateful to A. Barrillier, V. Hintermeister and V. Rossier for their help in the design and evaluation of the prototype.
REFERENCES
1. Petrucci L., Roth P., Assimacopoulos A., Pun T. An Audio Browser for Increasing Access to World Wide Web Sites for Blind and Visually Impaired Computer Users. In Proceedings of HCI International '99, Munich, 1999, pp. 995-998.
2. Roth P., Petrucci L., Assimacopoulos A., Pun T. AB-Web: Active Audio Browser for Visually Impaired and Blind Users. In International Conference on Auditory Display (ICAD'98), Glasgow, UK, November 1998.
3. Brewster S.A. Using Non-Speech Sounds to Provide Navigation Cues. ACM Transactions on Computer-Human Interaction 5(3), pp. 224-259.
4. Brewster S.A. Providing a Structured Method for Integrating Non-Speech Audio into Human-Computer Interfaces. Doctoral dissertation, University of York, 1994.
5. James F. Presenting HTML Structure in Audio: User Satisfaction with Audio Hypertext. In Proceedings of ICAD'96, Xerox PARC, 4-6 November 1996, pp. 97-103.
6. Blauert J. Spatial Hearing. The MIT Press, Cambridge, Massachusetts, 1997.
7. Lumbreras M., Sánchez J. Interactive 3D Sound Hyperstories for Blind Children. In Proceedings of CHI'99, Pittsburgh, PA, USA, 1999.
8. Harth E. Plate-forme de Sonification Dynamique de Pages Web pour Aveugles et Malvoyants [Dynamic Web Page Sonification Platform for Blind and Visually Impaired Users]. Diploma thesis, University of Geneva, 1999.