Accessible Videodescription On-Demand
Claude Chapdelaine and Langis Gagnon
R&D Department, Computer Research Institute of Montreal (CRIM)
550 Sherbrooke St. West, Suite 100
Montreal (Quebec), H3A1B9, CANADA
(514) 840-1234
{claude.chapdelaine; langis.gagnon}@crim.ca
ABSTRACT
Providing blind and visually impaired people with the
descriptions of key visual elements can greatly improve the
accessibility of video, film and television. This project presents a
Website platform for rendering videodescription (VD) using an
adapted player. Our goal is to test the usability of an accessible
player that provides end-users with various levels of VD, on-
demand. This paper summarizes the user evaluations covering 1)
the usability of the player and its controls, and 2) the quality and
quantity of the VD selected. The complete results of these
evaluations, including the accessibility of the Website, will be
presented in the poster. Final results show that 90% of the
participants agreed on the relevancy of a multi-level VD player.
All of them rated the player easy to use. Some improvements were
also identified. We found that there is a great need to provide
blind and visually impaired people with more flexible tools to
access rich media content.
Categories and Subject Descriptors
H.5.2 [User Interfaces]: Evaluation/methodology; K.4.2 [Social
Issues]: Assistive technologies for persons with disabilities.
Keywords
Rich media, Web accessibility, audio description, blindness and
visual impairment.
1. INTRODUCTION
For the blind and visually impaired, the enjoyment of visual
media can be greatly improved by the addition of narrative
descriptions corresponding to the relevant visual elements.
Videodescription (VD), also known as audio description or
described video, is delivered through an audio channel that
enables the blind and visually impaired to form a more accurate
and vivid mental representation of what is shown on the screen.
However, essential questions such as which key elements should
be conveyed, how they can best be described, and how many can
fit in the gaps between existing speech segments are not
fully answered when producing VD. Research on VD issues
[1][2][3] and guidelines on production practices [4][5] are
emerging. Yet, more research is needed in order for VD to be
known and used as much as captioning is for the deaf and hearing
impaired.
Copyright is held by the author/owner(s).
ASSETS’09, October 25–28, 2009, Pittsburgh, Pennsylvania, USA.
ACM 978-1-60558-558-1/09/10.
This paper presents the accessibility analysis of a Website
platform for rendering videodescription (VD) using an adapted
player, called VDPlayer. It is an initiative to promote and improve
the richness of the multimedia experience for people with vision
impairments.
2. ACCESSIBLE VIDEO
The challenges of producing accessible video for the blind and
visually impaired are numerous. Human issues relate to determining
what should be described and how much description is needed.
Accessibility issues, on the other hand, are closely coupled with the
rendering medium; we will focus mainly on the Web environment.
Human Issues. Blind and visually impaired people do not all
assess their need for VD at the same level. For example, some
mentioned that in real-life situations where no VD is available, they
stop the DVD and ask questions of their sighted friends. For them,
the ideal rendering of VD would give them the same freedom
without the constant need for a sighted friend. In recent user
consultations [2], videos with varying quantities and qualities of
VD were shown to 30 participants. The resulting discussions
revealed that participants' preferences depended on their
level of blindness, personal taste and experience. A challenge
arises when a user wants more VD than can be delivered in the
available time. The Web Content Accessibility
Guidelines (WCAG) [6] propose offering extended VD when
the gaps in the foreground audio are insufficient. How to implement
such an extended version of VD became one of the specifications of
our project [7].
Accessibility Issues. Implementing an accessible Website without
any rich media is a laborious task in itself, since each browser
implements the W3C recommendations [6] with its own flavor.
The challenge is even greater when dealing with rich media
objects, even with the available resources [8]. Furthermore,
problems are still too often part of visually impaired users'
browsing experiences [9][10][11][12]. It is well known that their
interactions take much longer than those of sighted users [13][14];
it can take up to three minutes to reach the main content of a page
with a screen reader. One line of solutions has been to develop
tools that measure a Website's conformance to the guidelines [15].
Many such tools are available; see [16] for a comparative study.
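To illustrate the kind of rule such conformance tools automate, the following minimal sketch (in TypeScript, assuming a browser DOM environment; the function name and report structure are hypothetical, not taken from any specific tool) flags images that lack a text alternative, one of the most common guideline failures.

// Minimal sketch of an automated WCAG-style check: report <img> elements
// that have no text alternative. Illustrative only; real evaluation tools
// cover many more success criteria.
interface Finding {
  element: string;   // snippet of the offending markup
  problem: string;   // short description of the failure
}

function checkImageAltText(doc: Document): Finding[] {
  const findings: Finding[] = [];
  doc.querySelectorAll("img").forEach((img) => {
    // An empty alt="" is valid for decorative images; a missing attribute
    // is the failure reported here.
    if (img.getAttribute("alt") === null) {
      findings.push({
        element: img.outerHTML.slice(0, 80),
        problem: "Missing alt attribute (text alternative)",
      });
    }
  });
  return findings;
}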
3. USERS EVALUATION
We developed an accessible Website presenting five short
films that could be screened with our VDPlayer. We produced the
VD, rendered with a synthetic voice, and provided two levels of
VD: 1) a standard mode that delivered only the VD fitting in the
available non-speech segments, and 2) an extended mode that
delivered all of the VD produced, which exceeded the non-speech
segments. In the latter case, the player had to pause, deliver the
VD and resume playback. The VDPlayer was designed to
1) offer standard video controls such as play, pause, rewind,
forward and volume change, and 2) provide controls specific to
the VD, allowing users to select the level of VD they wanted, to
personalize which VD items they heard, and to repeat the last VD
item that was spoken.
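To make the two rendering modes concrete, the following minimal sketch (in TypeScript; the cue structure, audio objects and scheduling loop are assumptions for illustration, not the actual VDPlayer implementation) shows one way of rendering timed VD cues: in standard mode a description is simply played over the running video, while in extended mode the video is paused, the full description is played, and playback then resumes.

// Sketch of standard vs. extended videodescription rendering.
// Assumed data model: each cue has a start time, the length of the
// non-speech gap it falls in, and a pre-synthesized audio clip.
interface VDCue {
  startTime: number;        // seconds into the video
  gapDuration: number;      // length of the non-speech gap, in seconds
  audio: HTMLAudioElement;  // synthesized description audio
}

type VDMode = "standard" | "extended";

function attachVDCues(video: HTMLVideoElement, cues: VDCue[], mode: VDMode): void {
  const pending = [...cues].sort((a, b) => a.startTime - b.startTime);

  video.addEventListener("timeupdate", () => {
    if (pending.length === 0 || video.currentTime < pending[0].startTime) return;
    const cue = pending.shift()!;

    if (mode === "standard" || cue.audio.duration <= cue.gapDuration) {
      // Standard mode: the description fits the gap, so play it over the video.
      cue.audio.play();
    } else {
      // Extended mode: pause the video, deliver the full description,
      // then resume playback where it left off.
      video.pause();
      cue.audio.onended = () => video.play();
      cue.audio.play();
    }
  });
}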
After each viewing, participants could fill out an evaluation
questionnaire to provide feedback on the VD, their interaction
with the player and the Website. Ten participants completed the
questionnaire. The group was composed of seven men and three
women, and most of them were over 31 years old. All the
participants navigated with a screen reader (9 with JAWS and 1
with Window-Eyes), and all considered themselves expert users
of their tool. Most participants declared being frequent
television viewers but infrequent users of VD, mainly because of
its low availability.
3.1 VDPlayer Evaluation
The VDPlayer evaluation aimed at establishing the perceived
relevancy and ease of use of the video player controls related to
VD.
Relevancy. Selecting various VD levels was found relevant to
some degree (strongly and fairly) by 90% of the participants.
None of them found it to be irrelevant. Many participants
mentioned that they listened to more than one version and
appreciated the extended version more. This high score suggests
there is a need for this type of functionality.
Ease of use was evaluated for the player in general and for each of
the VD controls. In general, the player was judged fairly easy to
use by 90% of the participants. Individual controls were rated easy
to use by 90% to 100% of the participants. The one participant who
rated the player fairly difficult to use commented that the sound
level was very low and that he was unable to increase it. After
verification, we found the underlying technical problem and
corrected it.
3.2 VD Evaluation
The evaluation of VD itself was done through a series of nine
statements, each with either a positive or negative tone. Participants
had to choose the level of agreement or disagreement for each of
them. The produced VD was greatly appreciated by the
participants. Indeed, 92% agreed (strongly or fairly agreed) with
the positive statements, while only 15% agreed with the negative
statements. The weakest scores were related to 1) the quality of
the synthetic voice, which three participants judged moderately
unacceptable, 2) the impression among most participants that the
VD covered relevant audio information, and 3) one participant
finding the VD frustrating at times. In conclusion, the global results
indicate that the VD is good and corresponds to a need, but some
improvements could be implemented to better convey VD to the
listeners.
4. DISCUSSIONS
The goals of our project were reached: videos with
different levels of VD were made available and were screened by
blind or visually impaired users. Further, the user evaluation
showed that our approach was accessible and corresponded to their
needs. More recently, we have integrated a keyboard logging
mechanism into our VDPlayer to better analyse how blind and
visually impaired users navigate the player, in order to better
measure ease of use. In the future, more user testing is
planned to evaluate the robustness and usability of an interactive
accessible VDPlayer.
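As a rough idea of the kind of instrumentation involved, the following minimal sketch (in TypeScript; the log format and control identification are hypothetical, not the actual logging component of the VDPlayer) records timestamped key presses together with the control that had keyboard focus, so that navigation paths can be reconstructed afterwards.

// Sketch of keyboard logging for analysing how screen-reader users
// navigate the player: each keystroke is timestamped and associated
// with the control that had focus at that moment.
interface KeyLogEntry {
  timeMs: number;          // milliseconds since logging started
  key: string;             // key pressed, e.g. "Tab", "Enter", "ArrowRight"
  focusedControl: string;  // id or accessible label of the focused element
}

function startKeyLogging(playerRoot: HTMLElement): KeyLogEntry[] {
  const log: KeyLogEntry[] = [];
  const t0 = performance.now();

  playerRoot.addEventListener("keydown", (event: KeyboardEvent) => {
    const focused = document.activeElement as HTMLElement | null;
    log.push({
      timeMs: Math.round(performance.now() - t0),
      key: event.key,
      focusedControl: focused?.id || focused?.getAttribute("aria-label") || "unknown",
    });
  });

  return log;  // the caller can later serialize this array for analysis
}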
5. REFERENCES
[1] Piety, P.J. 2004. The language system of audio description:
an investigation as a discursive process. JVIB 98, no 8: 1-36.
[2] Turner, J.M. and Colinet, E. 2004. Using audio description
for indexing moving images. Knowledge organization 31, no
4: 222-230.
[3] Salway, A. 2007. A corpus-based analysis of audio
description. In Media for all. Edited by Cintas, J.D., Orero P.
and Remael A. Approaches to Translation Studies, New
York, NY. 151-174.
[4] Henry, S.L. 2006. Introduction to Web accessibility.
www.w3.org/WAI/intro/accessibility.php.
[5] Office of Communications. 2000. ITC Guidance on Standards
for Audio Description: www.ofcom.org.uk/
[6] Web Content Accessibility Guidelines (WCAG). 2008.
http://www.w3.org/TR/WCAG20/
[7] Gagnon, L. et al. 2009. Towards computer-vision software
tools to increase production and accessibility of video
description for people with vision loss. UAIS (published
online 5 February 2009). Springer.
[8] Flash and accessibility, http://www.usability.com.au
[9] Petrie, H. and Kheir, O. 2007. The relationship between
accessibility and usability of websites. In Proc. CHI, pages
397-406, San Jose, CA, ACM.
[10] Takagi, H., Saito, S., Fukuda, K. and Asakawa, C. 2007.
Analysis of navigability of web applications for improving
blind usability. TOCHI.
[11] Miyashita, H., Sato, D., Takagi, H. and Asakawa, C. 2007.
Making multimedia accessible for screen reader users. In
proceedings of W4A’07, ACM, pp. 126-127.
[12] Smillie, D. 2005. Instant Accessibility: does it work?, RNIB,
Web Access Center, http://www.rnib.org.uk/
[13] Bigham, J., Cavender, A.C., Brudvik, J.T., Wobbrock, J.O.
and Ladner, R.E. 2007. WebinSitu: A comparative Analysis
of Blind and Sighted Browsing Behavior, In ASSETS 2007.
[14] Takagi, H., Asakawa, C., Fukuda, K. and Maeda, J. 2004.
Accessibility designer: visualizing usability for the blind. In
ASSETS ’04.
[15] W3C/WAI. 2008. Conformance evaluation of web sites for
accessibilities. www.w3.org/WAI/eval/conformance.html.
[16] Brajnik, G. 2008. A Comparative Test of Web Accessibility
Evaluation Methods. In ASSETS 2008, p. 113.