Conference Paper

Spoken Mathematics Using Prosody, Earcons and Spearcons



Enda Bates¹ and Donal Fitzpatrick¹
¹ Dublin City University, Dublin 9, Ireland
{Enda Bates, Donal Fitzpatrick}
Abstract. Printed notation provides a highly succinct and unambiguous
description of the structure of mathematical formulae in a manner which is
difficult to replicate for the visually impaired. A number of different approaches
to the verbal presentation of mathematical material have been explored;
however, the fundamental differences between the two modalities of vision and
audition are often ignored. The use of additional lexical cues, spatial audio or
complex hierarchies of non-speech sounds to represent the structure and scope
of equations may be cognitively demanding to process, and this can detract
from the perception of the mathematical content. In this paper, a new
methodology is proposed which uses the prosodic component found in spoken
language, in conjunction with a limited set of spatialized earcons and spearcons,
to disambiguate the structure of mathematical formulae. This system can
potentially represent this information in an intuitive and unambiguous manner
which takes advantage of the specific strengths and capabilities of audition.
Keywords: Math, auditory interfaces, visual impairment, earcons, spearcons.
1 Introduction
Printed mathematical equations provide a significant amount of information in a
highly succinct manner. The visual representation immediately and unambiguously
indicates structural information such as the presence and scope of a fraction or the
terms contained within a square root operator. Developing a similarly efficient and
unambiguous representation for the visually impaired is a significant challenge.
Tactile representations such as Braille represent one entirely valid approach to this
issue; however, this paper will concentrate on the primary alternative to Braille,
speech synthesis.
2 Visual and Audible Representations of Mathematical Material
Printed notation provides a persistent visual cue which is effectively non-temporal;
however, audible representations are necessarily fleeting due to the
inherently temporal nature of sound. Sighted users can therefore utilize printed
material as a form of external memory and do not need to memorize the structure and
layout of an equation. This conclusion is supported by the results of a series of
cognitive experiments examining equation reading in sighted users, conducted by
Gillan et al. [1], which found that sighted subjects process operators and numbers
more intensively than parentheses. This is perhaps unsurprising as the spatial structure
of the equation (which is implied through the use of parentheses and other graphical
symbols and delimiters) is unambiguous and persistent when presented visually. This
suggests that working with mathematical material in a non-visual medium will result
in an inevitable increase in cognitive load as this structural information must now be
held in memory. This strongly implies that any method of presenting the spatial
structure of an equation via audition must be as easy as possible to cognitively
process.
2.1 Active Browsing of Mathematical Material
Printed mathematics provides a strong spatial definition of the mathematical
content and the user provides the temporal dimension by actively directing their
attention back and forth through the material [1]. A simple audio representation works
in reverse, as the serial stream of audio is highly temporal and the user must now infer
and remember the spatial structure from the presented material. In effect, a printed
symbol exists outside of time, and a sighted user can concentrate on this symbol
whenever they wish. Audition, on the other hand, is highly temporal and
omnidirectional, and cannot be consciously directed in the same way as vision. Clearly,
the ability to direct attention at will is critical if a visually impaired user is to
have effective control over the temporal representation of the mathematical material,
and indeed the majority of the available software for the presentation of spoken
mathematics incorporates some form of active browsing which allows visually impaired
users to navigate and segment mathematical equations [1,2,3,4].
2.2 Spatial Representations
Various projects have attempted to directly map the visible spatial structure of
printed notation to an equivalent audible structure. It has been suggested that this will
reduce the mental effort required by the user to process and solve the equation [5];
however, this may not in fact be the case. The results of a number of experiments
have found that the minimum detectable changes in position for visual and auditory
stimuli are approximately 1′ (one minute of arc) for visual targets and 2° to 4° for
auditory targets [6]. The spatial resolution of vision is therefore clearly much finer
than that of audition, and this suggests that accurately replicating the spatial layout
of a printed equation with spatial audio will be difficult to achieve.
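To put the two thresholds on a common scale, the following worked conversion uses only the figures quoted above together with the identity 1° = 60′. It is an illustrative calculation and not a result reported in [6].

```latex
\[
\frac{2^\circ}{1'} = \frac{120'}{1'} = 120,
\qquad
\frac{4^\circ}{1'} = \frac{240'}{1'} = 240 .
\]
```

On these figures, the smallest detectable auditory displacement is roughly two orders of magnitude larger than the smallest detectable visual displacement.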
While dynamic sonic trajectories can be used to create sonic shapes, this appears to
be a cognitive process, rather than a perceptual one [7]. In effect, the shape is
cognitively inferred from our perception of the moving sound and is not perceived
directly. Therefore it cannot be assumed that mirroring the spatial layout of a written
equation with spatial audio will reduce the cognitive load on the listener. In fact, it has
been found that the additional mental processing required to determine the spatial
trajectory detracts from the processing of the mathematical content [8]. The vertical
layout of mathematical equations would appear to be well matched to a spatial audio
presentation of an equation; however, auditory localization is particularly inaccurate
in the vertical dimension and extremely difficult to synthesize, particularly with
binaural spatialization techniques based on the head-related transfer function, or
HRTF [9, 10]. Poor results have also been found when HRTF techniques are replaced
with real loudspeakers positioned vertically in front of the listener, as in [10].
These issues suggest that a linear spatial mapping from vision to audition is not
particularly useful due to the fundamental differences between the two senses;
however, spatial audio may be beneficial in other ways. Numerous studies have
shown that sounds produced from different spatial locations are easier to distinguish,
which suggests that if additional sounds are added to the main speech signal, these
should be produced from different spatial locations, as in the system developed by
Goose and Möller [8]. The externalization effect which generally occurs when binaural
material is presented via headphones has also been found to be much less fatiguing
than standard stereo, and this may also be of some benefit [9].
3 Existing Approaches to Spoken Mathematics
3.1 Lexical Cues
The use of additional spoken cues to indicate the structure of equations and
formulas is reminiscent of the linear representation of mathematical material in
computer programming languages (which also lack the two-dimensional spatial
component of traditional printed mathematics). Nemeth MathSpeak is one system
which adopts this approach and concentrates on removing the ambiguities which arise
in spoken mathematics [11]. MathSpeak uses additional lexical cues such as “begin
fraction” and “end fraction” (abbreviated to “B-frac” and “E-frac”) to clearly mark
the beginning and end of the fraction. Although this solution removes the
ambiguity from spoken mathematics, it becomes confusing as the equations become
more complex. For example, the formula for quadratic equations would be described
as, “B-frac minus B plus or minus B-rad B sup 2 base minus 4 AC E-rad over 2 A
E-frac”. In addition, there is evidence to suggest that if the end of an utterance is marked
with a verbal message, then this message may detract from the listener’s memory of
the utterance, a process which is referred to as the suffix effect [12]. Although
Nemeth’s approach largely eliminates structural ambiguity in the presentation, the
cognitive effort required to remember and parse the equation is quite significant for
all but the simplest material.
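The way explicit begin and end cues accumulate can be sketched in a few lines of code. The following is a minimal illustration only: the tuple-based expression tree and the function `speak_lexical` are hypothetical and are not part of MathSpeak or of any system cited here. The cue words are those quoted in the example above.

```python
# Hypothetical mini expression tree: a node is either a plain string of
# already-verbalized tokens or a tuple (kind, *children).
def speak_lexical(node):
    """Render a node as speech text with explicit begin/end cues."""
    if isinstance(node, str):
        return node
    kind, *children = node
    parts = [speak_lexical(child) for child in children]
    if kind == "seq":    # terms spoken one after another
        return " ".join(parts)
    if kind == "frac":   # children: numerator, denominator
        return f"B-frac {parts[0]} over {parts[1]} E-frac"
    if kind == "rad":    # child: radicand
        return f"B-rad {parts[0]} E-rad"
    if kind == "sup":    # children: base, exponent; "base" marks the return
        return f"{parts[0]} sup {parts[1]} base"
    raise ValueError(f"unknown node kind: {kind}")


# The quadratic formula, encoded by hand for this sketch.
quadratic = (
    "frac",
    ("seq", "minus B plus or minus",
            ("rad", ("seq", ("sup", "B", "2"), "minus 4 AC"))),
    "2 A",
)
spoken = speak_lexical(quadratic)
print(spoken)
```

Every structural node contributes at least two extra spoken tokens, which is one way to see why the rendering becomes harder to follow as nesting deepens.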
3.2 Earcons, Auditory Icons and Spearcons
In contrast to a purely lexical approach, a number of systems have been developed
which use additional, non-speech sounds to indicate various aspects of the
mathematical material, such as the structure of an equation. Non-speech sounds have
long been used to improve access to graphical user interfaces and are often classified
as earcons, auditory icons or spearcons. Earcons are generally constructed from
abstract sounds such as the error beep in the Windows OS and may consist of melodic
or rhythmic musical material. Although the exact meaning of earcons must be learned
by the user, the abstract nature of the sound means that earcons are relatively easy to
synthesize and modify. In contrast, auditory icons use sampled sounds which are
intended to resemble or refer to the object they represent. This means that auditory
icons are more difficult to manipulate than earcons but their meaning may be easier
for the listener to recognize. Spearcons are a more recent development and are created
by time-compressing a spoken phrase so that the resulting short sound is not entirely
comprehensible as speech [13].
Using auditory icons to represent structural mathematical constructs like
parentheses or fractions is inherently difficult as there is no obvious relationship
between this abstract mathematical syntax and real-world sounds. As it is relatively
straightforward to create a hierarchical range of earcons, these may be more
appropriate to represent nested parentheses or other structural notation. This approach
was adopted by the MathTalk system which uses musical earcons to indicate
structural delimiters (such as a pattern of 3 short ascending and descending notes to
indicate the beginning and end of an expression respectively) and also provide an
abstract overview of the entire equation [14]. The significant drawback to this
approach is that a complex and highly specific musical grammar must be learned prior
to using the system. In addition, remembering and decoding musical patterns may be
quite difficult for non-musicians and again, the additional cognitive effort required to
decode each pattern could detract from the processing of the mathematical content.
Spearcons lie somewhere between recognizable speech and abstract sounds such as
earcons. A number of studies have found that spearcons were easier to learn and
resulted in an increase in performance in interactive interface tasks such as menu
browsing [13]. The major advantage of spearcons is that they can potentially function
as either a descriptive lexical phrase or an abstract sound, depending on their familiarity
to the listener or their function as a structural delimiter.
3.3 Prosody
Prosody, i.e. the patterns of stress and intonation in a language, can also be used to
indicate the structure of an equation through the manipulation of parameters such as
pause duration, speaking rate and pitch. Raman’s AsTeR system used prosody in
conjunction with additional lexical cues such as “fraction” and “end-fraction”, and
“quantity” and “end-quantity” as parentheses to indicate sub-expressions [2]. Different
levels of exponents and nested expressions were indicated using persistent speech
cues, such as a speaking voice with raised pitch to indicate an exponent. Other
systems have attempted to use prosodic schemes more reminiscent of normal speech.
Stevens analyzed recordings of spoken algebraic equations and used the pitch, timing
and amplitude characteristics of these recordings as the basis for an audio interface
designed specifically for the presentation of mathematics [4]. Fitzpatrick argues that
the prosodic model used by Stevens conflicts with current research in prosodic and
intonational phonology, and instead proposed an alternative model which relates the
structure found in mathematical expressions with the inherent composition and
structure of the English language [15]. In this model, the nesting of clauses within
sentence structures is equated to the nesting of sub-expressions within a larger
equation whose structure is similarly indicated through the insertion of pauses and
alterations to the speaking rate. Additional lexical cues are added to indicate certain
specific terms such as fractions and superscripts, in a similar fashion to Nemeth
MathSpeak [11].
Experiments with these systems have found that prosody can be used to determine
the structure of an equation and, significantly, required less effort from the user than
lexical cues [4]. Natural prosody would therefore appear to be highly suitable for the
presentation of structural information as the subtle prosodic inflections of natural
speech are intuitively understood and relatively undemanding in terms of cognitive
effort. The significant drawback to this approach is that the structure of complex,
nested equations is difficult to determine based on prosody alone, suggesting that
some form of additional delimiter is required [4, 15].
4 Prosody and Audio Punctuation, Earcons and Spearcons
The prosodic model developed by Fitzpatrick [15], based on the natural structures
found in spoken English, appears to be the most cognitively efficient means of
conveying the structure of mathematical content via audition. However, in this new
model, the prosodic grouping is reinforced with a limited number of additional sounds
which function as a form of audio punctuation. These include simple beep sounds
which are used as audio parentheses to delimit sub-expressions and reinforce the
segmentation implied by the prosodic model. The additional lexical cues used by
Fitzpatrick to indicate fractions and exponents are also replaced with equivalent
spearcons whose shorter duration should reduce the negative influence of the suffix
effect. It is beyond the scope of this paper to describe how the prosodic model caters
for each individual construct found in mathematical material; however, the following
sections will describe how some of the most common terms and expressions will be
The prosody model developed by Fitzpatrick assumes that a mathematical
expression is a structure containing an arbitrary number of nested sub-expressions,
which can be resolved into either a sentence structure containing various clauses, or a
paragraph of sentences [15]. The degree of nesting in the equation is therefore highly
important, and once this factor has been determined, the appropriate pauses and
changes in speech rate can be determined and added to the basic verbalization of the
mathematical content, as illustrated by the following example.
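Reading the spoken rendering given in the next paragraph back into notation, the example equation is presumably the following. This is a reconstruction from that rendering and from the statement that the summation applies to both remaining terms, not a copy of the printed original.

```latex
\[
\sum_{i=1}^{n-1} \left( a^{i} + \frac{i+1}{i-1} \right)
\]
```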
In this equation, the summation which applies to both of the remaining terms
represents the first level of nesting. Two terms are contained within the summation;
however, the second of these terms itself contains two complex terms in the
numerator and denominator of the fraction, and this will require another level of
nesting. The representation of this equation must therefore include a lengthy pause to
indicate that the remaining terms are all contained within the summation, followed by
additional pauses to indicate the scope of the superscript and fraction. The
approximate spoken rendering of this equation would therefore be, “the sum from i
equals 1 to n minus 1 of…. a to the i.. plus begin fraction, i plus 1 over i minus 1, end
fraction”. Adjustments to the speaking rate are also applied at the various clause
boundaries to further reinforce the segmentation of the material. In the above
example, the complex terms contained within the fraction would be indicated by an
increase in the speaking rate relative to the rest of the equation. However, these
adjustments must be carefully controlled to ensure that sub-expressions at deeper
nesting levels are not uttered too quickly. With this in mind, a 6% increase in
speaking rate was found to produce acceptable results for a number of nesting levels.
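The relationship between nesting depth, pauses and speaking rate can be sketched as a small generator of SSML markup, using the standard `<prosody rate>` and `<break time>` elements. This is a minimal sketch under stated assumptions and not the authors' implementation: the 6% step comes from the text, but the pause lengths, the choice to compound the rate per level, and the depth assigned to each clause are assumptions made for illustration.

```python
from xml.sax.saxutils import escape

BASE_RATE = 100.0             # percent of the synthesizer's default rate
RATE_STEP = 0.06              # 6% faster per level (compounding is assumed)
PAUSE_MS = {0: 600, 1: 300}   # hypothetical pause lengths per nesting level
DEFAULT_PAUSE_MS = 150        # hypothetical pause for deeper levels


def rate_for_depth(depth):
    """Speaking rate, in percent, for a sub-expression at the given depth."""
    return BASE_RATE * (1.0 + RATE_STEP) ** depth


def to_ssml(clauses):
    """clauses: list of (text, depth) pairs. Emits one prosody span per
    clause, followed by a pause that is longer at shallower levels."""
    out = ["<speak>"]
    for text, depth in clauses:
        rate = round(rate_for_depth(depth))
        pause = PAUSE_MS.get(depth, DEFAULT_PAUSE_MS)
        out.append(f'<prosody rate="{rate}%">{escape(text)}</prosody>')
        out.append(f'<break time="{pause}ms"/>')
    out.append("</speak>")
    return "".join(out)


# The example rendering from the text, with assumed nesting depths.
example = [
    ("the sum from i equals 1 to n minus 1 of", 0),
    ("a to the i", 1),
    ("plus begin fraction", 1),
    ("i plus 1 over i minus 1", 2),
    ("end fraction", 1),
]
ssml = to_ssml(example)
print(ssml)
```

Under the compounding assumption the rate grows slowly (100%, 106%, about 112%), which is consistent with the requirement that deeply nested sub-expressions are not uttered too quickly.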
The prosodic model described in the above example indicates the structure of the
equation in a highly intuitive manner; however, this approach alone will clearly never be
entirely sufficient, as indicated by the presence of an additional lexical cue to indicate
the fraction. In this new model, lexical terms such as “begin fraction” are replaced
with spearcons, and these are used to support the prosodic segmentation of all
superscripts, subscripts and fractions. Both these spearcons and the simple, beep-like
earcons which function as audio parentheses are applied to the prosodically enhanced
speech using a similar strategy which reinforces the segmentation implied by
prosodic parameters such as pauses and adjustments to the speaking rate. These short
beeps function as audio parentheses in much the same way as brackets are used to
delimit expressions in many programming languages. An opening audio parenthesis is
represented by a short beep, specifically a 20 ms sine wave pulse, which is
positioned to the left of the main voice signal using binaural processing. A closing
audio parenthesis is represented in the same way except the sine wave pulse is now
positioned to the right. This scheme significantly enhances the structural segmentation
implied by the speech prosody, and in addition, the extremely short duration and the
externalizing effect of the binaural processing minimize the detrimental effect of
these additional sounds on the primary speech content. The use of spatial position to
indicate opening or closing parentheses is also very cognitively efficient as it relies on
a relatively coarse and intuitive sense of spatial hearing. These two earcons were
designed to support this left-right distribution as the frequency of each sine tone
ramps up or down depending on whether it is an opening or closing parenthesis
respectively. This frequency ramp, or glissando, is not highly apparent due to the
very short duration of the pulse; however, it does help to further distinguish the two
earcons and is conceptually suggestive of the implied opening and closing delimiters.
Multiple nested levels of audio parentheses are created by adjusting both the spatial
position and the frequency range of the sine wave pulse.
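A pulse of this kind can be sketched with the standard library alone. The 20 ms duration and the direction of the glissando follow the description above; everything else is an assumption made for illustration: the 1 kHz centre frequency, the 200 Hz shift and narrower pan per nesting level, the ±20% sweep, and the Hann envelope. Simple constant-power level panning is also substituted here for the binaural (HRTF-based) processing described in the text.

```python
import math

SAMPLE_RATE = 44100
PULSE_SECONDS = 0.020   # 20 ms, as stated in the text


def audio_parenthesis(opening, depth=0):
    """Return a list of (left, right) float samples for one audio parenthesis.

    Assumed values: 1 kHz centre frequency raised by 200 Hz per nesting
    level, a +/-20% glissando, and a pan position that moves towards the
    centre at deeper levels.
    """
    n = round(SAMPLE_RATE * PULSE_SECONDS)
    centre = 1000.0 + 200.0 * depth
    if opening:   # opening parenthesis: frequency ramps up
        f_start, f_end = 0.8 * centre, 1.2 * centre
    else:         # closing parenthesis: frequency ramps down
        f_start, f_end = 1.2 * centre, 0.8 * centre
    # Pan in [-1, 1]: negative is left (opening), positive is right (closing).
    pan = (-1.0 if opening else 1.0) * max(0.4, 1.0 - 0.2 * depth)
    angle = (pan + 1.0) * math.pi / 4.0
    gain_left, gain_right = math.cos(angle), math.sin(angle)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        # Phase of a linear chirp: integral of the instantaneous frequency.
        phase = 2.0 * math.pi * (
            f_start * t + (f_end - f_start) * t * t / (2.0 * PULSE_SECONDS)
        )
        # Hann envelope avoids clicks at the pulse edges.
        envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        value = envelope * math.sin(phase)
        samples.append((gain_left * value, gain_right * value))
    return samples


def channel_energy(samples):
    """Sum of squared sample values for the left and right channels."""
    left = sum(l * l for l, _ in samples)
    right = sum(r * r for _, r in samples)
    return left, right


open_pulse = audio_parenthesis(opening=True)
close_pulse = audio_parenthesis(opening=False)
```

Comparing `channel_energy(open_pulse)` with `channel_energy(close_pulse)` shows the intended coarse left-right contrast, which is the only property the listener is expected to rely on.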
Parentheses can be used to remove many of the structural ambiguities which may
arise in spoken mathematics; however, the vertical layout of printed notation, with
fractions, subscripts and superscripts for example, must also be indicated using some
form of additional cue. In this case, the use of non-speech sounds will require the user
to learn the meaning of different abstract sounds; however, the use of lexical cues is
also problematic as the additional speech content may detract from the main spoken
content. For this reason, spearcons based on the phrases “frac”, “sup” and “sub” are
used to indicate fractions, superscripts and subscripts respectively. In each case, the
spearcon is presented using the same left-right binaural processing to indicate the
beginning or ending of the fraction, superscript or subscript. This approach is
advantageous as the original phrase provides a lexical indicator of the type of
structure while the significantly reduced duration (time-compressed to approx.
200 ms) and simplified phrasing should also lessen the impact of the suffix effect. In
addition, the use of spatial position to indicate the beginning or end of a
sub-expression further reduces the duration and detrimental effect of the required lexical cue.
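The combined scheme can be summarized as a flat sequence of audio events in which the same cue is placed to the left to open a structure and to the right to close it. The sketch below is illustrative only: the tuple-based tree, the event tuples and the function `to_events` are hypothetical, and keeping the spoken word “over” between numerator and denominator is an assumption.

```python
# Hypothetical event model: ("speech", text), ("spearcon", label, side) or
# ("earcon", side), where side is "left" (opening) or "right" (closing).
def to_events(node):
    """Flatten a small expression tree into an ordered list of audio events."""
    if isinstance(node, str):
        return [("speech", node)]
    kind, *children = node
    if kind == "seq":      # terms rendered one after another
        return [event for child in children for event in to_events(child)]
    if kind == "paren":    # audio parentheses: beep left, then beep right
        return [("earcon", "left")] + to_events(children[0]) + [("earcon", "right")]
    if kind == "frac":     # one "frac" spearcon, left to begin, right to end
        numerator, denominator = children
        return ([("spearcon", "frac", "left")] + to_events(numerator)
                + [("speech", "over")] + to_events(denominator)
                + [("spearcon", "frac", "right")])
    if kind in ("sup", "sub"):   # script bracketed by a "sup"/"sub" spearcon
        base, script = children
        return (to_events(base) + [("spearcon", kind, "left")]
                + to_events(script) + [("spearcon", kind, "right")])
    raise ValueError(f"unknown node kind: {kind}")


# The summand of the earlier example: a to the i, plus a fraction.
summand = ("seq", ("sup", "a", "i"), "plus", ("frac", "i plus 1", "i minus 1"))
events = to_events(summand)
for event in events:
    print(event)
```

Because the begin/end distinction is carried by position alone, each structure needs only one spearcon label rather than a separate “begin” and “end” phrase.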
5 Conclusion
Working with mathematical material via audition rather than vision places an
increased cognitive load on the listener. This is largely unavoidable due to the critical
importance of spatial structures in mathematical notation and the significant
difference between audition and vision in terms of spatial resolution. The design of a
system for visually impaired users must therefore concentrate on resolving structural
ambiguities in a cognitively efficient manner. In addition, the system must provide the
user with temporal control over the material in the form of browsing and overview
capabilities, so the user can direct their own path through the material at their own
pace.
In his PhD thesis on the AsTeR system, T. V. Raman stressed the importance of
identifying dimensions in the audio space to parallel the functionality of the
dimensions in the visual setting [2]. This does not imply that these dimensions should
be simply mapped from one modality to another; instead, it is the functionality of these
dimensions that should be replicated between these two modalities. The spatial layout
of printed mathematical material should not therefore be simply replicated with
sound. Instead, the functionality of this layout must be provided via audition using a
method that is appropriate for this modality. The human auditory system is
particularly attuned to speech and this suggests that the most cognitively efficient
means of presenting mathematical structural information via audition is through the
replication of natural speech patterns, such as prosody. However, prosody alone
cannot entirely resolve structural ambiguities and so additional cues are required.
While non-speech sounds such as earcons can be constructed to represent a
hierarchical structure (such as nested parentheses or menu items), this will require
additional cognitive processing on the part of the listener which may only distract from the
processing of the mathematical material, and this is particularly true of complex
schemes based upon musical structures. The audio punctuation strategy proposed in
this paper overcomes this issue, as in this instance it is not the actual audio content of
the earcon that is important, but rather its coarse spatial position (simply either to the
left or to the right) relative to the main speech signal. In this way, the earcons function
as audio parentheses which punctuate and augment the speech content in an intuitive
and non-distracting manner.
Spearcons are an interesting new development as they lie somewhere between a
clearly comprehensible spoken utterance and an abstract non-speech sound.
Spearcons are therefore an excellent way to indicate structural elements such as
fractions, superscripts and subscripts as they are less distracting than traditional lexical
cues but still provide a description of the particular structural element involved.
References
1. Gillan, D., Barraza, P., et al.: Cognitive Analysis of Equation Reading: Application to the
Development of the MathGenie. ICCHP 2004, LNCS, vol. 3118, pp. 630-637, Springer,
Heidelberg (2004).
2. Raman, T. V.: Audio Systems for Technical Reading. PhD Thesis, Department of Computer
Science, Cornell University, NY, USA (1994).
3. Gaura, P.: REMathEx – Reader and Editor of the Mathematical Expressions for Blind
Students. ICCHP 2002, LNCS, vol. 2398, pp. 486-493, Springer, Heidelberg (2002).
4. Stevens, R. D.: Principles for the Design of Auditory Interfaces to Present Complex
Information to Blind Computer Users. PhD Thesis, University of York, UK (1996).
5. Harling, P. A., Stevens, R. and Edwards, A.: Mathgrasp: The design of an algebra
manipulation tool for visually disabled mathematicians using spatial-sound and manual
gestures. HCI Group, University of York, UK (1995).
6. Grantham, D. W.: Detection and discrimination of simulated motion of auditory targets in
the horizontal plane. J. Acoust. Soc. Am., vol. 79, pp. 1939–1949 (1986).
7. Hollander, A. J. and Furness, T. A.: Perception of Virtual Auditory Shapes. Proceedings of
the International Conference on Auditory Displays (1994).
8. Goose, S. and Möller, C.: A 3D Audio Only Interactive Web Browser: Using Spatialization
to Convey Hypermedia Document Structure. Proceedings of the ACM International
Conference on Multimedia, pp. 363–371, Orlando, USA (1999).
9. Begault, D. R. and Erbe, T. R.: Multi-channel spatial auditory display for speech
communications. Audio Engineering Society 95th Convention, Preprint No. 3707 (1993).
10. Crispien, K. and Petrie, H.: Providing Access to Graphical-Based User Interfaces for Blind
People: Using Multimedia System Based on Spatial Audio Representation. 95th AES
Convention, J. Audio Eng. Soc. (Abstracts), vol. 41, p. 1060 (1993).
11. Nemeth, A.: Abraham Nemeth’s Anthology on the Unified Braille Code. Available at arm4r/nemeth
12. Baddeley, A.: Human Memory: Theory and Practice. London: Lawrence Erlbaum
Associates (1990).
13. Walker, B., Nance, A. and Lindsay, J.: Spearcons: Speech-based Earcons Improve
Navigation Performance in Auditory Menus. Proceedings of the 12th International
Conference on Auditory Display, London, UK, June 20-23 (2006).
14. Edwards, A. D. N.: Using sounds to convey complex information. In: A. Schick and M.
Klatte (eds.), Contributions to Psychological Acoustics: Results of the Seventh Oldenburg
Symposium on Psychological Acoustics, Oldenburg, pp. 341-358 (1997).
15. Fitzpatrick, D.: Mathematics: how and what to speak. ICCHP, Springer Verlag, pp. 1199-
1206 (2006).
... In this way, sighted users can utilize printed material as a form of external memory. As a result, they do not need to memorize the structure and layout of an equation [16]. ...
... Although this type of lexical cues has proved to improve the comprehension of the math expressions studied in real applications [19], [20], it has also shown to add confusion to the user as the equations become more complex. This is because if the end of an utterance is marked with a verbal message, this message may detract from the listener's memory of the utterance, a process which is referred to as the suffix effect [16]. As a result, the cognitive effort required to remember and parse the equation is significant for all but the simplest expressions. ...
... The use of prosodic cues has demonstrated to help determine the structure of an equation and, when it comes to interpreting the information, it requires less cognitive effort from the user than lexical cues. However, the structure of complex, nested equations is difficult to determine based on prosody alone, suggesting that some form of additional delimiters are required [16]. ...
Full-text available
There is no evidence that mathematical semantics cannot be understood due to blindness, the problem is the current access barrier to mathematical resources. In light of this problem, this survey aims at providing visually impaired persons (VIPs), or people close to them, with an overview of the currently available software tools for approaching mathematical content. These can be categorized into (a) tools for accessing mathematical documents (where VIPs are just consumers of content), and (b) tools that allow VIPs to become the creators of mathematical content and even to execute mathematical operations. We also explain the advantages and disadvantages of several key technologies used to interact with mathematics. Moreover, we discuss the necessity of the most common formats and languages behind these tools. Finally, we outline promising paths for future research and development towards blind-friendly mathematical resources. The authors hope that this survey may encourage researchers to engage with the still unsolved challenges of this topic.
... The MathML language made possible advances significant in speech synthesis technologies [4]. The use of lexical cues to delimit mathematical structures, such as the "beginning of fraction" and "end of fraction", made it possible to reduce ambiguities of the expressions [5]. The intra-expression navigation mechanism enabled students to explore and access the parts of a mathematical expression [1]; this process helped to create a better mental image and reduce cognitive overload. ...
... Prosody is a set of patterns for the enhancement, accentuation, and intonation of a language. In Mathematics, it is still necessary to create models to produce the appropriate prosodic variations in the synthesized speech of this type of content [4,5]. This article presents a prosodic model proposal to improve the synthesized speech of mathematical expressions in MathML. ...
... In [5] a structure to spatially represent mathematics using prosody, earcons, spearcons, and sound spatialization was proposed. Their proposal aims to provide the temporal control of mathematical expressions. ...
Conference Paper
Full-text available
The use of the MathML language made it possible to improve the accessibility of mathematics for blind or low-vision persons in digital media. Synthetic speech technologies have advanced significantly using MathML, however, the speech synthesizers’ standard reading style is still not suitable for mathematics. Making mathematical reading of the speech synthesizers more natural and expressive is still a challenge. The creation of models to produce the appropriate prosody in the synthesized speech of math content is therefore necessary, as shown in previous research. This article presents a proposal for a model to improve prosody in the synthesized speech of mathematical expressions based on MathML. A corpus of mathematical expressions spoken by Mathematics teachers was created to support the model’s development. The Fujisaki intonation model was adopted for intonation control, accent and phrase commands have been extracted from the corpus, and some adjustments have been made to manipulate prosodic parameters in the speech of mathematical expression in correlation with the MathML tree; additionally, a pattern of pauses control is being created
... Intraexpression navigation mechanism [5] allowed the students to navigate and access parts of a mathematical expression, facilitating the creation of the mental image and reduction of cognitive overload. The use of prosodic components, along with earcons and spearcons, made mathematical expressions more intuitive and less ambiguous [6]. Despite these advances, different from textual approach, the speech synthesizers do not read mathematical expressions with correct prosody 1 , this problem contributes to cognitive overload in the students and incorrect interpretation of mathematical expressions [7]. ...
... Four students received some support at their school, and 2 received some support at a specialized center for visually impaired. 6 7 ...
Conference Paper
Full-text available
The study of Mathematics presents several additional challenges for visually impaired students, among them, we can highlight the lack of accessibility of mathematical contents. For this reason, many students are prevented from reaching the desired and possible levels of proficiency in this area. This present research aimed to update the knowledge of the associated media and technologies, besides braille, which has been used for the teaching-learning process of Mathematics with visually impaired students native in the Portuguese language, as well as to know the specific difficulties faced. This study was centered in the reality of Brazil and was divided into two stages, in the first one, braille teachers and visually impaired students were interviewed, and in the second stage, mathematics expressions spoken by speech synthesizers were analyzed. Although, in the English-speaking countries, there are several technologies used besides braille, in Brazil mathematics braille is the most used, because there are no other more agile technologies in Portuguese. We observed that there are still problems in the mathematical expressions spoken by synthesizers, and many of these problems have already been solved in the English language.
... The intra-expression navigation mechanism [5] allowed students to navigate and also access the parts of a mathematical expression, a process that favored the creation of a mental image and a reduction in cognitive load. The use of prosodic components together with spatialized earcons and spearcons also made mathematical expressions more intuitive and less ambiguous [6]. Despite these advances, and unlike the reading of ordinary text, which approaches human speech, speech synthesizers still do not read mathematical expressions with correct prosody, a problem that contributes to overloading the student's memory and to incorrect interpretation of the expressions [7]. ...
The study of Mathematics presents several additional obstacles for visually impaired students, among them the lack of accessibility of mathematical content. For this reason, many students are prevented from reaching the possible and desired levels of proficiency in this area. This research aimed to update knowledge of the media and associated technologies, besides braille, that have been used in the teaching-learning process of Mathematics with visually impaired students who are native Portuguese speakers, as well as to identify the specific difficulties they face. The study focused on the situation in Brazil and was divided into two stages: in the first, braille transcribers and visually impaired students were interviewed, and in the second, mathematical expressions spoken by screen readers were analyzed. Although several technologies besides braille are used for Mathematics in English-speaking countries, in Brazil mathematical braille is the predominant medium, as no other more agile technologies developed in Portuguese are known. It was observed that there are still problems in the auditory delivery of mathematical expressions when spoken by screen readers, problems that have already been largely solved for English.
... Some of them also try to optimize cognitive load and verbosity. The various proposed cues include lexical [2], prosodic [8], earcons [9], spearcons [1], audio spatialization [7], and auditory [6], etc. Their comparison in terms of ambiguity, verbosity, cognitive load and naturalness on the basis of various user studies [7,9,3,4,6] can be found in Table 1. ...
Conference Paper
Persons with visual impairments can access digital information through text-to-speech based screen reading software. Still, accessing mathematical equations is challenging because of their non-linearity. We propose to improve the accessibility of equations by associating a complexity metric with each equation and then using this metric to modify the rendering. For example, this will allow audio rendering of complex equations with appropriate variable substitution. Similarly, different choices of cues can be adopted on the basis of the complexity of the equation. Further, personal optimization can be done through cognitive analysis of the user's listening ability, familiarity with the content and educational background.
... Although the reading of mathematical equations has made significant advances in recent years and TTS systems have improved considerably, they rarely produce natural-sounding speech and still do not sound like human speakers. Achieving good prosody to reduce cognitive overload when reading complex equations remains a subject of research (Bates and Fitzpatrick, 2010). For geometry, no technologies were found that exploit multimodal senses (haptic and auditory), which highlights research opportunities in this area. ...
Conference Paper
The STEM (Science, Technology, Engineering and Mathematics) curriculum presents several limitations for visually impaired students, such as the extensive symbology, images, graphs, diagrams, and other types of mathematical instruction that are not read and interpreted correctly by screen readers, which makes this content difficult to understand. This article presents a systematic literature review that investigates assistive technologies to support the teaching and learning of Mathematics for these students. The review followed the guidelines proposed for this type of research. The results present 59 technologies that make it possible to explore several senses (tactile, auditory, haptic and multimodal) across various areas of Mathematics. The gaps identified open opportunities for future research, as they show the existence of limitations that prevent visually impaired students from reaching an adequate level of knowledge in Mathematics.
Persons with visual impairments can access digital information through screen reading software. Still, accessing mathematical equations is challenging due to their non-linearity. We propose to improve the accessibility of equations by associating a cognitive complexity metric with each equation and then to use this metric to modify their audio rendering. For example, this will allow audio rendering of complex equations by substituting chunks with variables. This will potentially reduce the cognitive load in comprehending equations by persons with visual impairments. Further, this metric needs to be personalized on the basis of various user characteristics.
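The idea of attaching a complexity score to an equation and replacing heavy chunks with variables can be sketched as follows. This is only an illustration: the abstract does not define its metric, so the score used here (operator count plus nesting depth), the tuple-based expression format, the names `complexity` and `substitute_chunks`, and the `u1`, `u2` placeholder names are all assumptions.

```python
from typing import Dict, Tuple, Union

# An expression is a leaf symbol (str) or a tuple: (operator, operand, ...).
Expr = Union[str, tuple]


def count_ops(expr: Expr) -> int:
    """Number of operator nodes in the expression tree."""
    if isinstance(expr, str):
        return 0
    return 1 + sum(count_ops(arg) for arg in expr[1:])


def depth(expr: Expr) -> int:
    """Maximum nesting depth; a bare symbol has depth 0."""
    if isinstance(expr, str):
        return 0
    return 1 + max(depth(arg) for arg in expr[1:])


def complexity(expr: Expr) -> int:
    """Hypothetical metric (an assumption): operators plus nesting depth."""
    return count_ops(expr) + depth(expr)


def substitute_chunks(expr: Expr, threshold: int) -> Tuple[Expr, Dict[str, Expr]]:
    """Replace every sub-expression scoring >= threshold with a fresh variable.

    Returns the simplified expression and the variable bindings, so a
    renderer could speak the short form first and the chunks afterwards.
    """
    bindings: Dict[str, Expr] = {}

    def walk(node: Expr, is_root: bool) -> Expr:
        if isinstance(node, str):
            return node
        if not is_root and complexity(node) >= threshold:
            name = f"u{len(bindings) + 1}"
            bindings[name] = node
            return name
        return (node[0],) + tuple(walk(arg, False) for arg in node[1:])

    return walk(expr, True), bindings


# (a + b * c) / (d - e)
example = ("/", ("+", "a", ("*", "b", "c")), ("-", "d", "e"))
short_form, chunks = substitute_chunks(example, threshold=3)
```

With a threshold of 3, the numerator is replaced by `u1` and the simpler denominator is left in place. A personalized system, as the abstract suggests, would presumably adjust the threshold per listener.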
The biggest barrier for visually impaired people pursuing a bachelor of science degree is not blindness itself but access to mathematical resources. Resources such as Computer Algebra Systems (CAS) are not accessible, which means that even the execution of elementary math becomes a challenging task. In this paper, we present Casvi, a CAS for visually impaired people, which allows users to perform symbolic and numeric computation using Maxima's math engine. Casvi offers modules for algebra, linear algebra, differential calculus and integral calculus, among others. Moreover, it provides an intuitive user interface based on synthetic speech and non-speech sounds.
Conference Paper
Access to mathematical content for blind and vision impaired people continues to be a problem. The inherently visual nature of this form of presentation is neither easily nor readily accessible using the linear representations in common usage by this community. This paper proposes a methodology for depicting mathematics in a non-visual manner. It will be shown how, through the prosodic component found in spoken language, the structure of mathematical formulae may be disambiguated. We will also discuss lexical cues which can be added to the utterance to further reduce the ambiguity which can be very evident in this form of material.
Conference Paper
Can information about the perceptual and cognitive processes involved in equation reading be applied in the creation of assistive technology for blind equation readers? The present research used four cognitive/perceptual studies to examine several hypotheses about equation reading: people (1) read equations from left to right, one element at a time, (2) back scan when reading equations, (3) substitute the outcome of a parenthetical expression for the initial elements, and (4) scan the entire equation before element by element reading to create a schematic structure. The process tracing study provided evidence for all of the hypotheses, with three experiments supporting the first three hypotheses, but not the fourth. These results have been implemented in assistive software for visually-impaired users, the Math Genie – an auditory browser.
Thesis (M.S.E.)--University of Washington, 1994. Includes bibliographical references (leaves [93]-99).
With shrinking displays and increasing technology use by visually impaired users, it is important to improve usability with non-GUI interfaces such as menus. Using non-speech sounds called earcons or auditory icons has been proposed to enhance menu navigation. We compared search time and accuracy of menu navigation using four types of auditory representations: speech only; hierarchical earcons; auditory icons; and a new type called spearcons. Spearcons are created by speeding up a spoken phrase until it is not recognized as speech. Using a within-subjects design, participants searched a 5 x 5 menu for target items using each type of audio cue. Spearcons and speech-only both led to faster and more accurate menu navigation than auditory icons and hierarchical earcons. There was a significant practice effect for search time, within each type of auditory cue. These results suggest that spearcons are more effective than previous auditory cues in menu-based interfaces, and may lead to better performance and accuracy, as well as more flexible menu structures.
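The abstract describes a spearcon as a spoken phrase sped up until it is no longer recognized as speech. A minimal sketch of such time compression is given below, assuming the phrase is already available as a list of float samples. It uses plain overlap-add (OLA), which is a stand-in chosen for brevity and not necessarily the algorithm used in the study; the function name `time_compress` and the default frame length are likewise assumptions.

```python
import math
from typing import List


def time_compress(samples: List[float], rate: float,
                  frame_len: int = 256) -> List[float]:
    """Shorten a signal by roughly `rate` using plain overlap-add (OLA).

    Frames are read from the input every `analysis_hop` samples and written
    to the output every `synth_hop` samples; rate > 1 reads faster than it
    writes, so the output is shorter. frame_len should be even.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    synth_hop = frame_len // 2                      # 50% overlap in the output
    analysis_hop = max(1, round(synth_hop * rate))  # step size through the input
    # Periodic Hann window: overlapping copies at 50% sum to exactly 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_frames = max(1, 1 + (len(samples) - frame_len) // analysis_hop)
    out = [0.0] * (synth_hop * (n_frames - 1) + frame_len)
    for i in range(n_frames):
        src = i * analysis_hop
        dst = i * synth_hop
        for n in range(frame_len):
            if src + n < len(samples):
                out[dst + n] += samples[src + n] * window[n]
    return out


# One second of a 440 Hz tone at 8 kHz stands in for a recorded phrase.
phrase = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
spearcon = time_compress(phrase, rate=2.0)  # about half the original length
```

Plain OLA can introduce audible artifacts at high rates. A production system would more likely use a pitch-preserving method such as a synchronized overlap-add variant, with the rate chosen by listening tests.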
Conference Paper
Interactive audio browsers provide both sighted and visually impaired users with access to the WWW. In addition to the desktop PC, audio browsing technology can be deployed that enables users to browse the WWW using a telephone or while driving a car. This paper describes a new conceptual model of the HTML document structure and its mapping to a 3D audio space. Novel features are discussed that provide information such as: an audio structural survey of the HTML document; accurate positional audio feedback of the source and destination anchors when traversing both inter- and intra-document links; a linguistic progress indicator; and the announcement of destination document meta-information as new links are encountered. These new features can improve both the user's comprehension of the HTML document structure and their orientation within it. These factors, in turn, can improve the effectiveness of the browsing experience.
Conference Paper
The project REMathEx is developed at the Faculty of Informatics, Masaryk University, Brno as a support for blind students allowing them to study complex mathematical expressions. The system uses the combination of the braille display and the speech synthesis outputs to provide the user with all the information concerning studied mathematical expressions. The basic principles of the system and its use are described in the paper.
Thesis (Ph. D.)--Cornell University, May, 1994. Includes bibliographical references (p. 157-166).