Effect of Auditory Waiting Cues on Time Estimation
in Speech Recognition Telephony Applications
Melanie D. Polkosky
James R. Lewis
IBM Voice Systems
Previous empirical research in subjective time estimation and applied work in audi-
tory interface design imply that designers can use auditory stimuli during system pro-
cessing to manipulate users’ perception of its duration. Two experiments investigate
the effect of system response time (SRT) duration and rate of change of an auditory
waiting cue on participants’ subjective time estimates and perceived affect. The results
showed that perceived SRT duration and ratings of perceived anxiety, stress, and im-
patience increased as ticking rate increased. However, with a slow rate (2-sec ticking),
participants underestimated the duration of SRT, but indicated a significant increase
in negative affect as compared with silent conditions. These results suggest that inter-
face designers may reduce the subjective duration and negative affective states of SRT
through carefully chosen, slow tempo system processing tones. The results of this re-
search also stress the importance of thoughtful, informed interface design that makes
contact with the empirical literature of the cognitive sciences.
1. INTRODUCTION
The empirical literature has demonstrated that system response time (SRT) is a
component of human–computer interfaces that can dramatically affect user accep-
tance of an application (Shneiderman, 1984). SRT is the time required for a com-
puter to receive a user’s input, process the response, and send a reply back to the
user (Thadhani, 1981). During SRT, the user waits for system processing to finish
while monitoring the system’s task.
The bulk of research on SRT dating from the 1960s (Nickerson, 1969) addresses
the response delays of mainframe and desktop computers (Jacko, Sears, & Borella,
2000). More recently, as newer technologies have emerged, studies investigating
system processing delays associated with the World Wide Web (Jacko, Sears, &
Borella, 2000; Ramsay, Barbesi, & Preece, 1998), networks (Roast, 1998), and virtual
reality applications (Watson, Walker, Ribarsky, & Spaulding, 1998) continue to
demonstrate that extended SRTs disrupt users' experiences with myriad applications. Limiting the duration of SRT is no less relevant for user acceptance of interactive voice response interfaces.
Requests for reprints should be sent to: Melanie D. Polkosky, 5081 Congress Avenue, Suite 2207, Boca Raton, FL 33487. E-mail: polkosky@us.ibm.com
The use of interactive voice response (IVR) interfaces, especially in telecommuni-
cation applications, has grown significantly over the past decade (Spiegal & Streeter,
1997). Although these interfaces allow the user to interact in the more familiar and
natural mode of spoken communication, the problem of system processing delay
continues to be a matter of debate. Kamm and Helander (1997) contended that
with continuing advances in processor speed and in the efficiency of recognition search
and language processing algorithms, near real-time system response is becoming feasi-
ble even for complex speech understanding tasks, so SRT may cease to be a significant
interface issue. (p. 1048)
L. Miller and Thomas (1977) made a similarly optimistic prediction nearly 30 years
ago, suggesting that technological advances would make SRT concerns obsolete
for desktop applications. However, the mountainous volume of literature on SRT
and its effect on users is a clear reminder that SRT is problematic for desktop com-
puter users even today. Researchers and designers of the most recent technologies
recognize SRT as a by-product of the interaction among complex processing tasks
such as speech recognition and multimedia retrieval, hardware capabilities, and
the ever-increasing demands of multiple simultaneous users (Balentine & Morgan,
1999; Jacko et al., 2000). If hindsight is any guide, SRT will continue to be a signifi-
cant and vexing issue for interface designers, even as IVR systems mature.
Telephony applications are particularly susceptible to SRT delays. They employ
verbal interaction (by both the application and user) via a telephone for informa-
tion retrieval and transaction services, including “remote banking, travel reserva-
tions, information inquiry, stock, mutual fund, and other financial transactions, in-
ternational calling, [and] credit card verification” (Balentine & Morgan, 1999, p. 1).
In a typical telephony interaction, a remote server hosts the application used for the
transaction task, and the SRT delay includes a series of complex processing tasks:
recognizing the users’ spoken request, connecting to the server, sending data over
wireless or other networks, retrieving the requested information, returning data
through the network, and generating a synthetic spoken message that is finally pre-
sented to the user. This delay may be prolonged even more with heavy network
traffic, large data files, and telephones, networks, or servers that are not
state-of-the-art in processing capacity. Therefore, the simple user task of checking a
bank balance over the telephone can combine all of the most complex and proces-
sor-intensive demands possible with the current limitations of technology.
Telephony applications also represent the most constrained and difficult IVR de-
sign environment. Visual cues and feedback to the user are significantly limited by
the telephone itself, which was originally intended for conversation between hu-
mans but adapted to complex information retrieval tasks previously performed with
benefit of a visual display. Increasing demand for ubiquitous computing suggests
that small, portable telephones will continue to be the preferred design, precluding
extensive visual cuing or feedback. Telephone portability also increases the likeli-
hood that users will call for urgently needed information in a wide variety of noisy,
dynamic, distracting, and stress-producing environments, diminishing their toler-
ance for waiting on the phone. For users, speech is a natural, “intelligent” interface,
and the telephone is a familiar device. This powerful combination of technological
sophistication and familiarity is likely to further elevate users’ expectations and
make them even less tolerant of long delays. Balentine and Morgan (1999) confirmed
that minimal design flaws in other applications become “insurmountable” with “no
margin of error” in telephony interfaces (p. 2). Finally, because the technology is
emerging, there are few guidelines and little applied research to guide the user-cen-
tered development process.
Given that system response delays will be a reality of telephony applications into
the foreseeable future, the interface design community must continue to find meth-
ods of managing this aspect of the user experience. In two studies, we begin to con-
sider how designers can alter user perception of SRT duration using auditory pro-
cessing tones in speech recognition telephony applications. We review the literature
on SRT effects on users, subjective time estimation, and auditory interface design as a
basis for functional interventions during SRT in the unique and emerging context of
telephone-based interaction.
1.1. SRT Effects on Users
SRT duration affects the user in a variety of ways, although the previous literature
presents an inconsistent picture about the relation between SRT and individual out-
comes. Most studies have shown these effects are a result of SRT magnitude, with
long SRTs causing the most dramatic user consequences. As SRT magnitude in-
creases, user response time also increases (Barber & Lucas, 1983; Butler, 1984;
Thadhani, 1981). However, other studies dispute this finding (Dannenbring, 1983;
Goodman & Spence, 1982; Kuhmann, Boucsein, Schaefer, & Alexander, 1987). Not
only the magnitude of SRT is problematic for users: Highly variable SRT durations
are disruptive because they prevent the user from effectively dividing attention be-
tween SRT monitoring and a competing task (Galloway, 1981; L. Miller & Thomas,
1977; Murray & Abrahamson, 1983). Work quality and productivity also are affected
by SRT, apparently decreasing as SRT duration increases (Dannenbring, 1983;
Kuhmann et al., 1987), but the relation may not be a simple one (Barber & Lucas, 1983;
Martin & Corl, 1986; Weiss, Boggs, Lehto, Shodja, & Martin, 1982), nor occur for all
tasks (Butler, 1984). Kohlisch and Kuhmann (1997) showed that short SRTs result in
poor performance and increased cardiovascular activity in users, which the re-
searchers suggested was an indication of inadequate task readiness. However, they
found that long SRTs created boredom (Kohlisch & Kuhmann, 1997). SRTs can also
induce stress or other negative emotional states in users, particularly frustration and
irritation (Guynes, 1988; Schaefer, 1990; Schleifer & Amick, 1989). Two studies sug-
gest that SRTs also produce anxiety (A. Eisler & Eisler, 1994; Guynes, 1988). Finally,
several studies provide evidence of somatic complaints and physiological changes
that occur due to SRT (Kohlisch & Kuhmann, 1997; Kuhmann et al., 1987; Thum,
Boucsein, Kuhmann, & Ray, 1995).
In addition to negative personal outcomes, it seems reasonable that SRTs can im-
pact users’ overall perception of an interface. Indeed, the literature has shown that
user acceptability varies based on a number of factors beyond the magnitude and
variability of SRT (Shneiderman, 1984). Galloway (1981) suggested that several fac-
tors influence user acceptance, including whether (a) repeated SRTs interrupt the
pace of work (by requiring users to switch attention between the computer and a
primary work task), (b) SRTs occur at major breaks in work, (c) information must be
retained in memory throughout the SRT, or (d) the user perceives the SRT as appro-
priate to the task performed by the system. Studies of decreasing user satisfaction
and acceptability with SRT (Barber & Lucas, 1983) have led some researchers to rec-
ommend maximum SRT for specific applications (Johansson & Aronsson, 1984).
More recently, Jacko et al. (2000) found that a statistically significant interaction between network delay (short, medium, or long) and document type (text only or text and graphics) affected the perceived usability of Internet Web sites, as measured by perceived quality of information at the site, information organization, and likelihood of recommending the site to others. They concisely interpreted their findings
as showing
a trade off exists between the type of media used and the delays users experience. When
the delays are short enough, users prefer documents that include graphics. However, as
delays increase, graphics are viewed as contributing to the delay and simpler text-only
documents are preferred. (p. 438)
The research suggests that SRT causes both broad and primarily negative user
outcomes, often interacting with user and other system variables. These negative
user outcomes may also influence the user’s impression of the interface as a whole.
To avoid these consequences, it is essential that application designers implement
interfaces that effectively minimize users’ perception of SRT duration.
1.2. User Perception of Time
The acceptability of SRT, and possibly an entire application, relates at least partially
to a user’s perception of time. A significant cognitive-psychological literature exists
on the topic of subjective time estimation and perception (for a review, see Fraisse,
1984). This literature indicates that the subjective experience of time is a power func-
tion of actual time duration multiplied by a constant (subjective time estimate = (ac-
tual time)0.9*C); H. Eisler, 1976). If, as H. Eisler suggested, the exponent of the subjec-
tive time function is 0.9, users’ perception of SRT increasesapproximately linearly as
actual SRT duration increases (Meyer, Shinar, & Leiser, 1990). However, a number of
independent variables can affect the function’s exponent, as argued by H. Eisler.
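To make the shape of this function concrete, the following minimal Python sketch evaluates it for several durations. The exponent of 0.9 follows H. Eisler (1976); the scaling constant C = 1 and the function name are illustrative assumptions, not values fitted from this literature.

```python
def subjective_duration(actual_sec, exponent=0.9, c=1.0):
    """Power function for subjective time (H. Eisler, 1976):
    subjective estimate = C * (actual duration) ** exponent.
    With an exponent near 0.9, perceived duration grows almost
    linearly with actual duration; C is a scaling constant."""
    return c * actual_sec ** exponent

# With exponent 0.9 and C = 1 (illustrative values), a 3-sec wait
# feels like roughly 2.7 sec and an 18-sec wait like roughly 13.5 sec.
for t in (3, 8, 13, 18):
    print(t, round(subjective_duration(t), 1))
```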
Researchers have suggested that several individual variables influence subjec-
tive time estimation, including age (Block, Zakay, & Hancock, 1998; Craik & Hay,
1999), intelligence (Fink & Neubauer, 2001), personality (Carrasco, Guillem, &
Redolat, 2000; Rammsayer & Rammstedt, 2000; Zakay, Lomranz, & Kaziniz, 1984),
gender (A. Eisler & Eisler, 1994), attention (Zakay, 1989), and memory (Fraisse,
1984). More recently, studies of individuals with various speech or cognitive im-
pairments provide additional support for the influence of underlying cognitive
mechanisms on subjective time estimation (Ezrati-Vinacour & Levin, 2001;
Hellstrom & Almkvist, 1997; Hellstrom, Lang, Portin, & Rinne, 1997; Lange, Tucha,
Steup, Gsell, & Naumann, 1995; Nichelli, Venneri, Molinari, Tavani, & Grafman,
1993; Riesen & Schnider, 2001). Researchers implicate memory (Ornstein, 1969;
Poynter & Homa, 1983) or attention (Zakay, 1989) as the central cognitive process of
subjective time estimation, although empirical support for specific theoretical
models has been inconsistent and controversial (Meyer, Shinar, Bitan, & Leiser,
1996; Zakay, 1993).
Perhaps of more direct relevance to application designers, research demonstrates
that external variables, especially sensory stimuli, can alter the subjective experience
of time. Zakay, Nitzan, and Glicksohn (1983) examined the effect of sensory stimulus
and task difficulty on subjective time estimation. Ninety-six participants estimated
the length of empty intervals or intervals filled with a fast or slow tempo (manipu-
lated with a flickering light bulb or electronic buzzer). They found that a fast tempo
resulted in the longest time estimates, whereas a slow tempo resulted in the shortest
time estimates (no tempo produced intermediate time estimates). Yoblick and
Salvendy (1970) found that when participants reproduced filled time intervals (audi-
tory tones, visual flicker, or tactile vibrations), they overestimated the duration of
lower frequencies significantly more often than higher frequencies only with audi-
tory stimuli. When the time intervals were filled with visual or tactile stimuli, partici-
pants estimated high and low frequencies similarly.1 Glicksohn (1992) studied the effect of an altered sensory environment on subjective time estimation in a 3 × 4 × 4 factorial design using 96 participants (8 participants per cell). He exposed the partici-
pants to a combination of visual stimulation (visual overload, visual deprivation, or
reading with normal room lighting) and auditory stimulation (dichotic music pre-
sentation, different music presented to each ear simultaneously; stereophonic music
presentation, the same music presented to both ears simultaneously; white noise; or
no auditory stimulation) for 20 min prior to estimating intervals of 4, 8, 16, and 32 sec.
Participants’ estimates without prior sensory stimulation were used as covariates.
Results indicated only a significant interaction for four conditions involving visual
deprivation plus dichotic listening or white noise and visual overload plus dichotic
listening or white noise. Therefore, dichotic music–visual overload and visual depri-
vation–white noise led to inflated time estimates but visual deprivation–dichotic
music and visual overload–white noise led to depressed time estimates.
Although these laboratory studies suggest that the auditory environment can
influence subjective time estimation, the results are only indirectly applicable to in-
terface design. Zakay et al. (1983) included task difficulty as an independent vari-
able, adding a laboratory-based, contrived task that would not be typical of users of
telephone-based voice interfaces. In addition, the experimental designs only hint at
the idea that, by controlling the sensory environment, interface designers can alter a user's perception of time. A stronger design in an applied setting is needed to val-
idate this tantalizing proposal.
1The finding of no effect in this study may have been related to the difference in experimental method, also a topic of debate in this literature. Yoblick and Salvendy (1970) required participants to reproduce a time duration they considered equivalent to the experimental period, then measured the duration of that reproduction. Zakay, Nitzan, and Glicksohn (1983) required participants to recall the experimental duration and provide a verbal estimate of its duration.
1.3. Previous Research in Human–Computer Interaction (HCI)
Parallel research in HCI has explored similar issues as the cognitive psychological
literature; however, the primary thrust in the applied setting has been to determine
how SRT magnitude and duration affect users’ perceptions of an interface. A sec-
ond line of research has explored the design of visual displays that minimize the
perceived duration of time (Block, 1990; Levin & Zakay, 1989; Meyer, Bitan, &
Shinar, 1995). Few of these studies offer clues to adapting the findings to other sen-
sory modalities, such as auditory presentation.
A notable exception is the work of Meyer, Shinar, and Leiser (1990), who exam-
ined the effect of wait messages on participants’ estimates of 3- to 16-sec SRTs. The
wait messages were static (blank screen, printed “please wait,” six-word printed
epigram) or dynamic (increasing line of printed X letters, round clock drawing,
blinking printed “please wait”), with dynamic messages presented at three rates of
change (changing every .33, .50, or .67 sec). Results showed no difference in time es-
timates for the three static displays and the blinking please wait message. The dy-
namic displays that changed over time (line of Xs and clock) resulted in longer time
estimates when rates of change were faster. This result provides additional, applied
support for the Zakay et al. (1983) finding that an external tempo influences subjec-
tive time estimation.
A third, related area of HCI research has addressed auditory waiting cues. These
cues play during SRT and identify that system processing is occurring, while si-
multaneously informing the user that the system has not disconnected (Balentine
& Morgan, 1999). Several researchers describe the use of waiting tones during SRT
without determining their effect on user time perception or SRT acceptability.
Beaudouin-Lafon and Conversy (1996) explained their use of Shepard–Risset tones (sounds that appear to go up and down indefinitely) as audio progress bars.
Albers and his colleagues (Albers, 1996; Albers & Bergman, 1995) used a ticking
sound for relative transfer time in a World Wide Web browser and in a satel-
lite–ground control application (Albers, 1995). Albers (1995) also used “pops and
clicks” to indicate data transfer. Similarly, Balentine & Morgan (1999) advocated
the use of “low level ticking” or “pitched wait tones” to indicate the user’s need to
wait for system processing in telephony applications.
Buxton (1989) identified an analogous monitoring function of auditory cues in
interface design, which few other researchers have explored empirically.
Rauterberg (1998) used six machine sounds to assist operators in monitoring a sim-
ulated plant. The results showed that auditory cues significantly improved the op-
erators’ productivity scores, number of status reports, self-assurance, and social ac-
ceptance as compared with a condition without these cues. This study provides
some limited empirical evidence that auditory cues may have performance and
psychological benefits when individuals are monitoring system processes.
1.4. Auditory SRT Cues in Telephony Applications
Telephone-based voice interaction presents a unique design environment. Al-
though the past studies of SRT have used primarily visual applications (desktop
computer), telephony applications demand an auditory cue because they do not
typically provide a visual display (Schumacher, Hardzinski, & Schwartz, 1995).
The use of auditory stimuli to signal SRT and its completion may have several addi-
tional cognitive-psychological advantages for users. Complex sounds draw atten-
tion, especially when they are changing; and designers can use them to shift lis-
tener attention (Moore, 1989). Gaver (1997) noted that sound is generally effective
at conveying information about processes. If a system provides a processing tone
for users, they may be able to divide attention to continue with an ongoing task and
monitor the SRT, thereby limiting work interruption. Similarly, the end of the tone
may shift the user’s primary attention back to system interaction. The particular
sound itself may also alter a listener’s mood or emotional state (Gaver, 1997). Per-
haps the appropriate use of sound in interface design can counteract or reduce the
negative outcomes associated with SRT by the previous literature.
In two experiments, we expand on previously applied research in HCI and em-
pirical work on subjective time estimation. Because previous studies involving SRT
have primarily addressed visual wait signals (Meyer et al., 1996; Meyer et al., 1990),
we investigated the use of three rates of auditory processing tone on users’ subjec-
tive time estimation. Yoblick and Salvendy (1970) also investigated the effect of au-
ditory tones on subjective time estimation, but their tones were unchanging over
time (36 sinusoidal waves ranging from 80 Hz to 14,000 Hz across 36 experimental
conditions) and did not represent SRTs. Zakay et al. (1983) provided auditory stim-
uli using a buzzer that “flickered” with durations of 0.5 sec or 2.0 sec during 14-sec
verbal tasks. Therefore, participants engaged in a competing task in this study (as
opposed to simply waiting during SRT), and the length of the tasks remained con-
sistent. They used a between-subject design (six participant groups of fast visual,
fast auditory, slow visual, slow auditory, and control), which did not allow compar-
ison of a single participant’s time estimates with different rates of external stimuli.
In addition, the researchers reported post hoc comparisons only between the mean
for the three verbal tasks and a condition with no verbal task (three mean compari-
sons within slow external tempo), leaving the differences between the slow groups
and control unclear.
In contrast, this study investigated three rates of dynamic auditory stimuli occur-
ring during realistic SRT durations. The repeated measures design also allows us to
compare participants’ time estimates among conditions of the independent variables.
2. EXPERIMENT 1
Our initial question was deceptively simple: Can we manipulate user perception of SRT duration in speech recognition telephony applications using auditory stimuli?
Consistent with earlier findings (Meyer et al., 1996; Meyer et al., 1990; Zakay et
al., 1983), we hypothesized that more rapid rates of auditory tones would result in
greater overestimation of actual SRT durations. We used a ticking tone in this ex-
periment, which is the most common processing tone identified in previous litera-
ture (Albers, 1995, 1996; Albers & Bergman, 1995; Balentine & Morgan, 1999). We
also included a control condition of silence during the SRT, an “empty” time inter-
val, as in Zakay et al. (1983).
In addition, we explored the effect of processing tones on users’ negative affect.
Because there is no empirical evidence that users prefer the ticking tone but the literature indicates SRT results in negative affect, we wanted to determine if the ticking
rates affected user anxiety, stress, and impatience. We hypothesized that processing
tones with faster rates of change would increase users’ perceived negative affect.
Finally, consistent with previous findings of gender and age effects (Block et al.,
1998; Craik & Hay, 1999; A. Eisler & Eisler, 1994), we included both gender and age
as independent variables in this study.
2.1. Method
Participants.
Sixteen IBM employees volunteered to complete this study. The
participant sample included equal numbers of men and women, with equal numbers
of each gender group above and below the age of 40. All participants, except 1 man
and 1 woman (each over 40 years), described themselves as experienced with speech
recognition telephony applications. All participants reported normal hearing.
Stimuli.
Participants heard three waiting tones and silence, counterbalanced
across participants to reduce order effects. The waiting tones consisted of a ticking
sound, edited so the rate of ticking doubled with each successive tone (a tick every 0.5, 0.25, and 0.125 sec, respectively). Each tone and silence played during actual
SRT durations of 3 sec, 8 sec, 13 sec, and 18 sec, creating 16 conditions of the inde-
pendent variables (four auditory stimuli and four SRT durations).
The SRT durations were consistent with the range of times used in the previous
literature (Galloway, 1981; Guynes, 1988; Kuhmann et al., 1987; Meyer et al., 1990).
In addition, Fraisse (1984) suggested that duration may influence depth of cognitive processing: people perceive durations of 100 msec to 5 sec as being in the present, whereas durations over 5 sec involve memory.
To simulate use of the tones in a speech recognition telephony application, the
auditory stimuli occurred between two spoken prompts. The initial prompt was a
statement announcing the computer’s initiation of processing (e.g., “Please hold
while we process your request”), and the second prompt indicated the end of the
system processing time (e.g., “Thank you for waiting”). Both prompts were spoken
by a woman and recorded (16 bit; 44,100 Hz) using Sound Forge 4.5d™ (Sonic
Foundry Inc.), then edited to include the auditory stimuli and SRT durations.
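For readers who want to construct comparable stimuli, the sketch below shows one way to synthesize a ticking waveform with a given tick interval and SRT duration as a 16-bit, 44,100-Hz mono WAV file (matching the recording format above). The 10-ms click shape, its level, and the function name are illustrative assumptions; this is not the tone actually used in these experiments.

```python
import wave

import numpy as np


def write_tick_stimulus(path, tick_interval_s, total_s, rate=44100):
    """Synthesize a ticking waveform: a short decaying 1-kHz click repeated
    every tick_interval_s seconds for total_s seconds, written as a 16-bit
    mono WAV file at the given sample rate. (Illustrative sketch only.)"""
    signal = np.zeros(int(total_s * rate))
    click_len = int(0.01 * rate)                      # 10-ms click
    t = np.arange(click_len) / rate
    click = 0.6 * np.sin(2 * np.pi * 1000 * t) * np.exp(-t / 0.002)
    step = int(tick_interval_s * rate)
    for start in range(0, len(signal) - click_len, step):
        signal[start:start + click_len] += click
    pcm = (np.clip(signal, -1, 1) * 32767).astype(np.int16)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)          # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())


# For example, an 8-sec SRT filled with 0.5-sec ticking:
# write_tick_stimulus("tick_0.5s_8s.wav", 0.5, 8)
```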
Procedure.
The study used a digram-balanced Latin square design to pre-
vent participants from hearing the same tones and SRT durations sequentially. This
scheme not only results in standard Latin square counterbalancing of order of ap-
pearance in rows and columns of the design, but also controls immediate sequen-
tial effects (Bradley, 1958; Lewis, 1993).
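A digram-balanced Latin square of the kind cited here (Bradley, 1958) can be generated mechanically for an even number of conditions. The sketch below is one such construction (a Williams-style design); treating the 16 tone-by-duration combinations as conditions 0–15 is an assumption about how the orders could be produced, not a record of the exact presentation orders used.

```python
def digram_balanced_latin_square(n):
    """Digram-balanced (Williams-type) Latin square for an even number of
    conditions, per the construction discussed by Bradley (1958): each
    condition appears once per row and column, and each condition is
    immediately preceded by every other condition exactly once."""
    if n % 2 != 0:
        raise ValueError("This simple construction requires an even n.")
    # First row: 0, 1, n-1, 2, n-2, 3, ...
    first, lo, hi, take_low = [0], 1, n - 1, True
    while len(first) < n:
        first.append(lo if take_low else hi)
        lo, hi = (lo + 1, hi) if take_low else (lo, hi - 1)
        take_low = not take_low
    # Each later row adds a constant offset (mod n) to the first row.
    return [[(c + r) % n for c in first] for r in range(n)]


# e.g., presentation orders for the 16 tone-by-duration conditions:
orders = digram_balanced_latin_square(16)
```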
Each participant read a brief description of the task and four questions eliciting
time estimates and ratings of perceived anxiety, stress, and impatience on three bipolar 7-point rating scales. Each participant received verbal clarification and additional explanation as needed. The study used a prospective paradigm (also used in previ-
ous research) in which participants knew that they would be estimating time inter-
vals but they were not permitted to use a watch or other timing device. They listened
to a prompt–auditory stimulus combination played over an Andrea CTI ANC-200™
(Andrea Electronics) handset attached to an IBM ThinkPad® (IBM Corp.) computer,
which simulated actual listening conditions in a telephony–speech recognition ap-
plication. Participants then completed the four questionnaire items: (a) “How long
was the waiting period (in seconds),” (b) “How anxious did you feel during the wait-
ing period,” (c) “How impatient did you feel during the waiting period,” and (d)
“How stressed did you feel during the waiting period?” Participants repeated this
procedure for the remaining audio stimuli and SRT durations.
2.2. Results and Discussion
Subjective time estimation.
A 2 × 2 × 4 × 4 mixed model analysis of variance (ANOVA) with two within-subjects variables (auditory stimulus, SRT duration) and two between-subjects variables (gender, age) indicated a main effect of auditory stimulus, F(3, 36) = 4.58, MSE = 24.42, p = .008; and SRT duration, F(3, 36) = 77.70, MSE = 39.78, p < .0001. Two interactions, auditory stimulus–SRT duration, F(9, 108) = 1.97, MSE = 6.85, p = .05; and stimulus–duration–age–gender, F(9, 108) = 2.51, MSE = 6.85, p = .012, were also statistically significant. No other main effects and interactions were significant (p > .17).
The effect of primary interest, that of auditory stimulus on subjective time esti-
mation, indicated that as the rate of ticking increased, participants’ estimates of the
mean actual SRT (10.5 sec) also increased (see Table 1). Participants underestimated
the mean actual SRT only in the silence condition. Post hoc t tests on the mean difference scores indicated significantly higher time estimates occurred for the 0.25-sec ticking condition compared to silence, t(15) = –2.98, p = .009; and the 0.125-sec ticking condition compared with silence, t(15) = –3.17, p = .011. All other mean comparisons failed to be significant (p > .07, using the Bonferroni correction with α = 0.016).
This result confirmed our initial hypothesis that individuals would overestimate
SRT when they heard a rapid rate of ticking. However, the lack of significance
among paired comparisons of the three ticking conditions suggests that the time estimates did not necessarily increase with the ticking rate but were overestimates only as compared with the silence condition.

Table 1: Subjective SRT Estimates for Each Actual SRT Duration and Auditory Stimulus (Experiment 1)

Auditory Stimulus     Mean Estimate of Mean SRT (sec)a   SD (sec)
Silence               9.92                               5.43
0.5-sec ticking       11.45                              9.64
0.25-sec ticking      12.02                              5.44
0.125-sec ticking     13.09                              8.89

Actual Length of SRT   Mean Subjective Estimate of SRT (sec)   SD (sec)
3 sec                  3.75                                    1.78
8 sec                  8.67                                    3.65
13 sec                 14.20                                   6.49
18 sec                 19.84                                   9.11

Note. SRT = system response time.
aThe mean actual SRT duration was 10.5 sec (mean of 3, 8, 13, and 18 sec).
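For readers who want to run comparable post hoc comparisons, the sketch below performs all pairwise paired t tests across conditions with a Bonferroni-corrected alpha, the same general procedure as the post hoc analyses reported here. The function name and the dictionary-of-sequences input format are assumptions for illustration, not the scripts behind the published values.

```python
from itertools import combinations

from scipy.stats import ttest_rel


def posthoc_paired_bonferroni(estimates, alpha=0.05):
    """estimates: dict mapping condition name -> equal-length sequence of
    per-participant mean SRT estimates (sec). Runs every pairwise paired
    t test and flags those significant at a Bonferroni-corrected alpha."""
    pairs = list(combinations(estimates, 2))
    corrected = alpha / len(pairs)
    results = []
    for a, b in pairs:
        t, p = ttest_rel(estimates[a], estimates[b])
        results.append((a, b, round(t, 2), round(p, 4), p < corrected))
    return corrected, results
```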
The interaction between SRT duration and auditory stimulus appears in Figure
1. As shown, the estimated duration increased slightly across the four auditory
stimulus conditions for 3, 8, 13, and 18 sec SRTs. However, post hoc t tests indicated that participants significantly overestimated the 3-sec SRT when they heard 0.125-sec ticking as compared with silence, t(15) = –3.62, p = .003. Relative to the silence condition, the overestimate of the 13-sec SRT with 0.125-sec ticking approached statistical significance with the Bonferroni correction, t(15) = –3.198, p = .006. All other within-SRT duration comparisons failed to be significant (p > .01, using the Bonferroni correction with α = 0.004).
Additional effects of the independent variables, although of interest in the con-
text of previous literature, were not specified in our hypotheses. An expected effect
of SRT duration indicated that as the actual duration of the SRT increased, partici-
pants' estimates of the duration also increased (see Table 1). Post hoc t tests on the difference scores revealed all mean differences were highly significant (p < .00001).
Analysis of the time estimate data also indicated a significant four-way interac-
tion (time–rate–age–gender), shown in Figure 2. Women and men over 40 years old
similarly estimated SRTs, regardless of the auditory stimulus. However, men under
40 years underestimated the SRTs in all conditions, except when the actual duration
of the SRT was 3 sec. Although this interaction is of some theoretical significance
and provides additional support for gender and age effects in subjective time esti-
mation (Block et al., 1998; Craik & Hay, 1999; A. Eisler & Eisler, 1994), designers
must select a “best” interface for use in a single application. Because this effect has
little practical applicability for general design principles for a broad user popula-
tion, we did not analyze the interaction in more detail.
FIGURE 1 Auditory stimulus—system response time (SRT) duration interaction.
In general, the main effect of auditory stimulus confirmed our initial hypothesis
that individuals overestimate SRT when they hear more rapid rates of ticking. Al-
though the slowest rate (0.5-sec ticking) was similar to silence, the two faster rates
did result in significant overestimates of SRT duration as compared to silence. The
data also suggest that some auditory stimuli rates may interact with the duration of
SRT, although these effects did not appear to be systematic.
Perceived negative affect.
A mixed model ANOVA demonstrated a main effect of SRT duration, F(3, 36) = 27.69, MSE = 3.02, p < .0001; auditory stimulus, F(3, 36) = 11.94, MSE = 9.55, p < .0001; and affective rating, F(2, 24) = 5.07, MSE = 2.21, p = .015. A significant interaction occurred between SRT duration and affective rating, F(6, 72) = 11.54, MSE = 0.41, p < .0001. No other main effects or interactions were significant (p > .08).
The main effects indicated that participants’ negative affect (a combined rating
consisting of anxiety, stress, and impatience) increased as the SRT duration in-
creased and as the rate of auditory stimuli increased. Figure 3 shows higher neg-
ative affect ratings with increased rates of auditory stimuli. Post hoc t tests indicated significantly higher ratings of negative affect among all paired conditions (p < .01), except silence and 0.5-sec ticking. Therefore, 0.5-sec ticking resulted in
similar negative affect as silence during SRTs, but more rapid rates of ticking in-
creased participants’ perceived negative affect. This provides empirical support
for our initial hypothesis that, although SRT itself results in negative affect, the
auditory stimulus provided during the SRT can also produce or even increase
negative affect.
In general, Experiment 1 provided evidence that we did manipulate participant
time perception using auditory stimuli. Unfortunately, these manipulations did not
decrease user estimates of SRT durations or negative affect associated with SRT. This fascinating insight into the power of auditory interface design led us to question how we might use this knowledge to promote more constructive user outcomes.
FIGURE 2 Auditory stimulus—actual system response time (SRT) interaction by gender and age (A = 3 sec, B = 8 sec, C = 13 sec, D = 18 sec).
3. EXPERIMENT 2
In Experiment 2, we wanted to expand on our Experiment 1 results and increase the
power of the initial experiment through replication. Because faster rates of ticking
led to overestimates of SRT, we hypothesized that slower rates of ticking would re-
sult in underestimation of SRT. There is limited empirical support for this hypothe-
sis: Zakay et al. (1983) provided auditory stimuli using a buzzer that flickered at rates
of 0.5 sec or 2.0 sec during 14-sec verbal tasks. They found a significant main effect of
tempo rate on perceived duration of the buzzer tone, indicating that the slower rate
did produce lower subjective time estimates. A similar result occurred in other stud-
ies using rhythmic visual stimulation (Planas & Treurniet, 1988), auditory presentation of words at rates of one every 6 or 3 sec (Block, 1974), and rhythmic auditory stim-
ulation using a metronome (Jones & Natale, 1973). Therefore, the purpose of
Experiment 2 was to determine if even slower rates of ticking (slower than 0.5 sec per
tick) cause users to underestimate the true SRT. We again wanted to determine
whether the rate of ticking also influenced listeners’ perceived negative affect and
whether gender and age effects occurred. Such a finding would be important be-
cause the primary goal of this line of research is to discover techniques for reducing
the subjective duration of SRT.
3.1. Method
This experiment used the same design, procedure, and participant characteristics
as the previous study. However, the auditory stimuli included three rates of ticking
FIGURE 3 Perceived negative affect ratings by auditory stimuli.
in which the rate was halved for successive tones (a tick every 0.5, 1, and 2 sec, re-
spectively) and a control condition of silence. Including the silence and 0.5-sec tick-
ing conditions in the second experiment provided an opportunity to partially repli-
cate the first study and determine if the two experiments yielded comparable
results. All other methodological details were identical to Experiment 1.
3.2. Results and Discussion
Subjective time estimation.
We analyzed the data using a mixed model
ANOVA, as in Experiment 1. The ANOVA indicated a main effect of auditory stimulus, F(3, 36) = 4.02, MSE = 113.80, p = .015; and the expected main effect of SRT duration, F(3, 36) = 210.39, MSE = 1878.24, p < .0001. Two significant interactions with SRT duration also occurred: duration–gender, F(3, 36) = 4.53, MSE = 40.42, p = .009; and duration–age–gender, F(3, 36) = 3.11, MSE = 27.77, p = .038. The main effect of gender, F(1, 12) = 3.86, MSE = 344.73, p = .073; and the interaction between age and gender, F(1, 12) = 4.18, MSE = 372.80, p = .064, were marginally significant. All other main effects and interactions were not significant (p > .51).
Table 2 presents the main effect of auditory stimulus. Participants most accu-
rately estimated the mean actual SRT when they heard the 0.5-sec ticking but un-
derestimated in the other three conditions. Post hoc t tests indicated statistically significant differences between the following mean pairs: silence–0.5-sec ticking, t(15) = –3.24, p = .005; and 0.5-sec ticking–2-sec ticking, t(15) = 0.62, p = .015. Other mean comparisons were not significant (p > .09, using the Bonferroni correction with α = 0.016). This result, stronger than that observed in the previous study (due to the lack of an interaction), demonstrates more statistically significant estimate differences among the ticking tones, as well as in comparisons between the ticking
and silence conditions. Our hypothesis that slower rates of ticking would result in
underestimates of time duration was confirmed.
The remaining effects are interesting in the context of the previous literature, al-
though their discovery did not drive this study. The main effect of SRT duration ap-
pears in Table 2. As shown, participants estimated longer SRTs when the actual du-
ration was longer, with the shortest mean estimate for the 3-sec SRT and longest
mean estimate for the 18-sec SRT. Participants showed increasing variability in
their time estimates as the SRT became longer. Post hoc t tests on the difference scores again indicated statistically significant differences among estimates for all four SRT durations (p < .000001).

Table 2: Subjective SRT Estimates for Each Actual SRT Duration and Auditory Stimulus (Experiment 2)

Auditory Stimulus   Mean Estimate of Mean Actual SRT (sec)a   SD (sec)
Silence             8.89                                      5.63
0.5-sec ticking     10.42                                     6.12
1-sec ticking       9.52                                      6.08
2-sec ticking       8.72                                      7.58

Actual Length of SRT   Mean Subjective Estimate of SRT (sec)   SD (sec)
3 sec                  3.01                                    1.46
8 sec                  7.42                                    2.88
13 sec                 11.45                                   4.36
18 sec                 15.66                                   5.49

Note. SRT = system response time.
aThe mean actual SRT duration was 10.5 sec (mean of 3, 8, 13, and 18 sec).
Analysis of the time estimate data also indicated two significant interactions
with SRT duration. The duration–age–gender interactions are shown in Figure 4. In
general, women under 40 years of age provided the longest estimate of each SRT
compared to the other three participant groups. Men under 40 years of age esti-
mated the shortest SRT.
Negative affect.
A mixed model ANOVA demonstrated a main effect of auditory stimulus, F(3, 36) = 4.11, MSE = 6.56, p = .013; SRT duration, F(3, 36) = 21.39, MSE = 3.85, p < .0001; and affective rating, F(2, 24) = 17.60, MSE = 0.73, p < .0001. Significant interactions occurred between auditory stimulus and affective rating, F(6, 72) = 2.31, MSE = 0.18, p = .042; auditory stimulus and SRT duration, F(9, 108) = 2.02, MSE = 1.39, p = .044; and SRT duration and affective rating, F(6, 72) = 8.72, MSE = 0.23, p < .0001; as well as a three-way stimulus–rating–age interaction, F(6, 72) = 2.72, MSE = 0.18, p = .019. The four-way interaction of duration–affect–age–gender was marginally significant, F(6, 72) = 2.08, MSE = 0.23, p = .066. All other main effects and interactions were not statistically significant (p > .095).
The effects of interest indicated that participants’ negative affect (a combined rat-
ing consisting of anxiety, stress, and impatience) decreased as the rate of ticking de-
creased. Figure 5 shows that participants rated higher negative affect when they
heard the fastest ticking rate (0.5-sec ticking) and rated lower negative affect with
slower ticking rates (1- and 2-sec ticking). However, their negative affect increased
slightly as the ticking rate became very slow (2-sec ticking). Post hoc t tests indicated marginally significant differences between the silence and ticking conditions: silence–0.5-sec ticking, t(15) = –2.83, p = .013; and silence–2-sec ticking, t(15) = –2.69, p = .017; but other mean comparisons were not significant (p > .04, using the Bonferroni correction with α = 0.01). In general, 1-sec ticking was rated as similar to silence, but both 0.5- and 2-sec ticking resulted in increased perception of negative affect.
FIGURE 4 System response time (SRT) duration estimates by gender and age group.
Figure 6 illustrates the interaction between auditory stimulus and affective rat-
ing. Perceived anxiety, stress, and impatience all decreased as the ticking rate be-
came slower but were slightly higher with the slowest ticking rate (2-sec rate). Ac-
cordingly, post hoc t tests indicated marginally significant mean differences between the perceived stress during silence and 0.5-sec ticking, t(15) = –3.28, p = .005; and between the perceived impatience during silence and 2-sec ticking, t(15) = –3.13, p = .007. All other within-affective category mean differences were not significant (p > .01, using the Bonferroni correction with α = 0.004).
3.3. Comparison of Experiments 1 and 2
A mixed model ANOVA revealed a nonsignificant main effect of study on subjective time estimates, F(1, 24) = 0.94, MSE = 72.07, p = .343; this indicates that participants in both studies provided similar estimates of the silence and 0.5-sec ticking rate conditions. Significant main effects occurred for gender, F(1, 24) = 4.39, MSE = 72.07, p = .047; auditory stimulus, F(1, 24) = 9.27, MSE = 16.17, p = .006; and SRT duration, F(3, 72) = 216.38, MSE = 10.06, p < .0001. Marginally significant interactions occurred for duration–gender, F(3, 72) = 2.41, MSE = 10.06, p = .074; and auditory stimulus–SRT duration–study, F(3, 72) = 2.51, MSE = 5.83, p = .066. In general, these results indicated that participants estimated the silence and 0.5-sec ticking conditions similarly in both studies.
FIGURE 5 Main effect of auditory stimulus.
In addition to the analysis of the raw data in both studies, we also analyzed dif-
ference scores (actual minus perceived) and ratio (perceived/actual) data in both studies.
The analyses were virtually identical and are therefore not reported in detail. How-
ever, the ratio data help to illuminate how time estimates varied overall.
As shown in Figure 7, participants underestimated SRT durations in the slow
ticking rate conditions (1- and 2-sec ticking) and silence but overestimated SRT
with the more rapid rates (0.5-, 0.25-, 0.125-sec ticking). In the figure, a ratio less
than 1.00 indicates an underestimate of the actual SRT, and a ratio greater than 1.00
indicates an overestimate of the actual SRT. Because the main effect of study was
nonsignificant, Figure 7 presents the means of the 0.5-sec ticking and silence condi-
tions calculated from both experiments (32 participants).
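As a small illustration of how these ratio scores are computed and read, the sketch below derives perceived/actual ratios for a hypothetical participant; the example values are invented for illustration and are not data from either experiment.

```python
import numpy as np


def estimation_ratios(perceived_sec, actual_sec):
    """Ratio scores as plotted in Figure 7: perceived / actual SRT.
    Values below 1.00 indicate underestimation of the waiting period;
    values above 1.00 indicate overestimation."""
    return np.asarray(perceived_sec, float) / np.asarray(actual_sec, float)


# Hypothetical participant: reports 2.5, 7, 14, and 21 sec for the
# 3-, 8-, 13-, and 18-sec SRTs (made-up numbers, not study data).
print(estimation_ratios([2.5, 7, 14, 21], [3, 8, 13, 18]))
```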
We also completed a final analysis on the ratio estimates. As noted in the previ-
ous analyses, significant differences occurred between 0.25-sec ticking and silence,
0.125-sec ticking and silence, 0.5-sec ticking and silence, and 0.5-sec ticking and
2-sec ticking conditions. A set of four independent t tests among the unreplicated ticking rate conditions (using the Bonferroni correction with α = 0.01) indicated that time estimates with the slow ticking rates were significantly shorter than time estimates with the fast ticking rates:
1-sec ticking SRT estimate less than 0.25-sec ticking SRT estimate, t(126) = 3.31, p = .001.
1-sec ticking SRT estimate less than 0.125-sec ticking SRT estimate, t(126) = 3.78, p < .0001.
2-sec ticking SRT estimate less than 0.25-sec ticking SRT estimate, t(126) = 4.51, p < .0001.
2-sec ticking SRT estimate less than 0.125-sec ticking SRT estimate, t(126) = 4.65, p < .0001.
FIGURE 6 Interaction between auditory stimulus and affective rating.
In summary, this final analysis sharpens our findings in Experiments 1 and 2:
Not only did participants estimate longer SRTs with the fast rates than with silent
SRTs, they also estimated longer SRT durations as the ticking rate increased.
4. GENERAL DISCUSSION
In two experiments, we investigated the effect of SRT duration and rate of change
of a system processing tone (ticking rate) on participants’ subjective time estimates
and perceived affect. In general, the results indicated that participants’ SRT esti-
mates increased with ticking rate, and they overestimated SRTs with fast ticking
rates relative to silent SRTs. As the ticking rate increased, participants’ ratings of
perceived anxiety, stress, and impatience also increased. However, with a slow rate
of ticking (2-sec ticking), participants underestimated the duration of SRT but indi-
cated a significant increase in negative affect as compared with silent SRTs. The re-
sults confirmed our initial hypotheses that more rapid processing tones would in-
crease both subjective SRT estimates and perceived negative affect. Our hypothesis that slower processing tones would result in underestimates of SRT was also confirmed for the 2-sec ticking rate.
4.1. Critical Evaluation
These experiments improve on previous research because they are the first SRT
studies that investigate the effect of an auditory stimulus, as opposed to a visual
FIGURE 7 Ratio of perceived to actual system response time (SRT; Experiments 1 and 2).
display, on users' perception of SRT duration. These studies are also the first to ad-
dress the unique design environment of speech recognition telephony applica-
tions. Finally, because time estimation was a within-subjects variable, we can draw
conclusions about individuals’ estimates under several auditory stimulus condi-
tions: This sensitive, economical, and powerful design has had limited use in the
previous literature.
There are also potential limitations in the design of the two current experiments.
As Meyer et al. (1996) pointed out, a display that minimizes an apparent duration
may not be the one that users prefer. In our studies, we did not determine whether us-
ers prefer the ticking tone. Preliminary investigation of this issue (Polkosky, 2001)
suggests that users prefer jazz music and silence to a ticking tone (0.5-sec ticking)
during system processing. Previous studies of music and waiting periods have dem-
onstrated that music can influence emotional states during a waiting period (Chebat,
Gelinas-Chebat, & Filiatrault, 1993; Hui, Dube, & Chebat, 1997), and individuals
wait longer when they hear music instead of silence (Hargreaves, 1999). Ramos
(1993) found that the type of music is an important design consideration, with jazz
music producing the fewest lost calls to a state abuse hotline (followed by country,
classical, popular, and relaxation music causing progressively more lost calls). How-
ever, the effect of music on time perception is not clear in this research. North,
Hargreaves, and McKendrick (1999) found that two music conditions resulted in
similar waiting time estimates as a “please hold” message spoken at 10-sec intervals.
Chebat et al. demonstrated that musical tempo (fast vs. slow) did not have a direct in-
fluence on time perception while customers waited in bank lines. Instead, musical
tempo had a complex, moderating relation to mood and attention, which in turn in-
fluenced time perception; the tempo rate also interacted with the amount of visual in-
formation, an interaction reminiscent of Glicksohn (1992). These studies provide
some preliminary empirical support for music during SRT, but the results may not generalize to the specific context of a telephony application. Continued research is
needed to further investigate the effects of type and tempo of music on user prefer-
ences and time estimation in telephony applications.
Another possible limitation of our work centers on the generalizability of re-
sults. The participant group included volunteers who were all employees of IBM.
However, there is little reason to expect that IBM employees would have unique
perceptual abilities as compared to a broader population of adults. Indeed, a study
of IBM employees and individuals hired from a temporary agency demonstrated
no difference between these participant groups in their preferences for auditory
tones (Polkosky & Lewis, 2001). Therefore, these results, especially when combined
with previous empirical evidence with a variety of participant groups, should gen-
eralize to a broad population of male and female adults.
4.2. Design Implications
Prior to our studies, the most notable style guide for telephony applications offered
a single “good practice” guideline related to user waiting times: “Provide auditory
cues for wait times of a few seconds or more” (Balentine & Morgan, 1999, p. 139).
Balentine and Morgan also advocated “low-level ticking or [a] similar sound” (p.
141) as appropriate auditory cues. They mention music as a potential waiting cue
only to caution designers against it: “There are cases in which this [music or a spo-
ken message] is not possible or desirable” (p. 140). Our results provide several
more specific guidelines for telephony interfaces, expanding on the previous
guideline and confirming that designers must consider SRT to create truly
user-centered telephony designs.
Guideline 1.
Do include a slow-rate waiting cue in telephony applications.
Balentine and Morgan (1999) cautioned that their auditory waiting cue guideline is
an item that “appear[s] to be common sense but [is] not supported by any evidence,
or which seem[s] to work in practice but [is] easily missed by designers” (p. 26).
Our studies begin to provide empirical evidence that supports this guideline as a re-
quirement of telephony interfaces. In our studies, 0.5-, 1-, and 2-sec ticking cues
have advantages for the user. These tones confirm that the user is connected and
the system is working. The 1- and 2-sec ticking cues have two additional benefits:
They caused participants to underestimate the actual time they were waiting in our
simulated applications and decreased the negative affect associated with waiting
as compared to faster ticking rates. Our evidence indicates that designers should
include waiting cues during SRTs in telephony applications.
Guideline 2.
Limit the duration and variability of SRTs as much as possible.
We found that even 3-sec SRTs result in user anxiety, stress, and impatience in simu-
lated telephony applications. Our results are consistent with the results of previous
studies that have shown that SRT itself can negatively impact users’ emotional and
physiological states (Guynes, 1988; Komatsubhara, Yokomizo, Yamamoto, & Noro,
1985; Kuhmann et al., 1987; Schleifer & Amick, 1989), and that anxiety is associated
with SRTs (A. Eisler & Eisler, 1994; Guynes, 1988). This guideline is especially im-
portant in telephony applications because the cognitive load on users is greater
than in desktop or other highly visual applications. Users must hold spoken com-
mands and the structure of the interface itself in working memory while complet-
ing their primary task, process synthetic speech, and appropriately manage infor-
mation exchange with the technology, all unique and unfamiliar aspects of
telephony applications that are likely to increase user anxiety for these interfaces.
In addition, it is these sophisticated abilities of telephony interfaces (speech recog-
nition, information retrieval, and synthetic speech), coupled with the familiarity of
a telephone receiver, that are also likely to make users less tolerant of SRT.
Guideline 3.
Use a ticking tone with a 1- or 2-sec rate, which provides the best combination of shortened SRT perception and the least negative emotional effect.
Based on these findings, we recommend that if a speech recognition–telephony in-
terface uses a ticking tone during system processing (as suggested by Balentine &
Morgan, 1999), a slow rate of ticking is better than a fast ticking rate. A 2-sec ticking
rate resulted in underestimation of SRT. A 1-sec ticking rate resulted in similar affect
as silence but did not have a perceptual advantage of shortened SRT duration. An in-
termediate ticking rate (approximately 1.5 sec) may combine perceptual advantages
with lower negative affect; however, this observation is speculative and should be
evaluated by interface designers prior to its use in a particular application.
Guideline 4.
Avoid ticking tones with intervals shorter than 0.5 sec (i.e., rapid ticking rates). The most unambiguous finding in these studies is that fast ticking rates had perceptual disadvantages, causing participants to overestimate their waiting time as well as increasing their negative affect. Rapid ticking should be rejected as a system processing tone. For the designer, any auditory cue used during SRT should have a relatively slow rate of change or tempo to minimize user overestimation of SRT and negative affect.
Guideline 5.
Select a waiting tone based on evaluation of proposed tones
with the targeted user population. We found statistically significant interactions
based on participants’ age and gender. North et al. (1999) found that music that
callers liked and fit their expectations positively influenced the waiting period in a
telephone survey. These results confirm that design teams must carefully identify
their user population and test their application with participants from the target
population to ensure they provide optimal, user-centered feedback.
Guideline 6.
Evaluate the priority of limiting SRT duration based on the tar-
geted user population. Our studies suggest that user groups over 40 years of age
(both men and women) may be less tolerant of long and variable SRTs because they
provide longer estimates of time durations. These groups may especially benefit
from slow tempo music or ticking to reduce their anxiety and make their waiting
period seem shorter. Conversely, men under 40 years of age may be more tolerant
of SRTs, even relatively long delays, because they consistently underestimate their
waiting time.
Guideline 7.
Evaluate proposed waiting cues with both long and short SRT du-
rations. Interface designers should evaluate any proposed auditory processing tone
with a variety of SRT durations. The finding of interaction effects in Experiment 1
suggests that a specific tone may have different effects with short and long SRTs.
In terms of application design, our studies more completely specify the range
and type of waiting cues that should be included in telephony applications. They
provide the weight of empirical data to extend and clarify guidelines for this
emerging, highly constrained, and unique design environment. The studies also
unmistakably highlight the need for a clearly defined user population for a particu-
lar application, as well as the importance of user-centered design and evaluation
throughout the development process.
At a broader level, these studies justify the need for thoughtful interface design,
informed and enhanced through its contact with documented cognitive-psycho-
logical phenomena. Our first study indicated that uninformed use of a ticking tone,
an aspect of the telephony interface that designers may easily disregard, can have
very negative consequences. Conversely, the second study demonstrated that the
application of empirical work in the cognitive sciences to interface design can re-
duce these limitations. Therefore, these studies are consistent with the goal of cog-
nitive engineering identified by various researchers (Falzon, 1990;
Gerhardt-Powals, 1996; Hollnagel & Woods, 1983): to identify guidelines for inter-
face design based on human information processing abilities. As Shneiderman
(1987) asserted more than a decade ago, this approach is not “the paint put on the
end of a project” but “the steel frame on which the structure is built” (p. v).
REFERENCES
Albers, M. (1995). The Varese system, hybrid auditory interfaces, and satellite–ground con-
trol: Using auditory icons and sonification in a complex, supervisory control system. In
Proceedings of ICAD’94 The Second International Conference on Auditory Display (pp. 3–13).
Santa Fe, NM: Santa Fe Institute.
Albers, M. (1996). Auditory cues for browsing, surfing, and navigating the WWW: The audi-
ble Web. In Proceedings of ICAD’96 The Third International Conference on Auditory Display
(pp. 85–90). Palo Alto, CA: International Conference on Auditory Display.
Albers, M., & Bergman, E. (1995). The audible Web: Auditory enhancements for Mosaic. In
Proceedings of CHI’95: The ACM Conference on Human Factors in Computing Systems (pp.
318–319). Denver, CO: ACM.
Balentine, B., & Morgan, D. (1999). How to build a speech recognition application: A style guide for
telephony dialogues. San Ramon, CA: Enterprise Integration Group.
Barber, R., & Lucas, H. (1983). System response time, operator productivity, and job satisfac-
tion. Communications of the ACM, 26, 972–976.
Beaudouin-Lafon, M., & Conversy, S. (1996). Auditory illusions for audio feedback. In Pro-
ceedings of CHI’96 Conference on Human Factors in Computing Systems (pp. 299–300). New
York: ACM.
Block, R. (1974). Memory and the experience of duration in retrospect. Memory and Cognition,
2, 153–160.
Block, R. (1990). Cognitive models of psychological time. Hillsdale, NJ: Lawrence Erlbaum Asso-
ciates, Inc.
Block, R., Zakay, D., & Hancock, P. (1998). Human aging and duration judgments: A
meta-analytic review. Psychology and Aging, 13, 584–596.
Bradley, J. V. (1958). Complete counterbalancing of immediate sequential effects in a Latin
square design. Journal of the American Statistical Association, 53, 525–528.
Butler, T. (1984). Computer response time and user performance during data entry. AT&T
Bell Laboratories Technical Journal, 63, 1007–1018.
Buxton, W. (1989). Introduction to this special issue on nonspeech audio. International Journal of Human–Computer Interaction, 4, 1–9.
Carrasco, M., Guillem, M., & Redolat, R. (2000). Estimation of short temporal intervals in
Alzheimer’s disease. Experimental Aging Research, 26, 139–151.
Chebat, J., Gelinas-Chebat, C., & Filiatrault, P. (1993). Interactive effects of musical and vi-
sual cues on time perception: An application to waiting lines in banks. Perceptual and Mo-
tor Skills, 77, 995–1020.
Craik, F., & Hay, J. (1999). Aging and judgments of duration: Effects of task complexity and
method of estimation. Perception and Psychophysics, 61, 549–560.
Dannenbring, G. (1983). The effect of computer response time on user performance and satisfac-
tion: A preliminary investigation. Behavior Research Methods and Instrumentation, 15, 213–216.
Eisler, A., & Eisler, H. (1994). Subjective time scaling: Influence of age, gender, and type A
and type B behavior. Chronobiologia, 21, 185–200.
Eisler, H. (1976). Experiments on subjective duration 1868–1975: A collection of power function exponents. Psychological Bulletin, 83, 1154–1171.
Ezrati-Vinacour, R., & Levin, I. (2001). Time estimation by adults who stutter. Journal of
Speech Language and Hearing Research, 44, 144–155.
Falzon, P. (1990). Cognitive ergonomics: Understanding, learning, and designing human–computer interaction. New York: Academic.
Fink, A., & Neubauer, A. (2001). Speed of information processing, psychometric intelligence and time estimation as an index of cognitive load. Personality and Individual Differences, 30, 1009–1021.
Fraisse, P. (1984). Perception and estimation of time. Annual Review of Psychology, 35, 1–36.
Galloway, G. (1981). Response times to user activities in interactive man/machine computer
systems. In Proceedings of the Human Factors Society 25th Annual Meeting (pp. 754–758).
Santa Monica, CA: Human Factors and Ergonomics Society.
Gaver, W. (1997). Auditory interfaces. In M. Helander, T. Landauer, & P. Prabhu (Eds.), Handbook of human–computer interaction (2nd ed., pp. 1003–1041). Amsterdam: Elsevier.
Gerhardt-Powals, J. (1996). Cognitive engineering principles for enhancing human–computer performance. International Journal of Human–Computer Interaction, 8, 189–211.
Glicksohn, J. (1992). Subjective time estimation in altered sensory environments. Environ-
ment and Behavior, 24, 634–652.
Goodman, T., & Spence, R. (1982). The effects of potentiometer dimensionality, system re-
sponse time, and time of day on interactive graphical problem solving. Human Factors, 24,
437–456.
Guynes, J. (1988). Impact of system response time on state anxiety. Communications of the
ACM, 31, 342–347.
Hargreaves, D. (1999). Can music move people? The effects of musical complexity and si-
lence on waiting time. Environment and Behavior, 31, 136–149.
Hellstrom, A., & Almkvist, O. (1997). Tone duration discrimination in demented, mem-
ory-impaired, and healthy elderly. Dementia and Geriatric Cognitive Disorders, 8, 49–54.
Hellstrom, A., Lang, H., Portin, R., & Rinne, J. (1997). Tone duration discrimination in Parkinson’s disease. Neuropsychologia, 35, 737–740.
Hollnagel, E., & Woods, D. (1983). Cognitive systems engineering: New wine in new bottles.
International Journal of Man–Machine Studies, 18, 583–600.
Hui, M., Dube, L., & Chebat, J. (1997). The impact of music on consumers’ reactions to wait-
ing for services. Journal of Retailing, 73, 87–104.
Jacko, J., Sears, A., & Borella, M. (2000). The effect of network delay and media on user per-
ceptions of Web resources. Behavior and Information Technology, 19, 427–439.
Johansson, G., & Aronsson, G. (1984). Stress reactions in computerized administrative work.
Journal of Occupational Behavior, 5, 159–181.
Jones, E., & Natale, T. (1973). Information processing theory of time estimation. Perceptual
and Motor Skills, 36, 226.
Kamm, C., & Helander, M. (1997). Design issues for interfaces using voice input. In M.
Helander, T. Landauer, & P. Prabhu (Eds.), Handbook of human–computer interaction (2nd
ed., pp. 1043–1059). Amsterdam: Elsevier.
Kohlisch, O., & Kuhmann, W. (1997). System response time and readiness for task execution:
The optimum duration of inter-task delays. Ergonomics, 40, 265–280.
Komatsubhara, A., Yokomizo, S., Yamamoto, S., & Noro, K. (1985). Mental strain in a VDT task im-
posed by computer system response time. In Proceedings of the 9th International Ergonomics Asso-
ciation Meeting (pp. 316–318). Bournemouth, England: International Ergonomics Association.
Kuhmann, W., Boucsein, W., Schaefer, F., & Alexander, J. (1987). Experimental investigation
of psychophysiological stress-reactions induced by different system response times in
human–computer interactions. Ergonomics, 30, 933–943.
Lange, K., Tucha, O., Steup, A., Gsell, W., & Naumann, M. (1995). Subjective time estimation in Parkinson’s disease. Journal of Neural Transmission-Supplement, 46, 433–438.
Levin, I., & Zakay, D. (Eds.). (1989). Time and human cognition. Amsterdam: Elsevier.
Lewis, J. R. (1993). Pairs of Latin squares that produce digram-balanced Greco-Latin designs:
A BASIC program. Behavior Research Methods, Instruments, & Computers, 25, 414–415.
Martin, G., & Corl, K. (1986). System response time effects on user productivity. Behavior and
Information Technology, 5, 3–13.
Meyer, J., Bitan, Y., & Shinar, D. (1995). Displaying a boundary in graphic and symbolic “wait” displays: Duration estimates and users’ preferences. International Journal of Human–Computer Interaction, 7, 273–290.
Meyer, J., Shinar, D., Bitan, Y., & Leiser, D. (1996). Duration estimates and users’ preferences
in human–computer interaction. Ergonomics, 39, 46–60.
Meyer, J., Shinar, D., & Leiser, D. (1990). Time estimation of computer “wait” message dis-
plays. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 360–364). Santa
Monica, CA: Human Factors and Ergonomics Society.
Miller, L., & Thomas, J. (1977). Behavioral issues in the use of interactive systems. Interna-
tional Journal of Man–Machine Studies, 9, 509–536.
Moore, B. (1989). An introduction to the psychology of hearing (3rd ed.). London: Aca-
demic.
Murray, R., & Abrahamson, D. (1983). The effect of system response time delay variability on
inexperienced videotext users. Behavior and Information Technology, 2, 237–251.
Nichelli, P., Venneri, A., Molinari, M., Tavani, F., & Grafman, J. (1993). Precision and accuracy
of subjective time estimation in different memory disorders. Cognitive Brain Research, 1,
87–93.
Nickerson, R. (1969). Man–computer interaction: A challenge for human factors research.
IEEE Transactions on Man–Machine Systems, 10(4), 164–180.
North, A., Hargreaves, D., & McKendrick, J. (1999). Music and on-hold waiting time. British
Journal of Psychology, 90, 161–164.
Ornstein, R. (1969). On the experience of time. New York: Penguin.
Planas, M., & Treurniet, W. (1988). The effects of feedback during delays in simulated teletext
reception. Behavior and Information Technology, 7, 183–191.
Polkosky, M. (2001). User preference for system processing tones (Tech. Rep. No. 29.3436). Ra-
leigh, NC: IBM.
Polkosky, M., & Lewis, J. (2001). User preference for turntaking tones 2: Participant source issues
and additional data (Tech. Rep. No. 29.3447). Raleigh, NC: IBM.
Poynter, W., & Homa, D. (1983). Duration judgment and the experience of change. Perception
and Psychophysics, 33, 548–560.
Rammsayer, T., & Rammstedt, B. (2000). Sex-related differences in time estimation: The role
of personality. Personality and Individual Differences, 29, 301–312.
Ramos, L. (1993). The effects of on-hold telephone music on the number of premature disconnec-
tions to a statewide protective services abuse hot line. Journal of Music Therapy, 30, 119–129.
Ramsay, J., Barbesi, A., & Preece, J. (1998). A psychological investigation of long retrieval
times on the World Wide Web. Interacting With Computers, 10, 77–86.
Rauterberg, M. (1998). About the importance of auditory alarms during the operation of a
plant simulator. Interacting With Computers, 10, 31–44.
Riesen, J., & Schnider, A. (2001). Time estimation in Parkinson’s disease: Normal long duration
estimation despite impaired short duration discrimination. Journal of Neurology, 248, 27–35.
Roast, C. (1998). Designing for delay in interactive information retrieval. Interacting With
Computers, 10, 87–104.
Schaefer, F. (1990). The effect of system response times on temporal predictability of work
flow in human–computer interaction. Human Performance, 3, 173–186.
Schleifer, L., & Amick, B. (1989). System response time and method of pay: Stress effects in
computer based tasks. International Journal of Human–Computer Interaction, 1, 23–39.
Schumacher, R., Hardzinski, M., & Schwartz, A. (1995). Increasing the usability of interactive
voice response systems: Research and guidelines for phone-based interfaces. Human Fac-
tors, 37, 251–264.
Shneiderman, B. (1984). Response time and display rate in human performance with computers. ACM Computing Surveys, 16, 265–285.
Shneiderman, B. (1987). Designing the user interface: Strategies for effective human–computer in-
teraction. Cambridge, MA: Winthrop.
Spiegal, M., & Streeter, L. (1997). Applying speech synthesis to user interfaces. In M.
Helander, T. Landauer, & P. Prabhu (Eds.), Handbook of human–computer interaction (2nd
ed., pp. 1061–1084). Amsterdam: Elsevier.
Thadhani, A. (1981). Interactive user productivity. IBM Systems Journal, 20, 407–423.
Thum, M., Boucsein, W., Kuhmann, W., & Ray, W. (1995). Standardized task strain and sys-
tem response times in human–computer interaction. Ergonomics, 38, 1342–1351.
Watson, B., Walker, N., Ribarsky, W., & Spaulding, V. (1998). Effects of variation in system re-
sponsiveness on user performance in virtual environments. Human Factors, 40, 403–414.
Weiss, S., Boggs, G., Lehto, M., Shodja, S., & Martin, D. (1982). Computer system response
time and psychophysiological stress. In Proceedings of the 26th Annual Meeting of the Hu-
man Factors Society (pp. 698–702). Santa Monica, CA: Human Factors and Ergonomics
Society.
Yoblick, D., & Salvendy, G. (1970). Influence of frequency on the estimation of time for audi-
tory, visual, and tactile modalities: The kappa effect. Journal of Experimental Psychology, 86,
157–164.
Zakay, D. (1989). Subjective time and attentional resource allocation: An integrated model of
time estimation. In I. Levin & D. Zakay (Eds.), Time and human cognition: A life-span per-
spective (pp. 365–397). Amsterdam: Elsevier.
Zakay, D. (1993). Time estimation methods—Do they influence prospective duration esti-
mates? Perception, 22, 91–101.
Zakay, D., Lomranz, J., & Kaziniz, M. (1984). Extraversion–introversion and time perception.
Personality and Individual Differences, 5, 237–239.
Zakay, D., Nitzan, D., & Glicksohn, J. (1983). The influence of task difficulty and external
tempo on subjective time estimation. Perception & Psychophysics, 34, 451–456.