INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY 7, 69–79, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
Models of Throughput Rates for Dictation and Voice Spelling
for Handheld Devices
PATRICK M. COMMARFORD
IBM Corporation, 8051 Congress Ave, Suite 2228, Boca Raton, FL 33487, USA
commarfo@us.ibm.com
JAMES R. LEWIS
IBM Corporation, 8051 Congress Ave, Suite 2227, Boca Raton, FL 33487, USA
Abstract. Since the emergence of the personal digital assistant (PDA), developers have attempted to create input
methods that allow users to enter accurate data at speeds that approach those achieved with the personal computer.
Common text entry methods (handwriting and soft keyboard) allow for rates that are unacceptably slow for many
purposes. The objective of this paper is to consider the possible benefits of speech-to-text input mechanisms
(dictation and voice spelling) for handheld devices. By modeling throughput based on varying rates of speech,
correction speeds, and system recognition accuracies, we can compare expected speech throughput rates to current
throughput rates for PDAs.
Keywords: personal digital assistant (PDA), dictation, voice spelling, speech interface
Introduction
Recent years have seen the emergence and rising pop-
ularity of handheld personal digital assistants (PDAs).
These devices have many benefits, including being
small, lightweight, and extremely mobile. To date, there
have been two primary methods of inputting data to a
PDA—tapping a small onscreen (soft) keyboard and
using highly constrained handwriting recognizers such
as Graffiti (Graffiti is a registered trademark of Palm
Computing, Inc.) or Unistrokes (Unistrokes is a reg-
istered trademark of Xerox Corp.). The current input
speeds for these methods, however, are substantially
slower than input rates achieved with a personal com-
puter and keyboard.
Virtually all users of PDAs have experience with
personal computers and have some familiarity with
the standard computer keyboard (the QWERTY key-
board), with which expert typists can enter data at
rates of approximately 55 words per minute (WPM)
with near perfect accuracy (Marklin et al., 1998;
Norman and Fisher, 1982). Prior research (discussed
below) has shown substantially slower rates of input for
various handwriting recognizers and soft keyboards.
Hand printing speeds are typically in the 12–
23 WPM range, and cursive handwriting speeds
range from 16 to just over 30 WPM (Soukoreff and
MacKenzie, 1995). These, necessarily, provide esti-
mates of the upper limits for this sort of text entry.
MacKenzie and Chang (1999), using two discrete print-
ing recognizers and a 9.5-inch tablet, found a mean text
entry speed of 17.1 WPM. The mean recognition accu-
racy was 92% when the recognizer was constrained to
lowercase letters and 90% when constrained to upper
and lower case letters. MacKenzie and Zhang (1997)
found high (95.8%) recognition accuracy for experi-
enced users of the Graffiti handwriting recognition sys-
tem, but did not report the entry speeds. Sears and
Arora (2001) reported a much slower text entry rate
of 4.95 WPM for the Graffiti recognizer with a recog-
nition accuracy of 95% for participants using a PDA
(rather than a tablet). They found that participants us-
ing the Jot recognizer were able to produce 7.74 WPM
with an average recognition accuracy of 88% (Jot is
a registered trademark of Communication Intelligence
Corporation). Each of the previously mentioned studies
reported rates of entry for uncorrected text. Kleid and
Bonto (1995) asked users to enter a rather complex set
of letters, numbers, and special characters (a person’s
contact information) using Graffiti on a 6-inch (diagonal)
screen. Participants were to attempt 100% accuracy and
to use any editing tools that they felt would be helpful.
The participants were only able to enter 1.98 corrected
words per minute (CWPM).
Research has shown stylus tapping on a soft
QWERTY keyboard to be slightly faster than using
a handwriting recognizer, but these rates are still sub-
optimal. Zha and Sears (2001) reported that partici-
pants could input text on a PDA at a rate of 12.62 WPM
with an average input accuracy of 96% using the soft
keyboard. Using a tablet, MacKenzie et al. (1994)
found that users could type text at a rate of 22.9 WPM
with 99% accuracy. MacKenzie et al. (1999) reported
text entry rates of 20.2 WPM for participants tapping on
a full-sized paper QWERTY layout. Kleid and Bonto
(1995) had their participants use a soft keyboard to en-
ter the previously described set of complex text with
100% accuracy. Under these conditions, participants
were only able to obtain mean throughput rates of
5.17 CWPM.
In the past decade, users have been introduced to a
new method of inputting data to a personal computer—
speech dictation. While voice throughput is not yet typ-
ically as fast as typing with a full-sized QWERTY key-
board, it may offer a faster rate of throughput for hand-
held devices than the currently available options. Lewis
(1999) defined true throughput as the number of correct
words produced per minute, and found that participants
could achieve rates of 31.0 CWPM with multimodal
(manual and vocal) correction and 19.0 CWPM with
voice-only correction, using two commercially avail-
able desktop speech dictation products. Voice-only cor-
rection was significantly slower (29.1 seconds per cor-
rection) than multimodal correction (13.2 seconds per
correction).
If system designers could embed voice recognition
technology into the PDA environment in a way that al-
lows users to achieve throughput rates similar to those
observed by Lewis (1999), users would benefit greatly.
By 1999, estimates of recognition accuracy for com-
mercial desktop dictation systems ranged from 90–
95% (Koester, 2001; Lewis, 2001). Resource limita-
tions of these handheld devices, however, will probably
prevent the high levels of recognition accuracy reached
by the desktop software. In addition, multimodal cor-
rection on a PDA will include the use of a method of
input (soft keyboard or handwriting recognizer) that is
less efficient than those used with the desktop systems
in the Lewis study.
It is reasonable to consider the efficiency of two
methods of inputting data by voice in a PDA environ-
ment: speech dictation and voice spelling. These meth-
ods can exist in isolation or in combination. To dictate,
users would simply say the words they want to appear
on the screen. To voice spell, the user would say codes
for each letter of the alphabet. Due to the well-known
acoustic similarities of certain letters (for example, “B”
and “D” or “S” and “F”), recognition accuracy for letter
names is much lower than for a carefully selected set of
code words (Lewis and Commarford, 2002). For exam-
ple, rather than saying "d," the user would say "dog." It
would be possible to present a reminder display when
in voice spelling mode for users who do not have the
codes committed to memory.
The user would certainly be able to produce more
uncorrected words per minute via dictation than by
voice spelling. However, because of the limited gram-
mar set, voice spelling would probably achieve higher
levels of system recognition accuracy than dictation,
potentially reducing the need to correct. Furthermore,
correction speeds for voice spelling (single-character
corrections) would probably be faster than correction
speeds for dictation (full-word corrections). Prior mod-
eling (Lewis, 1999) suggests that system recognition
accuracy and correction speeds are more important de-
terminants of true throughput than speaking rate. Mod-
eling true throughput (CWPM) based on input method,
system accuracy, time per correction, and speaking rate
can help us understand how substantial the accuracy
difference between spelling and dictation would have
to be for it to be advantageous to use the spell mode.
These models would also allow comparisons of each
of these speech input methods to reported throughput
rates for other PDA input methods.
The aforementioned research suggests faster rates
of throughput for soft keyboard input than for hand-
writing recognizers with a PDA. The Zha and Sears
(2001) study seems to provide the best estimate of PDA
throughput with a soft keyboard. These researchers
used a PDA and asked participants to input a 41-word
passage characteristic of a short business email mes-
sage. They found a mean input rate for this task of
12.62 WPM, with a 4% error rate. Assuming a very
high speed of correction, it seems reasonable to set the
benchmark for the true throughput rate for soft key-
board input at 12 CWPM.
The rest of this report describes performance mod-
eling conducted to make the following comparisons of
throughput rates:
(a) dictation for the 150-WPM speaker vs. the 100-
WPM speaker
(b) expert spelling for the 150-WPM speaker vs. the
100-WPM speaker
(c) novice spelling for the 150-WPM speaker vs. the
100-WPM speaker
(d) dictation vs. expert spelling for the 150-WPM
speaker
(e) dictation vs. expert spelling for the 100-WPM
speaker
In addition, there will be a comparison of each
speech input rate to the target rate of 12 CWPM, which
appears to be the best estimate of the most efficient
method of text input currently available for stand-
alone PDAs (not docked and connected to a personal
computer).
Method
The source text for this evaluation was a 107-word
passage with 483 letters. This passage contained 101
spaces and seven punctuation marks. The passage con-
tained no capitalization, except for characters follow-
ing periods. We created six models of true through-
put (using CWPM) for speech dictation and voice
spelling (see Table 1). Each model contains a range
of system recognition accuracies from 50 to 100%
and a range of correction times from 5 to 35 sec-
onds.
The expert voice spelling models assume complete
automaticity in the assignment of the letters to their
Table 1. Description of six throughput models.
Input method User Speaking rate
Dictation All 100 WPM
150 WPM
Voice spelling Novice 100 WPM
150 WPM
Expert 100 WPM
150 WPM
respective codes (in other words, users need no pro-
cessing time to match letters to their codes or to retrieve
them from short term memory). The novice models as-
sume a 230 ms eye movement time to locate each letter
in the passage on the spell vocabulary reminder display.
The 230 ms eye movement time is consistent with that
given as the typical or “middle man” time by Card et al.
(1983).
The Models
Dictation Model
This model is a replication of that created by Lewis
(1999), but extends the upper limit for correction times
to 35 s because it is reasonable to anticipate longer cor-
rection times with a handheld device. Table 2 shows
the expected throughput data (in CWPM) for a 150-
WPM speaker and a 100-WPM speaker at varying lev-
els of system recognition accuracy and varying correc-
tion speeds. The bold numbers in the table indicate the
point at which speech becomes competitive with soft
keyboard input for each combination of speaking rate,
correction speed, and recognition accuracy (in other
words, exceed 12 CWPM). Figure 1 illustrates this
relationship.
The following worked example illustrates the
method by which we calculated CWPM for dictation.
The example is based on the cell in Table 2 that pro-
vides the throughput rate for a 150-WPM speaker who
takes 25 seconds on average to complete a correction
and is using a system that accurately recognizes 80%
of the dictated speech.
We calculated CWPM by dividing the number of
words to appear in the target application on the PDA
screen (107 in this case) by the total amount of time
it would take, in minutes, for the corrected text to ap-
pear. Total time consists of time to speak the passage
and time to correct the misrecognitions. We calculated
time to speak by dividing the number of words to be
spoken (107) by the speaking rate (150 WPM). The
result was 0.71 minutes to speak the 107-word pas-
sage. Multiplying the number of required corrections
by the correction speed (25 seconds) provided the cor-
rection time. We determined the number of required
corrections by multiplying the number of words spoken
(107) by the error rate (1 − recognition accuracy). For
the example data, the error rate is .20, resulting in 21.4
errors. Multiplying 21.4 errors by the 25-second correc-
tion speed produced 535 seconds, or 8.92 minutes. The
Table 2. Model of expected throughput (CWPM) rates for dictation.
Recognition accuracy
Speaking rate Correction speed 100% 95% 90% 85% 80% 75% 70% 65% 60% 55% 50%
150 WPM 5 sec 150 92.31 66.67 52.17 42.86 36.36 31.58 27.91 25.00 22.64 20.69
15 sec 150 52.17 31.58 22.64 17.65 14.46 12.24 10.62 9.38 8.39 7.59
25 sec 150 36.36 20.69 14.46 11.11 9.02 7.59 6.56 5.77 5.15 4.65
35 sec 150 27.91 15.38 10.62 8.11 6.56 5.50 4.74 4.17 3.72 3.35
100 WPM 5 sec 100 70.59 54.55 44.44 37.50 32.43 28.57 25.53 23.08 21.05 19.35
15 sec 100 44.44 28.57 21.05 16.67 13.79 11.76 10.26 9.09 8.16 7.41
25 sec 100 32.43 19.35 13.79 10.71 8.76 7.41 6.42 5.66 5.06 4.58
35 sec 100 25.53 14.63 10.26 7.89 6.42 5.41 4.67 4.11 3.67 3.31
[Figure 1. Model of expected throughput (CWPM) rates for dictation. The figure plots throughput (CWPM, 0–160) against recognition accuracy (100% down to 50%), with one line per condition: S150/S100 = 150/100-WPM speaking rate; E05, E15, E25, E35 = 5, 15, 25, and 35 seconds per correction.]
sum of 0.71 minutes (speaking time) and 8.92 minutes
(correction time) was 9.63 minutes (total time). Finally,
dividing the 107 corrected words by 9.63 minutes of to-
tal time produced the presented value of 11.11 CWPM
(see Table 2).
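As a sketch of the calculation just described (the function name and signature are ours, not from the paper), the dictation model can be written out in a few lines:

```python
def dictation_cwpm(words, speaking_wpm, accuracy, seconds_per_correction):
    """Expected true throughput (CWPM) for dictation.

    words: passage length in words (107 in the worked example)
    accuracy: recognition accuracy as a proportion (0.80 = 80%)
    """
    speak_minutes = words / speaking_wpm
    errors = words * (1 - accuracy)                     # misrecognized words
    correction_minutes = errors * seconds_per_correction / 60
    return words / (speak_minutes + correction_minutes)

# Worked example: 150-WPM speaker, 80% accuracy, 25 s per correction
print(round(dictation_cwpm(107, 150, 0.80, 25), 2))     # 11.11
```

Note that the passage length cancels out of this expression, so for dictation the modeled CWPM depends only on speaking rate, accuracy, and correction speed.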
Referring to Table 2, beginning with the right hand
side of the table, we can see that with a recogni-
tion accuracy of 50%, the user would need to be able
to make a correction every five seconds for dicta-
tion to compete with soft keyboard input. This speed
of correction seems unrealistically fast given Lewis’
(1999) research, in which he found average multimodal
correction speeds of 13.2 seconds per correction with
desktop speech dictation systems. Given the size and
processing limitations of the PDA, correction times for
dictation would likely be a little slower, perhaps be-
tween 15 and 20 seconds. If we assume 15 seconds
per correction, recognition accuracies as low as 70–
75% would produce mean true throughput rates that are
highly competitive with the target of 12 CWPM, even
for a 100-WPM speaker. If correction speeds were as
slow as 25 seconds per correction, a recognition accu-
racy of about 85% would be necessary for dictation to
compete with soft keyboard input. At 35 seconds per
correction, the target accuracy would be 90%.
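Because passage length cancels out of the dictation model (CWPM = 1/(1/rate + (1 − accuracy) × correction seconds/60)), the break-even accuracies described above can also be computed directly rather than read off the nearest table column. A sketch (the function is ours, not from the paper):

```python
def breakeven_accuracy(target_cwpm, speaking_wpm, seconds_per_correction):
    """Minimum dictation accuracy needed to reach target_cwpm.

    Solves 1 / (1/rate + (1 - a) * c / 60) = target for accuracy a;
    passage length cancels out of the dictation model.
    """
    return 1 - (1 / target_cwpm - 1 / speaking_wpm) * 60 / seconds_per_correction

# 150-WPM speaker vs. the 12-CWPM soft keyboard benchmark
for c in (15, 25, 35):
    print(c, round(100 * breakeven_accuracy(12, 150, c), 1))
```

The continuous break-even values (about 69%, 82%, and 87% for 15-, 25-, and 35-second corrections) fall just below the table columns cited above (70–75%, 85%, and 90%).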
Expert Speller Model
The expert speller model allows estimation of the voice
spelling throughput rates for a speaker for whom the
spell letter codes have become automatic. Table 3
shows the expected spell data for an expert speller,
speaking 150 and 100 WPM at varying levels of recog-
nition accuracy and varying correction speeds. The
bold numbers in the table indicate the point at which
speech becomes competitive with soft keyboard input.
Figure 2 illustrates this relationship.
The calculations for the expert spelling model are
similar to those for the dictation model, except that the
number of words spoken is now 591 (the user must say a
code word for each of 483 letters, must say, “space” 101
times, and must say, “period” 7 times) and corrections
are at the character rather than the word level. The
following example goes through the steps of calculating
throughput for an expert speller speaking at 150 WPM,
correcting one character every 25 seconds, and using a
Table 3. Model of expected throughput (CWPM) rates for the expert speller.
Recognition accuracy
Speaking rate Correction speed 100% 95% 90% 85% 80% 75% 70% 65% 60% 55% 50%
150 WPM 5 sec 27.16 16.71 12.07 9.45 7.76 6.58 5.72 5.05 4.53 4.10 3.75
15 sec 27.16 9.45 5.72 4.10 3.19 2.62 2.22 1.92 1.70 1.52 1.38
25 sec 27.16 6.58 3.75 2.62 2.01 1.63 1.38 1.19 1.04 0.93 0.84
35 sec 27.16 5.05 2.79 1.92 1.47 1.19 1.00 0.86 0.75 0.67 0.61
100 WPM 5 sec 18.10 12.78 9.88 8.05 6.79 5.87 5.17 4.62 4.18 3.81 3.50
15 sec 18.10 8.05 5.17 3.81 3.02 2.50 2.13 1.86 1.65 1.48 1.34
25 sec 18.10 5.87 3.50 2.50 1.94 1.59 1.34 1.16 1.02 0.92 0.83
35 sec 18.10 4.62 2.65 1.86 1.43 1.16 0.98 0.85 0.74 0.66 0.60
system that accurately recognizes 80% of the speech.
Dividing the number of utterances to be spoken (591)
by the 150-WPM speaking rate produced 3.94 min-
utes to speak the passage. Multiplying the number of
utterances spoken (591) by the error rate (.20) pro-
duced 118.2 errors, which would require 2955 seconds
(49.25 minutes) to correct at 25 seconds per error. Sum-
ming the time to speak (3.94 minutes) and the time
to correct (49.25 minutes) produced a total time of
53.19 minutes. Dividing 107 words in the passage by
53.19 produced 2.01 CWPM (see Table 3).
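The spelling calculation differs from dictation only in the utterance count (a minimal sketch; the names are ours):

```python
def spelling_cwpm(words, utterances, speaking_wpm, accuracy,
                  seconds_per_correction):
    """Expected true throughput (CWPM) for expert voice spelling.

    utterances: spoken code words -- 483 letters + 101 "space" +
    7 "period" = 591 for the source passage; corrections are per
    character rather than per word.
    """
    speak_minutes = utterances / speaking_wpm
    errors = utterances * (1 - accuracy)
    correction_minutes = errors * seconds_per_correction / 60
    return words / (speak_minutes + correction_minutes)

# Worked example: 150-WPM expert speller, 80% accuracy, 25 s per correction
print(round(spelling_cwpm(107, 591, 150, 0.80, 25), 2))  # 2.01
```

Unlike dictation, the word count no longer cancels: throughput scales with the ratio of output words to spoken utterances (107/591 here).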
Table 3 shows that, with low levels of system recog-
nition accuracy, voice spelling will produce unaccept-
ably slow input rates, regardless of correction speed.
Assuming a correction speed of 15 seconds, recogni-
tion accuracy would need to be greater than 95% for
voice spelling to be a competitive method of inputting
text. However, the previously cited correction speeds
(Lewis, 1999) were for full-word corrections that re-
sulted from system misrecognitions of user dictation.
Correcting the misrecognition of a single character
would presumably be much quicker (see Koester, 2001,
p. 123), requiring only two actions (tapping or saying
backspace, followed by tapping the appropriate key or
saying the appropriate code). Assuming five seconds
per correction, 90% recognition accuracy would allow
150-WPM speakers to voice spell at a rate competi-
tive with soft keyboard input (12.07 CWPM). Slower
speaking expert spellers would require a recognition
accuracy of 95% for voice spelling to be a competitive
alternative.
Novice Speller Model
The novice speller model allows the estimation of the
throughput rates for a speaker who is just learning to
[Figure 2. Model of expected throughput (CWPM) rates for the expert speller. The figure plots throughput (CWPM, 0–30) against recognition accuracy (100% down to 50%), with one line per condition: S150/S100 = 150/100-WPM speaking rate; E05, E15, E25, E35 = 5, 15, 25, and 35 seconds per correction.]
use spell mode and has not yet memorized the letter
codes. Table 4 shows the expected throughputs for a
novice speller at 150 WPM and at 100 WPM with
varying levels of system recognition accuracy and vary-
ing correction speeds. The bold numbers in the table
indicate the point at which speech becomes competi-
tive with soft keyboard input. Figure 3 illustrates this
relationship.
We calculated novice speller throughput times in
much the same way as the expert speller models. The
one exception is the addition of a time component,
code word search time, to the total time. The fol-
lowing describes the method of calculating through-
put for a novice speller speaking at 150 WPM, requiring
25 seconds per correction for each misrecognized char-
acter, and using a system that accurately recognizes
80% of the spoken words.
The 3.94 minutes to speak the passage and the
49.25 minutes to correct the misrecognitions remain
the same as in the expert speller example above. How-
ever, we also assumed a 230 msec (.0038 minute)
search time for code words for each of the 483 let-
ters in the passage (but not for “period” or “space”).
Multiplying 483 code words by 0.0038 minutes per
search yielded 1.85 minutes of total search time. Sum-
ming time to speak (3.94 minutes), time to correct
(49.25 minutes), and time to search (1.85 minutes) re-
sulted in 55.04 minutes. Dividing 107 by 55.04 minutes
produced 1.94 CWPM (see Table 4).
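The novice model adds only the search-time term (a sketch; the names are ours, and the 0.230-second default is the eye-movement assumption stated above):

```python
def novice_spelling_cwpm(words, letters, other_utterances, speaking_wpm,
                         accuracy, seconds_per_correction, search_sec=0.230):
    """Novice voice spelling: the expert model plus per-letter code search.

    letters: characters needing a code-word lookup (483 here); no search
    time is charged for "space" or "period" (other_utterances = 108).
    """
    utterances = letters + other_utterances
    speak_minutes = utterances / speaking_wpm
    correction_minutes = utterances * (1 - accuracy) * seconds_per_correction / 60
    search_minutes = letters * search_sec / 60
    return words / (speak_minutes + correction_minutes + search_minutes)

# Worked example: 150-WPM novice speller, 80% accuracy, 25 s per correction
print(round(novice_spelling_cwpm(107, 483, 108, 150, 0.80, 25), 2))  # 1.94
```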
The expected novice speller data follows a pattern
similar to the expert speller, but is slightly slower. At
five seconds per correction, novice spellers would re-
quire 95% or greater recognition accuracy to be com-
petitive with the soft keyboard.
Table 4. Model of expected throughput (CWPM) rates for the novice speller.
Recognition accuracy
Speaking rate Correction speed 100% 95% 90% 85% 80% 75% 70% 65% 60% 55% 50%
150 WPM 5 sec 18.47 12.96 9.98 8.12 6.84 5.91 5.20 4.65 4.20 3.83 3.52
15 sec 18.47 8.92 5.20 3.83 3.03 2.50 2.14 1.86 1.65 1.48 1.34
25 sec 18.47 5.91 3.52 2.50 1.94 1.59 1.34 1.16 1.03 0.92 0.83
35 sec 18.47 4.65 2.66 1.86 1.43 1.16 0.98 0.85 0.74 0.66 0.60
100 WPM 5 sec 13.79 10.47 8.43 7.06 6.08 5.33 4.75 4.28 3.90 3.58 3.30
15 sec 13.79 7.06 4.75 3.58 2.87 2.39 2.05 1.80 1.60 1.44 1.31
25 sec 13.79 5.33 3.30 2.39 1.88 1.54 1.31 1.14 1.01 0.90 0.82
35 sec 13.79 4.28 2.53 1.80 1.39 1.14 0.96 0.83 0.73 0.66 0.59
[Figure 3. Model of expected throughput (CWPM) rates for the novice speller. The figure plots throughput (CWPM, 0–20) against recognition accuracy (100% down to 50%), with one line per condition: S150/S100 = 150/100-WPM speaking rate; E05, E15, E25, E35 = 5, 15, 25, and 35 seconds per correction.]
150-WPM Speaker Model
This model provides expected throughputs for some-
one who speaks at a rate of 150 WPM for dictation
and for expert spelling. The data enable the determi-
nation, for a given correction speed, of how much dif-
ference in recognition accuracy must exist for expert
voice spelling to be more efficient than dictation. Ta-
ble 5 shows the expected dictation and expert spelling
throughput rates for a user who speaks at a rate of
Table 5. Model of expected throughput (CWPM) rates for the 150-WPM speaker.
Recognition accuracy
Input method Correction speed 100% 95% 90% 85% 80% 75% 70% 65% 60% 55% 50%
Dictation 5 sec 150.00 92.31 66.67 52.17 42.86 36.36 31.58 27.91 25.00 22.64 20.69
15 sec 150.00 52.17 31.58 22.64 17.65 14.46 12.24 10.62 9.38 8.39 7.59
25 sec 150.00 36.36 20.69 14.46 11.11 9.02 7.59 6.56 5.77 5.15 4.65
35 sec 150.00 27.91 15.38 10.62 8.11 6.56 5.50 4.74 4.17 3.72 3.35
Spelling 5 sec 27.16 16.71 12.07 9.45 7.76 6.58 5.72 5.05 4.53 4.10 3.75
15 sec 27.16 9.45 5.72 4.10 3.19 2.62 2.22 1.92 1.70 1.52 1.38
25 sec 27.16 6.58 3.75 2.62 2.01 1.63 1.38 1.19 1.04 0.93 0.84
35 sec 27.16 5.05 2.79 1.92 1.47 1.19 1.00 0.86 0.75 0.67 0.61
150 WPM for varying levels of system recognition ac-
curacy and varying correction speeds. The bold num-
bers in the table indicate the point at which speech be-
comes competitive with soft keyboard input. Figure 4
illustrates this relationship.
[Figure 4. Model of expected throughput (CWPM) rates for the 150 WPM speaker. The figure plots throughput (CWPM, 0–160) against recognition accuracy (100% down to 50%), with one line per condition: D = dictation; ES = expert spelling; E05, E15, E25, E35 = 5, 15, 25, and 35 seconds per correction.]
As shown in Table 5, given equivalent system recog-
nition accuracy, dictation will be superior across all
correction speeds, with the greatest differences com-
ing at the higher levels of accuracy. Given the large
difference in the size of the grammar sets for the two
modes, however, recognition accuracy is expected to be
higher and correction speeds faster when the user is in
spell mode. Lewis and Commarford (2002) developed
a voice spelling alphabet and tested its accuracy with
a desktop system and headset microphone. The gram-
mar set (which included letter codes for voice spelling,
punctuation, and cursor control commands) produced
results that were 97.5% accurate under these condi-
tions. Given that accuracy should be no higher (and
might be lower) with a PDA microphone, 97.5% is an
estimate of the upper limit to voice spelling accuracy.
Assuming 95% recognition accuracy and 5 seconds per
correction for voice spelling and assuming 15 seconds
per correction for dictation, dictation with recogni-
tion accuracy as low as 80% would be more efficient
than voice spelling (with a voice spelling through-
put of 16.71 CWPM and a dictation throughput of
17.65 CWPM).
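The crossover described in the last sentence can be checked with one generic helper (our sketch; in this model dictation is simply the case where each utterance is a full word):

```python
def cwpm(words, utterances, speaking_wpm, accuracy, seconds_per_correction):
    """True throughput: output words over speaking plus correction time."""
    minutes = (utterances / speaking_wpm
               + utterances * (1 - accuracy) * seconds_per_correction / 60)
    return words / minutes

# 150-WPM speaker: spelling at 95% accuracy / 5 s vs. dictation at 80% / 15 s
spelling = cwpm(107, 591, 150, 0.95, 5)    # expert voice spelling
dictation = cwpm(107, 107, 150, 0.80, 15)  # dictation, one utterance per word
print(round(spelling, 2), round(dictation, 2))  # 16.71 17.65
```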
100-WPM Speaker Model
This model provides expected dictation and voice
spelling throughput rates for an expert speller who
speaks at a rate of 100 WPM. Table 6 shows the ex-
pected throughput rates. The bold numbers in the table
indicate the point at which speech becomes competi-
tive with soft keyboard input. Figure 5 illustrates the
relationship.
Again, the data show that, given equivalent system
recognition accuracy, dictation should be superior to
voice spelling across all correction speeds, with the
greatest differences at the higher levels of accuracy.
Assuming 95% recognition accuracy and 5 seconds per
correction for voice spelling and assuming 15 seconds
per correction for dictation, dictation with recogni-
tion accuracy as low as 75% would be more efficient
Table 6. Model of expected throughput (CWPM) rates for the 100-WPM speaker.
Recognition accuracy
Input method Correction speed 100% 95% 90% 85% 80% 75% 70% 65% 60% 55% 50%
Dictation 5 sec 100.00 70.59 54.55 44.44 37.50 32.43 28.57 25.53 23.08 21.05 19.35
15 sec 100.00 44.44 28.57 21.05 16.67 13.79 11.76 10.26 9.09 8.16 7.41
25 sec 100.00 32.43 19.35 13.79 10.71 8.76 7.41 6.42 5.66 5.06 4.58
35 sec 100.00 25.53 14.63 10.26 7.89 6.42 5.41 4.67 4.11 3.67 3.31
Spelling 5 sec 18.10 12.78 9.88 8.05 6.79 5.87 5.17 4.62 4.18 3.81 3.50
15 sec 18.10 8.05 5.17 3.81 3.02 2.50 2.13 1.86 1.65 1.48 1.34
25 sec 18.10 5.87 3.50 2.50 1.94 1.59 1.34 1.16 1.02 0.92 0.83
35 sec 18.10 4.62 2.65 1.86 1.43 1.16 0.98 0.85 0.74 0.66 0.60
than voice spelling (with a voice spelling through-
put of 12.78 CWPM and a dictation throughput of
13.79 CWPM).
General Discussion
Relatively Low Throughput for Handwriting
Recognition
Although it might seem counterintuitive, the evidence
from the published literature indicates that textual
throughput with a soft keyboard is more efficient than
handwriting recognition for PDAs, despite the im-
provements in recognition accuracy associated with
constrained alphabets such as Graffiti and Unistrokes.
The only cause of error in key tapping is tapping
the wrong key, but errors in handwriting recogni-
tion can be due to forming an incorrect character
(user error) or misrecognition of a correctly formed
character (system error). Furthermore, corrections
performed with handwriting recognition are also prob-
abilistic, making them more error prone than deter-
ministic key tapping. Thus, the low throughput ob-
served for handwriting recognition systems is most
likely the result of a longer total correction time.
Regardless of the cause, the throughputs reported in
the literature indicate that handwriting recognition is
a less effective input method than typing with soft
keyboards.
For these reasons, we have not included handwriting
throughput in our models. Instead, we have focused on
comparisons of speech input methods with the more
competitive method of soft keyboard input. For read-
ers interested in comparing speech throughput rates
with handwriting on a PDA, a generous estimate for
the true throughput of current handwriting recognition
[Figure 5. Model of expected throughput (CWPM) rates for the 100 WPM speaker. The figure plots throughput (CWPM, 0–120) against recognition accuracy (100% down to 50%), with one line per condition: D = dictation; ES = expert spelling; E05, E15, E25, E35 = 5, 15, 25, and 35 seconds per correction.]
is 7 CWPM (based on the results for the Jot recognizer
in Sears and Arora, 2001).
Potential Usefulness of Voice Spelling and Dictation
as PDA Input Methods
The range of values used to model true throughput for
the speech input methods described in this paper has a
firm basis in previously published human performance
data, and were selected to encompass them. The typical
speaking rate for users who are transcribing text into a
dictation system is about 110 WPM (Lewis, 1999), so
we set the values for this rate in the models at 100 and
150 WPM.
Correction speeds ranged from about 13 to
30 seconds per correction in Lewis (1999), depending
on the correction strategy employed by users (multi-
modal for the faster times, voice only for the slower
times). Correction speeds for entry with standard key-
boards are faster, averaging about 3 seconds per cor-
rection (Koester, 2001). Given a soft rather than stan-
dard keyboard, it therefore seems reasonable to assume
about 5 seconds per correction for individual char-
acters, such as those produced during voice spelling.
Thus, the range of correction speeds included in our
models (5 to 35 seconds per correction) encompasses
correction speeds for keyed corrections, multimodal
corrections, and voice-only corrections.
Estimates of recognition accuracy for commercial
desktop speech dictation systems typically range from
90 to 95%, but with some additional variation above
and below that range (Koester, 2001; Lewis, 2001).
Resource and hardware limitations might prevent users
from achieving such high recognition accuracies with
a PDA. Therefore, the range of recognition accuracies
modeled (50 to 100%) seems appropriate to consider.
The models clearly show that both voice spelling and
dictation can be competitive with soft keyboard entry,
and are therefore potentially useful methods for the in-
put of text into PDAs. All other things being equal, dic-
tation will be much more effective than voice spelling.
Due to their differing system requirements, however,
voice spelling might be the only viable method for
speech input of text into handheld devices with rela-
tively low system resources.
The models also illustrate the likely limits to dicta-
tion throughput as a function of accuracy and correc-
tion speed. Users can easily speak text at rates between
100 and 150 WPM, but are very unlikely to experience
true throughput much greater than 45 to 50 CWPM
(given 95% accuracy and multimodal correction speeds
of 15 seconds per correction) in the near future.
Summary of Key Findings
1. Assuming 15 seconds per correction with a PDA,
dictation would be as efficient as soft keyboard in-
put as long as speech recognition accuracy were
70% or greater. Assuming dictation recognition ac-
curacies as high as 85–90%, dictation would be ap-
proximately twice as productive as soft keyboard
entry.
2. Under ideal conditions (expert speller, 150-WPM
speaker, and 5-second correction speed), voice-
spelling accuracy must be at least 90% to be competi-
tive with soft keyboard entry.
3. Voice spelling will not be as efficient as dictation
unless voice spelling recognition accuracy is much
higher than dictation recognition accuracy.
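Findings 1 and 2 can be checked with the same kind of time-budget sketch. The parameter values here are assumptions for illustration (100-WPM speech for dictation; roughly six spoken letters per word, including the space, for voice spelling; the authors' exact parameters may differ):

```python
def throughput(units_per_word, speech_wpm, accuracy, correction_sec):
    """CWPM when each word costs `units_per_word` spoken units
    (1 for dictation; ~6 letters incl. space for voice spelling)."""
    speak_min = units_per_word / speech_wpm
    fix_min = units_per_word * (1.0 - accuracy) * correction_sec / 60.0
    return 1.0 / (speak_min + fix_min)

SOFT_KEYBOARD_CWPM = 12  # best current PDA estimate cited in the text

# Finding 1: dictation at 100 WPM with 15 s/correction.
print(throughput(1, 100, 0.70, 15))    # ~11.8 CWPM, roughly break-even with 12 CWPM
print(throughput(1, 100, 0.875, 15))   # ~24 CWPM, about twice soft keyboard entry

# Finding 2: voice spelling at 150 WPM with 5 s/correction.
print(throughput(6, 150, 0.90, 5))     # ~11.1 CWPM, still below 12 CWPM
print(throughput(6, 150, 0.95, 5))     # ~15.4 CWPM, above 12 CWPM
```

Under these assumed parameters, dictation crosses the 12-CWPM soft keyboard rate near 70% accuracy, and voice spelling crosses it only above 90% accuracy, matching the findings above.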
Conclusions
Current methods for textual input to a PDA do not result in very high throughput. Of the two most common contemporary methods, typing on a virtual keyboard has a higher throughput than any approach to handwriting recognition. The best current estimate of true throughput for typing on a PDA's virtual keyboard is about 12 CWPM. The models presented in this paper show that dictation on a PDA would have a substantial throughput advantage over current methods. Voice spelling is a viable alternative to current methods for PDAs that do not have sufficient computing power to support dictation. The true throughput for any speech input method with less than perfect recognition accuracy is critically dependent on correction speed, making the design of efficient correction procedures of utmost importance.
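The sensitivity to correction speed is easy to quantify with a time-budget sketch of the kind used in the models (assumed illustrative parameters: 100-WPM speech, 90% accuracy; not the authors' exact figures). Halving correction time from 30 to 15 seconds raises modeled throughput by roughly 70%:

```python
def cwpm(speech_wpm, accuracy, correction_sec):
    # total minutes per word = speaking time + expected correction time
    return 1.0 / (1.0 / speech_wpm + (1.0 - accuracy) * correction_sec / 60.0)

slow = cwpm(100, 0.90, 30)   # ~16.7 CWPM
fast = cwpm(100, 0.90, 15)   # ~28.6 CWPM
print(f"{slow:.1f} -> {fast:.1f} CWPM ({fast / slow - 1:.0%} gain)")
```

The large gain from a modest interface improvement is why efficient correction procedures matter more than raw speaking rate once accuracy is below 100%.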
References
Card, S.K., Moran, T.P., and Newell, A. (1983). The Psychology of Human-Computer Interaction. Hillsdale, NJ: Lawrence Erlbaum.
Kleid, N.A. and Bonto, M.A. (1995). Handwriting Recognition and Soft Keyboard Study (Tech. Report 29.3008). Raleigh, NC: International Business Machines Corp.
Koester, H.H. (2001). User performance with speech recognition: A literature review. Assistive Technology, 13:116–130.
Lewis, J.R. (1999). Effect of error correction strategy on speech dictation throughput. Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting, Santa Monica, CA: Human Factors Society, pp. 457–461.
Lewis, J.R. (2001). The Accuracy Wars: Journalists' Estimates of Continuous Speech Product Dictation Accuracy from 1997–1999 (Tech. Report 29.3465). Raleigh, NC: International Business Machines Corp.
Lewis, J.R. and Commarford, P.M. (2002). Developing and Tuning a Voice Spelling Alphabet for Devices with Small Displays (Tech. Report 29.3517). Raleigh, NC: International Business Machines Corp.
MacKenzie, I.S. and Chang, L. (1999). A performance comparison of two handwriting recognizers. Interacting with Computers, 11:283–297.
MacKenzie, I.S., Nonnecke, R.B., McQueen, J.C., Riddersma, S., and Meltz, M. (1994). A comparison of three methods of character entry on pen-based computers. Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting, Santa Monica, CA: Human Factors Society, pp. 330–334.
MacKenzie, I.S. and Zhang, S.X. (1997). The immediate usability of Graffiti. Proceedings of Graphics Interface '97, Toronto, Canada: Canadian Information Processing Society, pp. 129–137.
MacKenzie, I.S., Zhang, S.X., and Soukoreff, R.W. (1999). Text entry using soft keyboards. Behaviour and Information Technology, 18:235–244.
Marklin, R.W., Simoneau, G.G., and Hoffman, D. (1998). Effects of computer keyboard setup parameters and user's anthropometric characteristics on wrist deviation and typing efficiency. Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting, Santa Monica, CA: Human Factors Society, pp. 876–880.
Norman, D.A. and Fisher, D. (1982). Why alphabetic keyboards are not easy to use: Keyboard layout doesn't much matter. Human Factors, 24:509–519.
Sears, A. and Arora, R. (2001). An evaluation of gesture recognition for PDAs. In M.J. Smith, G. Salvendy, D. Harris, and R.J. Koubek (Eds.), Usability Evaluation and Interface Design: Cognitive Engineering, Intelligent Agents and Virtual Reality. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 1–5.
Soukoreff, R.W. and MacKenzie, I.S. (1995). Theoretical upper and lower bounds on typing speed using a stylus and soft keyboard. Behaviour and Information Technology, 14:370–379.
Zha, Y. and Sears, A. (2001). Data entry for mobile devices using soft keyboards: Understanding the effect of keyboard size. In M.J. Smith, G. Salvendy, D. Harris, and R.J. Koubek (Eds.), Usability Evaluation and Interface Design: Cognitive Engineering, Intelligent Agents and Virtual Reality. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 16–20.