SpokeIt: Building a Mobile Speech Therapy Experience
Jared Duval, Zachary Rubin, Elena Márquez Segura, Natalie Friedman, Milla Zlatanov, Louise
Yang, Sri Kurniawan
University of California Santa Cruz, Santa Cruz, US
Abstract
SpokeIt is a mobile serious game for health designed to
support speech articulation therapy. Here, we present
SpokeIt as well as 2 preceding speech therapy prototypes we
built, all of which use a novel offline critical speech
recognition system capable of providing feedback in real-
time. We detail key design motivations behind each of them
and report on their potential to help adults with speech
impairment co-occurring with developmental disabilities.
We conducted a qualitative within-subject comparative study
on 5 adults within this target group, who played all 3
prototypes. This study yielded refined functional
requirements based on user feedback, relevant reward
systems to implement based on user interest, and insights on
the preferred hybrid game structure, which can be useful to
others designing mobile games for speech articulation
therapy for a similar target group.
Author Keywords
Speech Therapy; User-centered design; Serious Game for
Health; SpokeIt; Developmental Disabilities
ACM Classification Keywords
H.5. Information interfaces and presentation (e.g., HCI);
H.5.2. Voice I/O; J.3. Life and Medical Sciences: Health
Introduction
Speech is a crucial skill for effective communication,
expression, and sense of self-efficacy. Speech impairments
often co-occur with developmental disabilities such as
Autism Spectrum Disorder [44], Cerebral Palsy [9], and
Down Syndrome [8]. The prevalence of speech impairments
in individuals with developmental disabilities has been as
high as 51% [38]. Each of these developmental disabilities
exhibits symptoms of an articulation disorder. An articulation
disorder is characterized by difficulty producing speech
sounds that constitute the fundamental components of a
language [54]. Many individuals with speech impairments
experience depression, social isolation, and a lower quality
of life [27]. Speech problems can negatively impact a
person’s employment status [30], and their ability to receive
proper healthcare; this includes receiving a wrong diagnosis,
inappropriate medication, and poor access to services [33]. The rate
of arrests and convictions was higher for boys with language
impairments [5]. In 2012, approximately 10% of the U.S.
adult population experienced a speech, language, or voice
problem [30].
Speech is a skill that can often be improved with
individualized therapy and practice [39,45]. Access to
Speech Language Pathologists (SLPs) is crucial to
improving speech, but up to 70% of SLPs have waiting lists,
indicating a shortage in the workforce and disrupted access
to therapy [47]. As a result, many non-professional therapists
are being trained by SLPs to deliver speech therapy outside
of the office [24,51]. This is not an ideal situation because
the SLP must take the time to train the non-professional
speech therapy facilitator, the individual’s therapy schedule
then relies on the facilitator’s schedule, and these facilitators
may not be as effective at delivering a speech curriculum
[24]. Even worse, many untrained facilitators attempt to
deliver speech curricula while reporting a generally low sense of
competence in assisting people with disabilities in their
assigned curricula [33].
Mobile speech therapy games could help people practice
articulation anywhere without the need to be facilitated,
which may potentially expedite their speech therapy progress
[40]. The pervasiveness of mobile hardware makes it an ideal
platform for delivering speech therapy to those who may not
have access to a speech therapist or a facilitator. Many SLPs
design games and activities to engage their clients. Games
and play have been widely recognized as a valid motivator
for otherwise jaded individuals [4]. We expect there are
many benefits to using a mobile speech therapy game,
including the ability to practice anywhere, collect fine-
grained speech data, track the frequency and time individuals
spend practicing, track performance over time, and create
dynamic therapies customized to each individual. This has
presumably motivated the appearance of many mobile
speech therapy apps with different features and functions.
Yet, they tend to require a facilitator to evaluate speech.
Speech recognition has been successfully used to facilitate
speech therapy [2,3,6,10,31,32,46,49,52], but not in a mobile
context focusing on articulation.
In this paper, we describe key implementation details of the
underlying mobile offline real-time critical speech
recognition system. We present the designs of 2 articulation
therapy game prototypes that culminated in the creation of
SpokeIt. We share results from our comparative within-
subject study, which included 5 adults with disabilities co-
occurring with speech impairment who played each of the 3
designs. Finally, we discuss lessons learned about these
designs, reward systems our participants were interested in,
and promising future work.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. Copyrights for
components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to
post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from Permissions@acm.org.
MobileHCI '18, September 3-6, 2018, Barcelona, Spain
© 2018 Association for Computing Machinery.
Accessibility and Mobile Health
MobileHCI'18, September 3-6, Barcelona, Spain
We use this section to a) introduce the target group in our
studies and discuss the type of articulation therapy that is
generally appropriate, b) motivate our choice to build a
serious game for health for speech therapy, and c) argue that
a gap exists for mobile articulation therapy games utilizing
critical speech recognition systems.
Articulation Disorders Co-occurring with Developmental Disabilities
Adults with developmental disabilities co-occurring with
speech impairment would benefit from speech therapy [38].
There is a trend towards helping children with speech
impairments [6,8,12,41,43,49,51], but adults with speech
impairments need support as well [44,45]. This support can
come in the form of a speech recognition system, discussed later.
Autism Spectrum Disorder (ASD)
ASD is one of the most common developmental disabilities,
affecting approximately 400,000 individuals in the United
States [55]. A follow-up study of children with ASD and
communication problems found that, upon reaching early
adulthood, they continued to show significant stereotyped
behavior patterns, problems in relationships, trouble with
employment, and a lack of independence [17]. A person with
ASD may have monotonic
(machine-like) intonation, deficits in pitch, vocal quality,
volume, and articulation distortion errors [44].
Cerebral Palsy
The speech of a person with Cerebral Palsy and dysarthria
(difficult or unclear articulation of speech that is otherwise
linguistically normal) may exhibit anterior lingual (front-of-
tongue) placement inaccuracy, reduced precision of fricative
(a consonant requiring breath through a small opening) and
affricate (a plosive followed by a fricative sound, like the j in
jam) manners, slowness of speech, and indistinctness of
speech [9,35].
Down Syndrome
Many people with Down Syndrome have muscle hypotonia
[8]. Muscle hypotonia may cause abnormal movement of the
cheek, lips, and tongue, resulting in articulation errors. Many
people with Down Syndrome also speak at a fast and
fluctuating rate [48] known as cluttering [16].
Speech Recognition to Support Speech Therapy
Each of the aforementioned developmental disabilities has
one general symptom in common: an articulation disorder.
For this reason, we focused on the development of a critical
speech recognition system capable of distinguishing between
correct and incorrect pronunciations. We discuss these
implementation details thoroughly in the Creating a Novel
Speech Mechanic section. There are many more technologies
that focus on improving speech skills such as fluency, pitch,
rhythm, dialect, and aphasia [3,6,31,46,49,52], but we do
not focus on these.
Serious Games for Health
Games can be used as effective educational interventions
[1,28]. Games have the ability to teach while providing a
motivating and interactive environment [50], and can be as
effective as face-to-face instruction [37]. They can create
opportunities to train new skills in a safe and engaging
environment, improving perceived self-efficacy, a key aspect
in health promotion interventions [22].
Multiple serious games for health are documented to be
effective for diverse platforms, health outcomes, and target
populations. Some examples that populate this space range
from an exergame to help blind children with balance [29] to
embodied persuasive games for adults in wheelchairs [15] to
mobile games for motivating tobacco-free life in early
adolescence [34].
Games are such a powerful motivator that non-game
activities are often designed to look like games (i.e. gamified
systems). This attracts user attention and entices engagement
[11], which is particularly useful for tedious and otherwise
non-interesting activities [7]. By adding game design
elements to a pre-existing activity, designers manage to
engage people more with this activity [23]. However,
traditional gamification approaches have been widely
criticized [25]. Ethics aside, simply adding superficial game-
looking elements to an otherwise tedious activity does not
work in the long run. Both Nicholson and McGonigal [25,26]
have pointed out that extrinsic rewards like those typically
used in gamification can decrease intrinsic motivation and
engagement after the initial novelty effect. McGonigal
suggests a more fruitful approach by making the activity
intrinsically motivating and the rewards meaningful to the
player [25]. These relevant considerations prompted us to
study the types of rewards that interest our players.
Speech Therapy Applications
We present a comparison of some existing speech therapy
systems, shown in Figure 1. The survey of speech therapy
applications we present was compiled from interviews with
3 speech language pathologists, a literature review, and
current offerings on the App Store. The features we focus on
are whether the solutions run on mobile hardware and
whether or not they use speech recognition. We chose these
features to illustrate the gap SpokeIt is intended to fill.
Running on mobile hardware is important because of the
pervasiveness and convenience of mobile hardware [36]. A
critical speech recognition system is important for a speech
therapy game because it removes the requirement to be
facilitated by an often unreliable third party, and opens up
opportunities to record fine-grained speech data, track
progress, and listen for both correct and incorrect speech.
Of all the applications surveyed, we include Sayin’ it Sam,
the only other mobile game with an identical underlying
speech recognition library, to illustrate that similar
technology can serve very different purposes based on the
target population. We found no mobile games that use a
critical speech recognition system for articulation therapy.
We include non-mobile applications with speech recognition
systems because they offer novel speech therapy experiences
and features we wish were available in a mobile context. We
include a set of mobile speech therapy games without speech
recognition systems because they have many noteworthy
features and designs, which could be further improved if
they implemented a critical speech recognition system.
Figure 1: Venn diagram depicting the dichotomy between therapy
systems that use speech recognition and those that run on
mobile hardware
Sayin’ It Sam is the only other mobile application built on
the same speech recognition library as SpokeIt’s critical
recognizer. Sayin’ It Sam is primarily
focused on motivating non-verbal children to speak and is
therefore trained to be very forgiving. SpokeIt, however, is
built to listen critically to speech and to be used as an
articulation therapy tool.
Other researchers have integrated speech recognition into
non-mobile interactive game environments to improve
literacy. Project LISTEN is an automated reading tutor
aimed at helping children learn pronunciation and proper
speech when reading aloud. It does this by analyzing various
aspects such as pitch, speed, and pauses. Researchers have
tested the system in India for assisting children learning
English as a second language, as well as in Canada for
children looking to improve their speaking skills [31].
Project ALEX is a non-mobile application that has proposed
a very robust application for language learners of any age.
Project ALEX focuses on a large dictionary with text-to-
speech functionality. Most importantly, Project ALEX
included pronunciation practice and used speech recognition
to check whether the user says the word correctly, using Microsoft
SAPI [32]. Project ALEX is focused on studying the cultural
differences in speech and dialect.
Articulate it! is a unique multi-player mobile application
created by an SLP specifically to help children improve their
speech sound production. Articulate it! employs over 1000
images selected for working on English consonant sounds at
the word and phrase level and has the ability to store data for
multiple patients. Articulate It! has multiple game modes
such as a phonemes mode where the facilitator can select
target sounds and a mode where the facilitator can focus on
words with a specified number of syllables. Once a mode is
selected, the facilitator has the option to customize the
dictionary of target words by removing unwanted ones.
Modes can be switched without ending a session and speech
can be recorded for later comparisons and for the player to
listen back to their speech. This app requires a facilitator to
score speech.
Articulation Station [43] is a novel mobile speech therapy
app that allows SLPs to customize target sounds and sound
placement for patients of all ages to practice. For example,
SLPs can make the app focus on /k/ sounds that occur at the
beginning, middle, or end of a word, such as cat, pickle, or
tick. The app has three levels of difficulty where users must
say target words, sentences, or full stories. Like Articulate
it!, Articulation Station requires an SLP or facilitator, who
grades speech; speech is recorded in the app and available
for reporting. The app has pre-recorded samples of correct
speech for all target words.
Articulation Games is a comprehensive, flexible, and fun
speech-therapy app for iPads that was created by a certified
speech and language pathologist for children to practice the
pronunciation of over forty English phonemes, organized
according to placement of articulation. It includes thousands
of real-life flashcards, accompanied by professional audio
recordings and ability to record audio. Players practice
phonemes through activities like memory games and
flashcards. Articulation Games requires a facilitator to grade
speech. Auditory Workout, Articulation Vacation, and Real
Vocabulary all offer features and experiences similar to those of
Articulation Games, Articulation Station, and Articulate It!.
Many apps have more features than what is presented, such
as data collection capabilities for progress reports, the ability
to record players’ voices so they can listen back, and
challenges that progress from single sounds and words to
sentences and finally free natural speech. In the full release
of SpokeIt, we plan to include many of these features.
We started our work by conducting 3 semi-structured
interviews with medical speech experts, which led to the
creation of our functional requirements. We used these
functional requirements to inform the development of our
speech recognition system. Following an iterative user-
centered design process, we designed and implemented 3
prototypes that used this speech recognition system. Each of
the designs explored a unique game format and core speech
mechanic, which we detail and motivate in the following
sections. These prototypes were developed in sequence, so design
knowledge carried over from one prototype to the next.
We conducted a within-subject comparative study on the 3
prototypes with 5 adults with developmental disabilities co-
occurring with speech impairment. We were concerned with
participants’ interest in using a speech therapy game to
improve their speech, their opinion on each of the 3
prototypes, their preferred game structure, and relevant
reward systems they would be interested in.
We present here the main insights from this study, which
directly influenced the further development of SpokeIt. We
think they can be useful for others designing mobile games
for speech articulation therapy for a similar target group.
Creating a Novel Speech Mechanic
Our work began with semi-structured interviews with
medical speech experts. Researchers asked what technology
was currently used for speech therapy, what benefits and
drawbacks they see to using technology for speech therapy,
and what functionality must exist. The only technology our
experts used during speech therapy sessions was iPads
displaying images of speech targets for diagnostic purposes.
Experts suspected that their patients were practicing little to
none outside of the office, even though they recommended
10 minutes of practice per day. They were hopeful that a
speech therapy game would motivate their patients to
practice outside of the office. They stipulated that the system
must critically listen to pronunciation and were mildly
worried that a speech therapy game could condition bad
speech practices if the recognition was not accurate. The
inclusion of examples of correct speech mitigated these
concerns. Finally, they expressed concerns that many of their
patients from lower socio-economic statuses may not have
access to the internet, so our game must be functional offline.
We budgeted for iPads in our grant so that these populations
can keep the iPads with the software installed after the
development and evaluation of SpokeIt has ended. We chose
iPads because they are categorized as medical devices by the
United States Government.
While there are many novel and interesting mobile
applications available that can improve the speech therapy
experience, we found none that provided a critical speech
recognition system in a game for on-the-go or at-home use.
Before developing any game prototypes, it was necessary to
ensure that it was feasible to listen for both correct and
incorrect speech. Many speech recognition systems exist
today such as those used in personal assistants like Cortana,
Siri, Google Assistant, and Amazon Alexa. We chose not to
use these services for multiple reasons:
- We needed a solution that is offline, because we are dealing
with sensitive speech data. In addition, not every home has
access to the internet. Also, online speech recognition
systems often have lag and usage caps that would hinder
real-time game play.
- Digital assistants like the ones listed above are designed to
best-guess speech, not listen to it critically. We needed to
be able to fine-tune the recognition to listen for incorrect
speech as well as correct utterances.
- We did not want to discard the possibility of using and
recognizing non-existent words. Having the freedom to
play with silly nonsensical words is a tactic many SLPs use
to target specific sound production.
With these requirements in mind, we began searching online
for mobile speech recognition libraries that are highly
customizable and do not require an internet connection. The
library we chose is Pocketsphinx, an offline speech
recognition system for handheld devices from Carnegie
Mellon [19]. A speech therapy game must be able to listen to
speech critically so that the intervention will promote correct
speech. Pocketsphinx uses customizable dictionaries that
allow developers to customize the targets that can be
recognized [18]. The dictionaries that Pocketsphinx employs
use ARPAbet, a set of phonetic transcription codes, to map
speech sounds to English words [14]. ARPAbet can be used
to map any sequence of phonetic sounds to a word,
even words that do not exist. Any set of sounds that an
English speaker can produce can be mapped to an ARPAbet
representation. We can make new “words” that map to
common mispronunciations of correct words. Providing
both correct ARPAbet codes and ARPAbet codes that
represent mispronunciations gives us the power to
distinguish between correct and incorrect speech. Table 1
shows ARPAbet codes that represent both correct and
incorrect ways to say the word balloon.
Table 1: Pronunciations of the word “Balloon”, including the
correct pronunciation (first) followed by common
mispronunciations and their corresponding ARPAbet codes
Pocketsphinx uses acoustic models to map sound data to
targets in the dictionary. These acoustic models are hot-
swappable and can be altered for better accuracy [18]. This
feature creates the potential to alter acoustic models for
specific populations, allowing a more accurate model that
can listen to adults with developmental disabilities, or even
one specifically for children with cleft speech. OpenEars is
a free open-source framework that brings the power of
Pocketsphinx to iOS devices in native Objective-C
for speed and reliability. RapidEars is a paid plugin for
OpenEars that gives Pocketsphinx the ability to listen to
speech in real-time, which is important for a responsive
game. The ability to customize acoustic models, to customize
dictionaries, to run offline, and to listen in real-time
motivated our choice to use RapidEars for our speech
therapy game prototypes.
Design and Development of prototypes
Our primary target population for the final release of SpokeIt
was children, and hence the 3 prototypes were designed with
children in mind. However, our primary IRB and physical
location limited our access to children with speech
impairments at the time. For accessibility reasons, we tested
our prototypes on adults with developmental disabilities co-
occurring with speech impairment, who can also benefit
from the kinds of games we were creating. This could be seen
as a limitation of our study, yet as we will further discuss
later, the acceptance and enjoyment of our adult participants
towards our designs indicated they had good potential to
engage populations older than our original target group.
To initiate the design process, and after interviews with an
SLP, we drew inspiration from media that successfully elicits
speech from children, such as the popular children’s program
Dora the Explorer. The program inspired us because it
integrates learning and vocal participation in a storybook
style setting with intermittent learning challenges that
children love.
Speech Adventure
The first prototype that was developed was Speech
Adventure [42]. Speech Adventure, shown in Figure 2, is a
storybook style game that employs an off-screen narrator to
give directives on how to help Sam the Slug complete tasks.
Visual cues in the form of glowing blue outlines inform
players which parts of the scene can be interacted with and
touched. Once touched, the corresponding target phrase is
announced by the off-screen narrator.
To make progress in the game, the player must repeat the
target phrase that was just announced. The target words are
displayed at the bottom of the screen and are tightly tied to
the speech recognition system. Words turn green as they are
said correctly. To make progress, all words must be green.
Words can be said out of order and words that were missed
anywhere in the phrase can be repeated.
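This word-tracking behavior can be sketched as follows. This is a minimal illustration under our own naming, assuming the recognizer reports one matched word at a time; it is not the game's actual code.

```python
class PhraseTracker:
    """Tracks which words of a target phrase have been said correctly.

    Words may be said in any order and missed words repeated; the
    phrase is complete (and the game can progress) once every word
    in the target has been heard.
    """

    def __init__(self, phrase: str):
        self.words = phrase.lower().split()
        self.heard = set()

    def on_recognized(self, word: str) -> bool:
        """Mark a correctly said word; return True if it was a target
        word (i.e. its on-screen text should turn green)."""
        word = word.lower()
        if word in self.words:
            self.heard.add(word)
            return True
        return False

    def complete(self) -> bool:
        """True once all target words have been said."""
        return set(self.words) <= self.heard
```

A tracker for "Pop a balloon" would, for instance, accept "balloon" before "pop" and only report completion once all three words had been heard.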
The green ear in the upper left corner of the screen indicates
that the system is actively listening. When the off-screen
narrator is speaking, the green ear turns white to signal the
system is not listening. We found it important to suspend
recognition during these moments so that the in-game audio
didn’t trigger game events.
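The suspend-while-narrating behavior might be gated like this (a sketch under our own naming, not the authors' code): recognition results are simply dropped while narration audio is playing.

```python
class RecognitionGate:
    """Drops recognition events while in-game narration is playing,
    so the narrator's own voice cannot trigger game events."""

    def __init__(self):
        self.listening = True   # green ear: actively listening

    def narration_started(self):
        self.listening = False  # white ear: recognition suspended

    def narration_finished(self):
        self.listening = True

    def on_hypothesis(self, word: str, handler) -> bool:
        """Forward a recognized word to the game only while listening.
        Returns True if the word was forwarded."""
        if self.listening:
            handler(word)
            return True
        return False
```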
The story of Speech Adventure starts with dressing Sam in
boots and a hat. Once Sam is dressed, the player must say
“Open the door” to journey outside. Once outside the player
must pop 3 balloons that are blocking a bridge by saying
“Pop a balloon” before continuing on the journey. These
phrases were provided by SLPs.
Figure 2: Speech Adventure cabin scene where player must
help Sam the Slug get dressed before going outside
A challenging design aspect of this game was that each target
phrase had to be carefully crafted to fit into the narrative of
the game. Development of new scenes that incorporated
target words proved to be very time consuming, which could
result in minimal content. We worried that Speech
Adventure would lose its novelty after a first play-through.
From conversations with the SLP, we realized re-playability
was an important feature for a game that should be played
for 10 minutes per day.
The majority of the allotted 10 minutes of daily practice
should be spent producing target words and practicing
speech. Yet, the narrative nature of our storybook-style
design limited the number of utterances that could be
produced in 10 minutes, which then caused concerns within
the design team about the therapeutic value of our game.
The nature of hand-crafted narratives limited our ability to
dynamically swap target words. According to a SLP we
interviewed, a speech therapy game would benefit from the
ability to customize targets dynamically based on the types
of speech therapy each individual player needs.
Although preliminary play tests indicated that players love
the storybook-style Speech Adventure game [40], the
considerations above prompted us to rethink how a speech
therapy game should be structured.
Speech with Sam
To maximize the number of target words produced in a 10-
minute period, we hypothesized that single-word utterances
that had an immediate effect on gameplay would yield more
speaking time. Removing narrative between utterances
reduces the amount of time that the recognition system is
suspended. This resulted in the development of Speech with
Sam, a series of speech controlled mini-games, shown in
Figure 3. The mini-game structure allowed quicker
development time, which yielded more content to play.
Because targets are not directly linked to a narrative,
implementing a diverse range of dynamic targets would be
more feasible.
Figure 3: Speech with Sam rocket mini-game where rockets
are set off by saying the appropriate target
Speech with Sam played a series of mini-games ordered
randomly for a specified amount of time. In the rocket mini-
game example above, the player must tap one of the three
rockets. Touching a rocket reveals an in-game prompt that
specifies a rocket’s trigger word. Once the trigger word is
said, the rocket blasts off the screen and is replaced by a new
rocket with a new random trigger word.
For every rocket launched, the player score is increased by 1
and displayed in the green rounded rectangle in the upper-
left corner of the display. This score is recorded at the end of
every mini-game to keep a record of high scores and to track
trends about player scores.
The number in the blue rounded rectangle, next to the score
is a countdown timer until the next mini-game is played.
After the timer reaches 0, the score is recorded, and the
player is presented with a different randomly chosen mini-
game. All mini-games run for a standard amount of time. It
is possible for players to play the same mini-game more than
once in a session, but never twice in a row.
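The rotation rule, allowing repeats within a session but never back to back, can be sketched as a draw that excludes only the game just played (our own illustration; function and game names are hypothetical):

```python
import random

def next_minigame(games, last=None, rng=random):
    """Pick the next mini-game uniformly at random, excluding the one
    just played so the same game never runs twice in a row."""
    choices = [g for g in games if g != last]
    return rng.choice(choices)

# Simulate a session: each game runs for its fixed time, the score is
# recorded, and a new game is drawn.
games = ["rockets", "balloons", "coloring"]
last = None
for _ in range(10):
    current = next_minigame(games, last)
    assert current != last
    last = current
```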
The green ear in the upper-right corner of the screen works
identically to Speech Adventure’s in that when it is green, the
speech system is listening, and when white, the speech
system is suspended so that in-game audio does not trigger
events. The written prompt was moved to the top to join the
other heads up display elements. When players say the target
word, the text turns green and the game immediately
responds. In Speech Adventure, targets were often multi-
word phrases, whereas in Speech with Sam, the targets are
single words.
In a preliminary study [40], we found that Speech with Sam
was successful in increasing the words per minute from
participants, meaning Speech with Sam has the potential to
be a more effective speech therapy solution. In that same
study, we found that participants were unenthusiastic when
presented with a mini-game they had played before. Our
hypothesis about mini-game re-playability being high was
not necessarily true.
SpokeIt
SpokeIt, shown in Figure 4, is a storybook-style mini-game
hybrid. By wrapping the mini-games in a story that fits into
an overarching plot, we have the potential to both produce a
high output of words per minute and keep users engaged.
Figure 4: SpokeIt card coloring mini-game
SpokeIt is the first speech therapy game to demonstrate
correct pronunciation with both pre-recorded audio and lip
animations. One of the medical experts we interviewed later
suggested that seeing correct speech is as important as
hearing it. We did not want to break
immersion by displaying a realistic mouth in an animated 2D
environment. We found a solution that allowed us to sync the
audio that exemplified correct speech with animated mouth
transitions on our characters. The lip animation effects were
achieved using Adobe Character Animator. Each phoneme
mouth shape was crafted in Adobe Photoshop with three
frames per transition, resulting in smooth transitions between
mouth shapes. Adobe Character Animator’s Lip Sync feature
and our frame transitions automatically map our voice
actor’s speech performances to the appropriate mouth
shapes. The motivation behind this work is to give players
visual cues on how a word is said. Adobe Character
Animator’s ability to synchronize lip animations and
replicate an actor’s facial expressions is shown in Figure 5.
The SpokeIt prototype, shown in Figure 4, is meant to blend
the positive aspects of the Speech Adventure and Speech
with Sam experiences. Therefore, the beginning of the game
starts with a narration from Sam the Slug about their desire to
go and visit a close friend. To get to the friend, the player
must embark on a long journey filled with new experiences
and challenges. In the example above, Sam meets a friendly
creature named Red who is struggling to learn colors that
start with the letter “B” (or, in the future, colors that start
with any letter the player needs to practice). Sam knows the
player is great with colors and asks them to teach Red colors
that start with “B” by saying the color on numbered cards.
Unlike Speech with Sam and Speech Adventure, an on-
screen character narrates the game. SpokeIt is the first
prototype to include completely animated characters with
mouth transitions. Unlike Speech with Sam, but similar to
Speech Adventure, SpokeIt demonstrates how each target
Accessibility and Mobile Health
MobileHCI'18, September 3-6, Barcelona, Spain
should be pronounced. SpokeIt is the only prototype that
automatically moves on if a player is struggling for over ten
Figure 5. Top: Character demonstrating /V/ sound, 2nd from
Top: Character demonstrating /L/ Sound, Bottom: Characters
showing sad and disgusted expressions
Instead of words lighting up green as they are spoken
correctly, SpokeIt uses a “Heard” element that displays
exactly what speech was recognized: correct
pronunciations and incorrect pronunciations. The target
word or phrase is displayed in the upper-left corner of the
screen. SpokeIt has word, phrase, and sentence targets. The
ear in the upper-right is crossed out when the system pauses.
Unlike both predecessors, progression in SpokeIt is
completely touch-free. To simplify game directives and
required interactions, speech is the only form of input that
makes progress in the game. The touch screen is used only to
aid players: when an element is tapped, Sam says that word
aloud, which helps players who cannot read learn their
targets.
Design Overview
Table 2 summarizes some key differences between the three
prototypes we developed, namely the game structure style
and how the players are prompted on game targets. We
hypothesized players would enjoy the hybrid mini-game
storybook style SpokeIt provides because it includes fast
paced game play surrounded by narrative. We also
hypothesized a main character who demonstrates speech
with mouth animations would increase usability and
therapeutic value.
Prototype          Game Style   Prompting
Speech Adventure   Mini-games   Off-screen narrator
Speech with Sam    Storybook    Text prompts
SpokeIt            Hybrid       Main character
Table 2: Key characteristics of each prototype
After development of the third prototype had been
completed, we wanted to ensure the design was moving in
the right direction. We wanted to
ensure our game was usable and learn what future features or
rewards would keep players engaged.
We began by administering a preliminary survey to collect
demographics, interest in speech therapy games, and general
game use.
We then conducted a within-subject comparative study
where each participant played each of the three prototypes in
a random order. Researchers were present to facilitate,
answer questions, and change prototypes when necessary.
Afterwards, a post-study interview was conducted, where we
collected rankings of each of the designs, usability feedback,
and core mechanic feedback, asking a mixed set of targeted
questions to explore positive and negative characteristics
from each of the three prototypes. We also discussed the
kinds of reward systems our participants would be interested
in. Each participant was asked the same set of questions.
Facilitators wrote down answers to each of these questions
and also jotted down any quotes or observations they had.
Our study was video recorded.
We concluded the study with a 3-question 5-point Likert
survey to gauge how well-received our speech
recognition system and speech mechanics were. We were
interested in their perceived accuracy, responsiveness, and
mechanics. Participants were asked whether 1) the game
accurately heard them, 2) the game responded quickly to
speech, and 3) they would want to play at home.
Our research lab has an on-going relationship with a local
day program for adults with developmental disabilities. We
asked the program staff to provide us with facilities to
conduct our study and to provide us with participants with
speech impairments who are legally able to provide consent.
Regarding the demographic information collected, all
individuals who attend this day program are adults. Two
participants were not comfortable sharing their age but
seemed to be similar in age to the other participants. Table 3
below outlines basic demographic information of our 5
participants. One participant had Cerebral Palsy, one had
Down Syndrome, one had ASD, and two were diagnosed
with mental retardation co-occurring with articulation
disorders.
Participant   Game Play Frequency
P1            5 days/week
P2            2 days/week
P3            7 days/week
P4            7 days/week
P5            2 days/week
Table 3: Participant Demographics
We were allowed to use two medium rooms. Because neither
of the two rooms was big enough to accommodate the entire
group and we had to work within the daily schedule, we split
participants between the two rooms to run the study in
parallel with all participants. The situation was not ideal, but
to remove as much bias as possible, we asked each question
to each individual in a random order, meaning P3 might
answer question 1 first, then P2, but P1 answered question 2
first followed by P3. Each participant had the opportunity to
answer each question before the group moved on to the next
question. The facility was also not ideal for the within-
subject play of the games. The speech recognition works best
in a quiet environment, but this was not possible given the
constraints of our facility.
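The per-question counterbalancing described above can be sketched as a simple procedure. The function and names here are illustrative, not taken from an actual study script.

```python
import random

def questioning_order(questions, participants, rng=random):
    """For each question, ask every participant once, in a freshly
    shuffled order, before the group moves to the next question."""
    plan = []
    for question in questions:
        order = list(participants)
        rng.shuffle(order)          # fresh random order per question
        plan.append((question, order))
    return plan

for question, order in questioning_order(["Q1", "Q2"], ["P1", "P2", "P3"]):
    print(question, order)
```

Re-shuffling per question, rather than once per session, is what keeps any single participant from always answering first.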
We brought enough iPads for each participant and a few
extra in case of technical problems, two laptops with
webcams to record each of the rooms, surveys, scripts,
consent forms, and note-taking materials.
We use handwritten notes from researchers containing
participants’ responses to questions about relevant reward
systems and opinions on each of the 3 prototypes. We use the
results of our 3-question Likert survey and participant quotes
about the speech recognition to report insights about its use.
We use video recordings of the participants playing our 3
game prototypes to identify usability issues and player
reactions.
For our analysis, two researchers created codes and themes,
and three independent coders analyzed all videos against
these codes and themes in BORIS. The emerging insights
include preferred game styles, reward systems, and usability
concerns.
All participants play games two or more times per week and
would be interested in using games to improve their speech.
Participants report that the games they play most commonly
are car games, racing games, solitaire, bowling, and NFL
sports games. Participants report they have difficulty
speaking loudly, have fluency difficulties, and are sometimes
unsure of what to say when speaking.
In the following sections, we organize our findings by the
specific prototype they relate to. We then report general
insights into reward systems our participants are interested
in and their experience using the speech mechanics.
Speech with Sam
Researchers observed that two participants were laughing
while playing Speech with Sam. This may be because of
humorous phrases that are present in the narrative such as,
“Slugs don’t wear boots!”
Due to software updates, some features of the game became
unresponsive, which understandably frustrated many of our
players. It was not always obvious to the players that they
needed to touch flashing objects to progress in the game. All
instructions in the game were displayed as text, but many of
our participants could not read, so researchers aided players
by dictating the instructions. Players wanted better feedback
when interacting because at times, they were unsure if the
game accepted or rejected their responses. They were also
unsure when they had to repeat themselves. Participants
enjoyed the pace of Speech with Sam.
Speech Adventure
The mini-games in Speech Adventure, particularly in the
rocket scene, gave immediate feedback when the player
correctly interacted with the game. Many players’ visceral
reactions to the sounds and animations were very positive.
They enjoyed that fireworks celebrated their success with a
satisfying pop sound. Speech Adventure kept track of player
scores and displayed them to the participants. Two players
reported that seeing their score was satisfying and marked
their progress.
Players who could not read also struggled with this prototype
because the instructions were written out as text, so these
players needed the researchers’ help to navigate through the
game objectives. Many of the mini-games rely on the players to
speak in the correct rhythm, but this was extremely
challenging for some. The scenes automatically progress
after an allotted amount of time. Players found this happened
too fast: just as they were beginning to understand the
objectives and mechanics, the next game would be displayed.
Some of the game objectives were too complicated and
several players never learned how to complete objectives in
the allotted time. In general, Speech Adventure needs to be
slowed down drastically, and the instructions need to be
clarified.
SpokeIt
Three out of five participants preferred SpokeIt to the rest of
the prototypes. They especially appreciated that all the
instructions were spoken aloud by the main character and
displayed as text on the screen. Participants found the
interaction objectives much simpler because SpokeIt never
requires a player to touch the screen. Many participants
found SpokeIt to be incredibly aesthetically pleasing. They
loved the colors, graphics, and animations. Participants
preferred the highly animated main character in SpokeIt
because it represented a more responsive and lively element.
One participant was very interested in bringing SpokeIt
home with her.
One participant commented that he would like SpokeIt to
repeat the instructions because he did not always remember
what he was supposed to say. Users seemed most
enthusiastic about playing SpokeIt again, indicating that the
hybrid structure may improve re-playability of mini-games
because they fit within an overall narrative.
Speech Recognition and Mechanics
We report that users are neutral about the accuracy of our
speech recognition system (Q1), but found it responded
quickly (Q2). They found the speech mechanic was
rewarding and enjoyable, and they think the games are
suitable systems to promote practicing speech at home (Q3).
We are interested in rewarding players for practicing speech
in a way that is meaningful to them. Hence, we brainstormed a few
ideas with our participants and asked them to vote on which
reward would be most interesting to them. Our ideas
included:
• Hats and clothes to accessorize a character or avatar after
completing sessions
• Using scored points to spend in a virtual store. Points could
be spent to buy items for a virtual garden or for furnishing
the character’s house.
• Reducing the total time needed to practice speech in the
future. If a player does really well in a 10-minute session,
then tomorrow they only need to play for 8 minutes.
• Out-of-game rewards (stickers, candy, and other physical
prizes)
Overwhelmingly, our participants were interested in out-of-
game rewards. They were extremely excited about the idea
of receiving candy when they do well in the game.
Usability Considerations
Watching players use our systems was very informative. We
identified four main issues that must be addressed in speech
therapy systems:
• Many participants cannot read: the game must be very
clear and designed to accommodate players who cannot
read. Objectives should be spoken aloud and repeated if
necessary. If participants are struggling, the game should
either change the objectives or move forward with the plot.
• To make progress in the game, players must use their
speech. Touch should be used to support players, provide
clues, or demonstrate correct pronunciation. These
mechanics should not be mixed.
• More feedback for correct and incorrect interactions must
be given to make progress clear. If the player pronounces
a target incorrectly, the game should support that player in
saying it correctly.
• Many users put the iPads to their heads because they had
trouble hearing: the game volume must be louder, or
headphones must be provided, especially in noisy
environments.
We found that players prefer the hybrid structure because
mini-games were given context in an overarching plot. Mini-
games that are played out of a narrative context seem to lose
their novelty as soon as they are repeated. Our users seemed
to care a lot about aesthetics and animations, indicating that
a high level of polish is important.
Storybook-style games require much more work to generate
content and narrative consistency. Speech therapy games
should be available and fresh for as long as the individual
needs to practice speech. Narrative content that surrounds
mini-games is an effective balance of development time, re-
playability, and diversity of speech targets.
Speech Recognition Accuracy
In the future, we would like to explore improving the
accuracy of the speech recognition system. This is obviously
an important feature of a speech therapy game, especially for
when we explore the clinical validity of the game. There are
two controllable factors that we considerthe physical
hardware, and the acoustic models. We would like to
compare the accuracy of the system when using a noise
cancelling microphone and when using the built-in
microphone. We would also like to collect speech data from
our specific target populations to train the acoustic model
and compare it to the default model. Collecting speech data
from vulnerable populations is a challenge due to HIPAA
regulations in the United States. In this context, speech is
categorized as medical data and must be approved before it
can be stored safely. Another challenge is the tedious process
of hand-coding the speech data in a way that allows acoustic
models to be trained using machine learning techniques.
Likert Results
          Q1    Q2    Q3
Values    4     4     4
          5     4     5
          2     4     4
Average   3.2   4     4.2
Table 4: Speech Recognition and Speech Mechanics Likert
Survey Results
Our brainstorm session about preferred reward systems
indicated that our users were not interested in badges, stars,
and points; they wanted tangible real-world rewards and
commemoration from their mentors. We do not want to
simply gamify [25] speech therapy with banners, badges,
ribbons, and stars. In the future, we are interested in creating
a rewarding experience that integrates into current practices.
Our serious game for health should be one component of a
broader holistic therapy experience. We envision the final
product to have a companion app for speech therapists to
assign individualized curriculum goals and receive reports of
patient progress. If patients meet these goals, an SLP could
reward patients with tangible prizes.
Working with Children with Speech Impairments
As previously stated in the Design section, SpokeIt is
intended for children with speech impairments, but regular
access to this population is difficult, so out of convenience,
we conducted our study with adults with developmental
disabilities co-occurring with speech impairment. Using a
childish design on an adult population is a limitation of this
study, but our participants truly enjoyed the experiences and
never indicated that they felt patronized. Also, it is worth
reiterating that this adult population has the potential to
benefit from this work, as they also have speech goals and
articulation disorders.
Speech and language development is important for
children’s future ability to live independently and to
participate fully in society [20]. In 2012, nearly 8% of
children aged 3-17 in the United States had a communication
disorder and younger children, boys, and non-Hispanic white
children were more likely than other children to receive an
intervention service for their disorder [53]. Children with
speech impairments such as Cleft Lip Palate have high risks
of behavioral problems and increased symptoms of
depression [20]. They show more deficits in social and
academic competencies, score higher for social problems
[13], and are more likely to be teased in social settings [20].
Even those who undergo a corrective surgery tend to display
a delay in scholarship, have a lower income, marry later in
life and become independent from their parents significantly
later [21]. Clearly, further work on using SpokeIt to help
children improve their speech is warranted.
Methodological considerations
Our users struggled with Likert-style questions, so we
adapted how we administered them onsite, which we detail
here. This can be useful to other researchers working with a
similar target group (adults with developmental disabilities).
We first asked whether they agreed with the statement,
disagreed with the statement, or did not know. If they did not
know, we marked down a 3. If they said they agreed, we
asked if they agreed a lot or a little. If they said they agreed
a lot, we put a 5. If they said they agreed a little, we marked
down a 4. We followed the same process if they disagreed.
If facilitators felt a participant was answering just to please
us, we would ask the same questions in the opposite way and
remind users that we wanted them to be authentic. Some
participants changed their answers, which led us to believe
our results may not be completely representative of our
population. Asking Likert-style questions in this way was
also cumbersome.
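The branching protocol described above maps naturally onto a small coding function. This sketch is illustrative; the stance and degree labels are our own, not an instrument we deployed.

```python
def code_likert(stance, degree=None):
    """Map the two-step verbal protocol onto a 1-5 Likert score:
    'dont_know' -> 3; agree a little/a lot -> 4/5;
    disagree a little/a lot -> 2/1."""
    if stance == "dont_know":
        return 3
    if stance == "agree":
        return 5 if degree == "a_lot" else 4
    if stance == "disagree":
        return 1 if degree == "a_lot" else 2
    raise ValueError(f"unknown stance: {stance!r}")

print([code_likert("agree", "a_lot"),
       code_likert("disagree", "a_little"),
       code_likert("dont_know")])  # prints: [5, 2, 3]
```

Splitting the question into a stance step and a degree step keeps each verbal prompt binary, which was easier for our participants than a five-point scale presented all at once.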
Acknowledgments
This material is based in part upon work supported by the
National Science Foundation under Grant number #1617253.
We also thank Doctor Travis Tollefson and SLP Christina
Roth for their aid in conducting user evaluations. Any
opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and do not
necessarily reflect the views of the National Science
Foundation.
References
1. Clark C Abt. 1987. Serious games. University Press of
America.
2. Frank R Adams, Hubert Crepy, David Jameson, and J
Thatcher. 1989. IBM products for persons with
disabilities. In Global Telecommunications Conference
and Exhibition 'Communications Technology for the
1990s and Beyond' (GLOBECOM), 1989. IEEE, 980.
3. Olle Bälter, Olov Engwall, Anne-Marie Öster, and
Hedvig Kjellström. 2005. Wizard-of-Oz test of
ARTUR: a computer-based speech training system with
articulation correction. In Proceedings of the 7th
international ACM SIGACCESS conference on
Computers and accessibility, 36–43.
4. Elizabeth Boyle, Thomas M Connolly, and Thomas
Hainey. 2011. The role of psychology in understanding
the impact of computer games. Entertainment
Computing 2, 2: 69–74.
5. EB Brownlie, Joseph H Beitchman, Michael Escobar,
Arlene Young, Leslie Atkinson, Carla Johnson, Beth
Wilson, and Lori Douglas. 2004. Early language
impairment and young adult delinquent and aggressive
behavior. Journal of abnormal child psychology 32, 4:
6. H Timothy Bunnell, Debra M Yarrington, and James B
Polikoff. 2000. STAR: articulation training for young
children. In Sixth International Conference on Spoken
Language Processing.
7. Biran Burke. 2016. Gamify: How gamification
motivates people to do extraordinary things. Routledge.
8. Kerstin Carlstedt, Gunilla Henningsson, and Göran
Dahllöf. 2003. A four-year longitudinal study of palatal
plate therapy in children with Down syndrome: effects
on oral motor function, articulation and communication
preferences. Acta Odontologica Scandinavica 61, 1:
9. Mary Clement and Thomas E Twitchell. 1959.
Dysarthria in cerebral palsy. Journal of Speech and
Hearing Disorders 24, 2: 118–122.
10. Colette Coleman and Lawrence Meyers. 1991.
Computer recognition of the speech of adults with
cerebral palsy and dysarthria. Augmentative and
Alternative Communication 7, 1: 3442.
11. Sebastian Deterding, Staffan L. Björk, Lennart E.
Nacke, Dan Dixon, and Elizabeth Lawley. 2013.
Designing gamification: creating gameful and playful
experiences. In CHI’13 Extended Abstracts on Human
Factors in Computing Systems, 3263–3266.
12. Jared Duval, Zachary Rubin, Elizabeth Goldman, Nick
Antrilli, Yu Zhang, Su-Hua Wang, and Sri Kurniawan.
2017. Designing Towards Maximum Motivation and
Engagement in an Interactive Speech Therapy Game. In
Proceedings of the 2017 Conference on Interaction
Design and Children, 589–594.
13. Kristin Billaud Feragen, Ingela L Kvalem, Nichola
Rumsey, and Anne IH Borge. 2010. Adolescents with
and without a facial difference: the role of friendships
and social acceptance in perceptions of appearance and
emotional resilience. Body Image 7, 4: 271–279.
14. Javier Franco-Pedroso and Joaquin Gonzalez-
Rodriguez. 2016. Linguistically-constrained formant-
based i-vectors for automatic speaker recognition.
Speech Communication 76: 61–81.
15. Kathrin Maria Gerling, Regan L Mandryk, Max
Valentin Birk, Matthew Miller, and Rita Orji. 2014. The
effects of embodied persuasive games on player
attitudes toward people using wheelchairs. In
Proceedings of the 32nd annual ACM conference on
Human factors in computing systems, 3413–3422.
16. Donna M Hanson, Alfred W Jackson, Randi J
Hagerman, John M Opitz, and James F Reynolds. 1986.
Speech disturbances (cluttering) in mildly impaired
males with the Martin-Bell/fragile X syndrome.
American Journal of Medical Genetics Part A 23, 12:
17. Patricia Howlin, Lynn Mawhood, and Michael Rutter.
2000. Autism and developmental receptive language
disorder: A follow-up comparison in early adult life.
II: Social, behavioural, and psychiatric outcomes. The
Journal of Child Psychology and Psychiatry and Allied
Disciplines 41, 5: 561–578.
18. D. Huggins-Daines, M. Kumar, A. Chan, A. W. Black,
M. Ravishankar, and A. I. Rudnicky. 2006.
Pocketsphinx: A Free, Real-Time Continuous Speech
Recognition System for Hand-Held Devices. In 2006
IEEE International Conference on Acoustics Speech
and Signal Processing Proceedings, I–I.
19. David Huggins-Daines, Mohit Kumar, Arthur Chan,
Alan W Black, Mosur Ravishankar, and Alexander I
Rudnicky. 2006. Pocketsphinx: A free, real-time
continuous speech recognition system for hand-held
devices. In Acoustics, Speech and Signal Processing,
2006. ICASSP 2006 Proceedings. 2006 IEEE
International Conference on, I–I.
20. Dr Orlagh Hunt, Dr Donald Burden, Dr Peter Hepper,
Dr Mike Stevenson, and Dr Chris Johnston. 2007.
Parent Reports of the Psychosocial Functioning of
Children with Cleft Lip and/or Palate. The Cleft Palate-
Craniofacial Journal 44, 3: 304–311.
21. Mercy Larnyoh. 2015. Determining social challenges of
children with cleft lip and or palate as perceived by
parents or caretakers at Komfo Anokye Teaching
Hospital in Kumasi Metropolis in Ashanti region,
22. Debra A Lieberman. 1997. Interactive video games for
health promotion: Effects on knowledge, self-efficacy,
social support, and health. Health promotion and
interactive technology: Theoretical applications and
future directions: 103–120.
23. Elena Márquez Segura, Annika Waern, Luis Márquez
Segura, and David López Recio. 2016. Playification:
The PhySeEar case. 376–388.
24. Robert C Marshall, Robert T Wertz, David G Weiss,
James L Aten, Robert H Brookshire, Luis Garcia-
Bunuel, Audrey L Holland, John F Kurtzke, Leonard L
LaPointe, Franklin J Milianti, and others. 1989. Home
treatment for aphasic patients by trained
nonprofessionals. Journal of Speech and Hearing
Disorders 54, 3: 462–470.
25. Jane McGonigal. 2011. We Don’t Need No Stinkin’
Badges: How to Re-invent Reality Without
Gamification. Retrieved December 18, 2015 from
26. Jane McGonigal. 2011. Reality is broken: Why games
make us better and how they can change the world.
27. Ray M Merrill, Nelson Roy, and Jessica Lowe. 2013.
Voice-related symptoms and their effects on quality of
life. Annals of Otology, Rhinology & Laryngology 122,
6: 404–411.
28. David R Michael and Sandra L Chen. 2005. Serious
games: Games that educate, train, and inform. Muska
& Lipman/Premier-Trade.
29. Tony Morelli, Lauren Lieberman, John Foley, and
Eelke Folmer. 2014. An exergame to improve balance
in children who are blind. In FDG.
30. Megan A Morris, Sarah K Meier, Joan M Griffin,
Megan E Branda, and Sean M Phelan. 2016. Prevalence
and etiologies of adult communication disabilities in the
United States: Results from the 2012 National Health
Interview Survey. Disability and health journal 9, 1:
31. Jack Mostow and others. 2001. Evaluating tutors that
listen: An overview of Project LISTEN. In Smart
machines in education, 169–234.
32. Cosmin Munteanu, Joanna Lumsden, Hélène Fournier,
Rock Leung, Danny D’Amours, Daniel McDonald, and
Julie Maitland. 2010. ALEX: mobile language assistant
for low-literacy adults. In Proceedings of the 12th
international conference on Human computer
interaction with mobile devices and services, 427–430.
33. Joan Murphy. 2006. Perceptions of communication
between people with communication disability and
general practice staff. Health Expectations 9, 1: 49–59.
34. Heidi Parisod, Anni Pakarinen, Anna Axelin, Riitta
Danielsson-Ojala, Jouni Smed, and Sanna Salanterä.
2017. Designing a Health-Game Intervention
Supporting Health Literacy and a Tobacco-Free Life in
Early Adolescence. Games for Health Journal.
35. Larry J Platt, Gavin Andrews, Margrette Young, and
Peter T Quinn. 1980. Dysarthria of adult cerebral palsy:
I. Intelligibility and articulatory impairment. Journal of
Speech, Language, and Hearing Research 23, 1: 28–40.
36. Alessandra Preziosa, Alessandra Grassi, Andrea
Gaggioli, and Giuseppe Riva. 2009. Therapeutic
applications of the mobile phone. British Journal of
Guidance & Counselling 37, 3: 313–325.
37. Josephine M Randel, Barbara A Morris, C Douglas
Wetzel, and Betty V Whitehill. 1992. The effectiveness
of games for educational purposes: A review of recent
research. Simulation & gaming 23, 3: 261–276.
38. William M Reynolds and Susan Reynolds. 1979.
Prevalence of speech and hearing impairment of
noninstitutionalized mentally retarded adults. American
journal of mental deficiency.
39. John C Rosenbek, Margaret L Lemme, Margery B
Ahern, Elizabeth H Harris, and Robert T Wertz. 1973.
A treatment for apraxia of speech in adults. Journal of
Speech and Hearing Disorders 38, 4: 462–472.
40. Zachary Rubin. 2017. Development and evaluation of
software tools for speech therapy. University of
California, Santa Cruz.
41. Zachary Rubin, Sri Kurniawan, and Travis Tollefson.
2014. Results from using automatic speech recognition
in cleft speech therapy with children. In International
Conference on Computers for Handicapped Persons,
42. Zak Rubin and Sri Kurniawan. 2013. Speech
Adventure: Using Speech Recognition for Cleft Speech
Therapy. In Proceedings of the 6th International
Conference on PErvasive Technologies Related to
Assistive Environments (PETRA ’13), 35:1–35:4.
43. Ellen Sciuto. 2013. The iPad: Using new technology for
teaching reading, language, and speech for children with
hearing loss.
44. Lawrence D Shriberg, Rhea Paul, Jane L McSweeny,
Ami Klin, Donald J Cohen, and Fred R Volkmar. 2001.
Speech and prosody characteristics of adolescents and
adults with high-functioning autism and Asperger
syndrome. Journal of Speech, Language, and Hearing
Research 44, 5: 1097–1115.
45. Ann Bosma Smit. 2004. Articulation and phonology
resource guide for school-age children and adults.
Cengage Learning.
46. Ali JA Soleymani, Martin J McCutcheon, and MH
Southwood. 1997. Design of speech illumina mentor
(SIM) for teaching speech to the hearing impaired. In
Biomedical Engineering Conference, 1997.,
Proceedings of the 1997 Sixteenth Southern, 425–428.
47. Jonathan M Sykes and Travis T Tollefson. 2005.
Management of the cleft lip deformity. Facial plastic
surgery clinics of North America 13, 1: 157–167.
48. John Van Borsel and An Vandermeulen. 2008.
Cluttering in Down syndrome. Folia Phoniatrica et
Logopaedica 60, 6: 312–317.
49. Klara Vicsi, Peter Roach, A Öster, Zdravko Kacic, Peter
Barczikay, Andras Tantos, Ferenc Csatári, Zs Bakcsi,
and Anna Sfakianaki. 2000. A multimedia, multilingual
teaching and training system for children with speech
disorders. International Journal of speech technology 3,
3–4: 289–300.
50. Maria Virvou, George Katsionis, and Konstantinos
Manos. 2005. Combining software games with
education: Evaluation of its educational effectiveness.
Educational Technology & Society 8, 2: 54–65.
51. Laurie A Vismara, Costanza Colombi, and Sally J
Rogers. 2009. Can one hour per week of therapy lead to
lasting changes in young children with autism? Autism
13, 1: 93–115.
52. Charles S Watson, Daniel J Reed, Diane Kewley-Port,
and Daniel Maki. 1989. The Indiana Speech Training
Aid (ISTRA) I: Comparisons between human and
computer-based evaluation of speech quality. Journal of
Speech, Language, and Hearing Research 32, 2: 245
53. 2015. National Center for Health Statistics. Centers for
Disease Control and Prevention. Retrieved from
54. 2016. Statistics on Voice, Speech, and Language. U.S.
Department of Health and Human Services. Retrieved
55. Quick Statistics About Voice, Speech, Language.
National Institute on Deafness and Other
Communication Disorders. Retrieved from
... This section presents 2 domains covering theories: frameworks and approaches from the included papers that were intertwined with participatory development (domain 1) and iterative processes (domain 2; Table 2 provides an overview). Various inclusive theories and frameworks were used such as the sensitive inclusive design approach [47], human-or user-centered design [51,[55][56][57][58], participatory design [19,60,64], participatory action research [46,63], and co-design [46,48,65]. The iterative approach was applied using various frameworks such as the Reflective Agile Iterative Design (RAID) [65], phased development [47], and iterative design [19,48,50,56]. ...
... Various inclusive theories and frameworks were used such as the sensitive inclusive design approach [47], human-or user-centered design [51,[55][56][57][58], participatory design [19,60,64], participatory action research [46,63], and co-design [46,48,65]. The iterative approach was applied using various frameworks such as the Reflective Agile Iterative Design (RAID) [65], phased development [47], and iterative design [19,48,50,56]. Participatory development with iterative approaches was shaped by the level of engagement, type of stakeholders, and reason for involvement. ...
... The studies showed different levels of end users' engagement and participation throughout the design and development process. The 14 (82%) of the 17 studies that applied an inclusive theory or approach involved people with IDs as primary stakeholders throughout the full development process to facilitate a full understanding of users' perceptions, needs, and abilities [19,[46][47][48]51,[55][56][57][58]60,[62][63][64][65]. In total, 2 (12%) of the 17 studies reported end user involvement in early-stage prototype testing to ensure that important usability and accessibility issues (eg, language use and button size) could be corrected [47,56]. ...
Full-text available
Background: The use of eHealth is more challenging for people with intellectual disabilities (IDs) than for the general population because the technologies often do not fit the complex needs and living circumstances of people with IDs. A translational gap exists between the developed technology and users' needs and capabilities. User involvement approaches have been developed to overcome this mismatch during the design, development, and implementation processes of the technology. The effectiveness and use of eHealth have received much scholarly attention, but little is known about user involvement approaches. Objective: In this scoping review, we aimed to identify the inclusive approaches currently used for the design, development, and implementation of eHealth for people with IDs. We reviewed how and in what phases people with IDs and other stakeholders were included in these processes. We used 9 domains identified from the Centre for eHealth Research and Disease management road map and the Nonadoption, Abandonment, and challenges to the Scale-up, Spread, and Sustainability framework to gain insight into these processes. Methods: We identified both scientific and gray literature through systematic searches in PubMed, Embase, PsycINFO, CINAHL, Cochrane, Web of Science, Google Scholar, and (websites of) relevant intermediate (health care) organizations. We included studies published since 1995 that showed the design, development, or implementation processes of eHealth for people with IDs. Data were analyzed along 9 domains: participatory development, iterative process, value specification, value proposition, technological development and design, organization, external context, implementation, and evaluation. Results: The search strategy resulted in 10,639 studies, of which 17 (0.16%) met the inclusion criteria. 
Various approaches were used to guide user involvement (eg, human or user-centered design and participatory development), most of which applied an iterative process mainly during technological development. The involvement of stakeholders other than end users was described in less detail. The literature focused on the application of eHealth at an individual level and did not consider the organizational context. Inclusive approaches in the design and development phases were well described; however, the implementation phase remained underexposed. Conclusions: The participatory development, iterative process, and technological development and design domains showed inclusive approaches applied at the start of and during the development, whereas only a few approaches involved end users and iterative processes at the end of the process and during implementation. The literature focused primarily on the individual use of the technology, and the external, organizational, and financial contextual preconditions received less attention. However, members of this target group rely on their (social) environment for care and support. More attention is needed for these underrepresented domains, and key stakeholders should be included further on in the process to reduce the translational gap that exists between the developed technologies and user needs, capabilities, and context.
... The most used combination was observation, in-game assessment, and pre/post testing, which was used in the evaluation of four games (the Space Game, Shape Game, and Bubble Game presented in [5], and Kirana [75]). The second most used combination was observation, interview, and questionnaire, which was used in the evaluation of two games (SpokeIt [23] and CopyMe [37]). Each of the remaining combinations of methods was used in only one study. ...
... Questionnaires are frequently employed in combination with other methods (11/15). Some studies built their own questionnaire to evaluate their game specifically (e.g., [3,23,25,37,85]), while others selected a standard questionnaire. For example, the study in [48] used a standard questionnaire combined with an experiment to validate the feasibility of the Sensory Assessment VR System (SAVR), a game for assessing sensory abnormalities in children with ASD. ...
... Thus, in these cases, it also made sense to include adults in their evaluation. However, four studies [23,25,75,85] presented games aimed specifically at children but conducted their evaluations with adults. The authors of [25,75,85] gave no reason for this choice, whereas [23] argued that the game SpokeIt is made for children with developmental disabilities and speech impairments, but because access to these individuals is difficult, the solution was to test with adults matching this profile. ...
... However, one of the studies [19] did not involve the parents, and another study [101] was only based on SLTs' and caregivers' input. Two papers [100,101] reported the mechanism used to assess children's speech and language, and provided a report to the SLTs, based on an automated speech analysis system, which was also at the heart of other studies [100][101][102][103][104][105]. Bono et al. [106] engaged parents as co-players, while other authors [100,[102][103][104] simply implemented the system with an at-a-distance control or report for SLTs and children (parents were not included). ...
... Some tools [100,102,105,107] used rewards to increase the amount of time spent by children playing and to potentiate enjoyment and the desire to replay. Parnandi et al. [100] mentioned rewards as an idea to explore and simply as another extra to keep the interest. ...
... The authors [102] pose the question of how players would react to longer periods of game use. The type of reward may also play a part [105], where the targeted users, probably due to age (adults with speech impairments), preferred physical real-world rewards or accolades. ...
This paper presents the state of the art regarding the use of tangible user interfaces for internet of artefacts (IoA) targeting health applications, with a focus on speech and language therapy and related areas, targeting home-based interventions, including data security and privacy issues. Results from a systematic literature review, focus group, and a nationwide questionnaire have been used to determine the system requirements for an artefact prototype to be developed. The aim of this study was to understand what is the usual practice of clinicians and to contribute to a better intervention or post-intervention approach for children with Speech Sound Disorders (SSD). The literature review revealed that some studies proposed technological solutions while others used a social approach and/or gamified activities. We could conclude that more research is needed and that a unified method or framework to address SSD intervention or post-intervention tools is lacking. Clinicians need more and better tools to be able to quantify and qualitatively assess the activities developed at home.
... In addressing such speech impairments, Speech-Language Pathologists (SLPs) play a significant role in the screening, assessment, diagnosis, and treatment of persons with SSD. Personalized speech therapy and practice monitored by SLPs can improve the acquisition of speech skills [7]. However, the accessibility of SLPs is crucial for such intervention. ...
... However, the accessibility of SLPs is crucial for such intervention. A report suggests that up to 70% of SLPs have waiting lists, which indicates a shortage in the workforce [7,8]. Furthermore, according to the United Nations Children's Fund (UNICEF), there are no adequate speech-language therapy services for children with communication disorders and disabilities [9]. ...
... Researchers have also specifically worked on and devised AI-based tools for persons with hearing impairment [23,24]. A novel tongue-based Human-Computer interaction tool [25] and a gamified AI-based tool [7] for persons with motor speech disorders have also been proposed [28,29]. ...
This paper presents a systematic literature review of published studies on AI-based automated speech therapy tools for persons with speech sound disorders (SSD). The COVID-19 pandemic has initiated the requirement for automated speech therapy tools for persons with SSD, making speech therapy accessible and affordable. However, there are no guidelines for designing such automated tools and their required degree of automation compared to human experts. In this systematic review, we followed the PRISMA framework to address four research questions: 1) what types of SSD do AI-based automated speech therapy tools address, 2) what is the level of autonomy achieved by such tools, 3) what are the different modes of intervention, and 4) how effective are such tools in comparison with human experts. An extensive search was conducted on digital libraries to find research papers relevant to our study from 2007 to 2022. The results show that AI-based automated speech therapy tools for persons with SSD are increasingly gaining attention among researchers. Articulation disorders were the most frequently addressed SSD based on the reviewed papers. Further, our analysis shows that most researchers proposed fully automated tools without considering the role of other stakeholders. Our review indicates that mobile-based and gamified applications were the most frequent mode of intervention. The results further show that only a few studies compared the effectiveness of such tools with that of expert Speech-Language Pathologists (SLPs). Our paper presents the state-of-the-art in the field, contributes significant insights based on the research questions, and provides suggestions for future research directions.
... 2. Ambient and environmental noise that affected the game performance [30, 32, 33, 49]. 3. Contradiction between game levels and the needs of target groups (the game was very difficult or too easy) [14, 21, 52]. 4. The game was challenging because it required two hands to play [14, 21]. 5. Children could not easily read words or phrases due to inadequate instruction [14, 31]. 6. Not all participants were willing to wear the headset microphone [14, 42]. 7. Delays in speech recognition [40, 43]. 8. The game did not recognize low tune voices, and children had to speak loudly [30, 31]. 9. The designed game did not provide feedback on accepting or rejecting children's voices [31]. 10. One of the challenges at the design phase was that each target phrase or word had to be carefully crafted to fit into the narrative of the game and this was very time-consuming, which could result in minimal content [31]. ...
Introduction: Treatment of speech disorders during childhood is essential. Many technologies can help speech and language pathologists (SLPs) to practice speech skills, one of which is digital games. This study aimed to systematically investigate the games developed to treat speech disorders and their challenges in children. Methods: A comprehensive search was conducted in four databases, including Medline (through PubMed), Scopus, Web of Science, and IEEE Xplore, to retrieve English articles published by July 14, 2021. The articles in which a digital game was developed to treat speech disorders in children were included in the study. Then, the features of the designed games and their challenges were extracted from the studies. Results: After reviewing the full texts of 69 articles and assessing them in terms of inclusion and exclusion criteria, 27 articles were included in the systematic review. In these articles, 59.25% of the games had been developed in English language and children with hearing impairments had received much attention from researchers compared to other patients. Also, the Mel-Frequency Cepstral Coefficients (MFCC) algorithm and the PocketSphinx speech recognition engine had been used more than any other speech recognition algorithm and tool. In terms of the games, 48.15% had been designed in a way that children could practice with the help of their parents. The evaluation of games showed a positive effect on children's satisfaction, motivation, and attention during speech therapy exercises. The biggest barriers and challenges mentioned in the studies included sense of frustration, low self-esteem after several failures in playing games, environmental noise, contradiction between games levels and the target group's needs, and problems related to speech recognition. Conclusion: The results of this study showed that the games positively affect children's motivation to continue speech therapy, and they can also be used as the SLPs' aids. 
Before designing such tools, these obstacles and challenges should be considered and solutions to them proposed.
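The MFCC front end singled out in this review begins by warping linear frequency onto the mel scale before filterbank analysis. A minimal stdlib-only sketch of that first step (function names are ours, not taken from any reviewed system):

```python
import math

def hz_to_mel(f_hz):
    # O'Shaughnessy mel formula used by most MFCC implementations (HTK variant)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centers(n_filters, f_low, f_high):
    """Center frequencies (Hz) of triangular mel filters,
    equally spaced on the mel scale between f_low and f_high."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]
```

Triangular filters centered at these frequencies, followed by log filterbank energies and a DCT, would yield the MFCC vector itself.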
... They can create opportunities to train new skills in a safe and engaging environment, improving perceived self-efficacy, a key aspect in health promotion interventions. By following this direction, the researchers in [23] present SpokeIt, which is a mobile serious game for health designed to support speech articulation exercises. Articulation Station is a novel mobile speech therapy app that allows speech-language pathologists to customize target sounds and sound placement for patients of all ages to practice [24]; in this direction, retro gaming can be a technical resource for providing users with visual feedback [25]. ...
Nowadays, many application scenarios benefit from automatic speech recognition (ASR) technology. Within the field of speech therapy, in some cases ASR is exploited in the treatment of dysarthria with the aim of supporting articulation output. However, in the presence of atypical speech, standard ASR approaches do not provide reliable results in terms of voice recognition due to two main issues: (i) the extreme intra- and inter-speaker variability of the speech in the presence of speech impairments such as dysarthria; (ii) the absence of dedicated corpora containing voice samples from users with a speech disability to train a state-of-the-art speech model, particularly in non-English languages. In this paper, we focus on isolated word recognition for native Italian speakers with dysarthria and we exploit an existing mobile app to collect audio data from users with speech disorders while they perform articulation exercises for speech therapy purposes. With this data availability, a convolutional neural network has been trained to spot a small number of keywords within atypical speech, according to a speaker-dependent method. Finally, we discuss the benefits of the trained ASR system in tailored telerehabilitation contexts intended for patients with dysarthria who can follow treatment plans under the supervision of remote speech-language pathologists.
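The study above trains a CNN for its speaker-dependent keyword spotting; the classical speaker-dependent baseline for a vocabulary this small is dynamic time warping (DTW) against per-user templates. A stdlib-only sketch over generic one-dimensional feature sequences (all names are hypothetical, not from the cited system):

```python
def dtw_distance(seq_a, seq_b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def spot_keyword(utterance, templates):
    """Return the keyword whose stored template is closest under DTW."""
    return min(templates, key=lambda kw: dtw_distance(utterance, templates[kw]))
```

In practice the sequences would be per-frame feature vectors (e.g., MFCCs) and `dist` a vector distance; per-speaker templates are what make the approach speaker-dependent.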
... The design and implementation of mobile apps for use by children with communication disorders is a research area that draws attention from both clinical researchers and human-computer interaction researchers. In recent years, human-computer interaction scholars have designed apps for children with autism, cleft palate, speech sound disorders, cochlear implants, and other communication disorders [8][9][10][11][12]. Given the variety of needs among children with communication disorders, developers and designers may encounter difficulties obtaining verbal or written user feedback on app content and features while creating and revising these apps; consequently, they must rely on reports from key stakeholders that surround the circle of care of children with communication disorders [13,14]. ...
Abstract Background: With the plethora of mobile apps available on the Apple App Store, more speech-language pathologists (SLPs) have adopted apps for speech-language therapy services, especially for pediatric clients. App Store reviews are publicly available data sources that can not only create avenues for communication between technology developers and consumers but also enable stakeholders such as parents and clinicians to share their opinions and view opinions about the app content and quality based on user experiences. Objective: This study examines the Apple App Store reviews from multiple key stakeholders (eg, parents, educators, and SLPs) to identify and understand user needs and challenges of using speech-language therapy apps (including augmentative and alternative communication [AAC] apps) for pediatric clients who receive speech-language therapy services. Methods: We selected 16 apps from a prior interview study with SLPs that covered multiple American Speech-Language-Hearing Association Big Nine competencies, including articulation, receptive and expressive language, fluency, voice, social communication, and communication modalities. Using an automatic Python (Python Software Foundation) crawler developed by our research team and a Really Simple Syndication feed generator provided by Apple, we extracted a total of 721 app reviews from 2009 to 2020. Using qualitative coding to identify emerging themes, we conducted a content analysis of 57.9% (418/721) reviews and synthesized user feedback related to app features and content, usability issues, recommendations for improvement, and multiple influential factors related to app design and use. Results: Our analyses revealed that key stakeholders such as family members, educators, and individuals with communication disorders have used App Store reviews as a platform to share their experiences with AAC and speech-language apps. 
User reviews for AAC apps were primarily written by parents who indicated that AAC apps consistently exhibited more usability issues owing to violations of design guidelines in areas of aesthetics, user errors, controls, and customization. Reviews for speech-language apps were primarily written by SLPs and educators who requested and recommended specific app features (eg, customization of visuals, recorded feedback within the app, and culturally diverse character roles) based on their experiences working with a diverse group of pediatric clients with a variety of communication disorders. Conclusions: To our knowledge, this is the first study to compile and analyze publicly available App Store reviews to identify areas for improvement within mobile apps for pediatric speech-language therapy apps from children with communication disorders and different stakeholders (eg, clinicians, parents, and educators). The findings contribute to the understanding of apps for children with communication disorders regarding content and features, app usability and accessibility issues, and influential factors that impact both AAC apps and speech-language apps for children with communication disorders who need speech therapy.
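The review-crawling setup described above can be approximated with a URL builder for Apple's public RSS customer-reviews feed plus a small extractor. The endpoint pattern and the JSON layout assumed below are reconstructions from common public usage of that feed, not taken from the cited study, so verify them before relying on this:

```python
def review_feed_url(app_id, country="us", page=1):
    """Build the iTunes RSS customer-reviews URL for an app.
    The URL pattern is an assumption based on the public feed."""
    return (f"https://itunes.apple.com/{country}/rss/customerreviews/"
            f"page={page}/id={app_id}/sortby=mostrecent/json")

def extract_reviews(feed):
    """Pull (rating, text) pairs out of a parsed feed dict.
    The feed -> entry -> 'im:rating'/'content' layout mirrors the live
    feed but is an assumption here."""
    out = []
    for entry in feed.get("feed", {}).get("entry", []):
        rating = int(entry["im:rating"]["label"])
        text = entry["content"]["label"]
        out.append((rating, text))
    return out
```

Fetching `review_feed_url(...)` with any HTTP client and passing the decoded JSON to `extract_reviews` would yield the (rating, review text) pairs that a content analysis like the one above starts from.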
Developing games is time-consuming and costly. Overly clinical therapy games run the risk of being boring, which defeats the purpose of using games to motivate healing in the first place [10, 23]. In this work, we adapt and repurpose an existing immersive virtual reality (iVR) game, Spellcasters, originally designed purely for entertainment for use as a stroke rehabilitation game—which is particularly relevant in the wake of COVID-19, where telehealth solutions are increasingly needed [4]. In preparation for participatory design sessions with stroke survivors, we collaborate with 14 medical professionals to ensure Spellcasters is safe and therapeutically valid for clinical adoption. We present our novel VR sandbox implementation that allows medical professionals to customize appropriate gestures and interactions for each patient's unique needs. Additionally, we share a co-designed companion app prototype based on clinicians' preferred data reporting mechanisms for telehealth. We discuss insights about adapting and repurposing entertainment games as serious games for health, features that clinicians value, and the potential broader impacts of applications like Spellcasters for stroke management.
Children with speech impairments often find speech curriculums tedious, limiting how often children are motivated to practice. A speech therapy game has the potential to make practice fun, may help facilitate increased time and quality of at-home speech therapy, and lead to improved speech. We explore using conversational real-time speech recognition, game methodologies theorized to improve immersion and flow, and user centered approaches to design an immersive interactive speech therapy solution. Our preliminary user evaluation showed that compared to traditional methods, children were more motivated to practice speech using our system.
The concept of playification has recently been proposed as an extension of, or alternative to, gamification. We present a playification design project targeting the re-design of physiotherapy rehabilitative sessions for elderly inpatients. The menial and repetitive nature of the physical exercises targeted for design might seem ideal for shallow widespread gamification approaches that add external rewards to entice usage. In the PhySeEar project, we introduced a "third agent" instead, in the form of technology that would take over some of the work typically carried out by the physiotherapist. This technological intervention triggered the emergence of playfulness, when inpatients and the therapist re-signified the ongoing activity by engaging in playful role-taking, such as blaming the technology for mistakes, or for its sensitivity to the inpatient's inaccurate movements. Based on the experiences from this project, we discuss some of the major differences between playification and gamification.
Objective: The purpose of this study was to explore the design of a health game that aims to both support tobacco-related health literacy and a tobacco-free life in early adolescence and to meet adolescents' expectations. Materials and methods: Data were collected from adolescents using an open-ended questionnaire (n = 83) and focus groups (n = 39) to obtain their view of a health game used for tobacco-related health education. The data were analyzed using thematic analysis. A group of experts combined the adolescents' views with theoretical information on health literacy and designed and produced the first version of the game. Adolescents (session 1, n = 16; session 3, n = 10; and session 4, n = 44) and health promotion professionals (session 2, n = 3) participated in testing the game. Feedback from testing sessions 3 and 4 was analyzed using descriptive statistics. Results: Adolescents pointed out that the health game needs to approach the topic of tobacco delicately and focus on the adolescents' perspective and on the positive sides of a tobacco-free life rather than only on the negative consequences of tobacco. The adolescents expected the game to be of high quality, stimulating, and intellectually challenging and to offer possibilities for individualization. Elements from the adolescents' view and theoretical modelling were embedded into the design of a game called Fume. Feedback on the game was promising, but some points were highlighted for further development. Conclusion: Investing especially in high-quality design features, such as graphics and versatile content, using humoristic or otherwise stimulating elements, and maintaining sufficiently challenging gameplay would promote the acceptability of theory-based health games among adolescents.
Most children with cleft are required to undertake speech therapy after undergoing surgery to repair their craniofacial defect. However, the untrained ear of a parent can lead to incorrect practice resulting in the development of compensatory structures. Even worse, the boring nature of the cleft speech therapy often causes children to abandon home exercises and therapy altogether. We have developed a simple recognition system capable of detecting impairments on the phoneme level with high accuracy. We embed this into a game environment and provide it to a cleft palate specialist team for pilot testing with children 2 to 5 years of age being evaluated for speech therapy. The system consistently detected cleft speech in high-pressure consonants in 3 out of our 5 sentences. Doctors agreed that this would improve the quality of therapy outside of the office. Children enjoyed the game overall, but grew bored due to the delays of phrase-based speech recognition.
This paper presents a large-scale study of the discriminative abilities of formant frequencies for automatic speaker recognition. Exploiting both the static and dynamic information in formant frequencies, we present linguistically-constrained formant-based i-vector systems providing well-calibrated likelihood ratios per comparison of the occurrences of the same isolated linguistic units in two given utterances. As a first result, the reported analysis on the discriminative and calibration properties of the different linguistic units provides useful insights, for instance, to forensic phonetic practitioners. Furthermore, it is shown that the set of units which are more discriminative for every speaker varies from speaker to speaker. Secondly, linguistically-constrained systems are combined at score level through average and logistic regression speaker-independent fusion rules exploiting the different speaker-distinguishing information spread among the different linguistic units. Testing on the English-only trials of the core condition of the NIST 2006 SRE (24,000 voice comparisons of 5-minute telephone conversations from 517 speakers: 219 male and 298 female), we report equal error rates of 9.57% and 12.89% for male and female speakers respectively, using only formant frequencies as speaker-discriminative information. Additionally, when the formant-based system is fused with a cepstral i-vector system, we obtain relative improvements of ∼6% in EER (from 6.54 to 6.13%) and ∼15% in minDCF (from 0.0327 to 0.0279), compared to the cepstral system alone.
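Formant-based systems like the one above typically obtain formants from linear predictive coding (LPC): the angles of the poles of the LPC polynomial approximate the formant frequencies. A stdlib-only sketch of the core autocorrelation and Levinson-Durbin steps (helper names are our own):

```python
def autocorr(x, max_lag):
    """Autocorrelation of a windowed signal frame at lags 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for LPC coefficients a_k in
    x[n] ~ sum_k a_k * x[n-k]. Returns (coefficients, residual error)."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err                      # reflection coefficient
        a_new = a[:]
        a_new[i] = k
        for j in range(i):                 # update previous coefficients
            a_new[j] = a[j] - k * a[i - 1 - j]
        a = a_new
        err *= (1.0 - k * k)               # shrink prediction error
    return a, err
```

Formants would then be read off the angles of the complex roots of A(z) = 1 - sum_k a_k z^{-k}; the root-finding step is omitted here.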
Communication disabilities, including speech, language and voice disabilities, can significantly impact a person's quality of life, employment and health status. Despite this, little is known about the prevalence and etiology of communication disabilities in the general adult population. Our objective was to assess the prevalence and etiology of communication disabilities in a nationally representative adult sample. We conducted a cross-sectional study and analyzed the responses of non-institutionalized adults to the Sample Adult Core questionnaire within the 2012 National Health Interview Survey. We used respondents' self-report of having a speech, language or voice disability within the past year and receiving a diagnosis for one of these communication disabilities, as well as the etiology of their communication disability. We additionally examined the responses by subgroups, including sex, age, race and ethnicity, and geographical area. In 2012 approximately 10% of the US adult population reported a communication disability, while only 2% of adults reported receiving a diagnosis. The rates of speech, language and voice disabilities and diagnoses varied across gender, race/ethnicity and geographic groups. The most common response for the etiology of a communication disability was "something else." Improved understanding of population prevalence and etiologies of communication disabilities will assist in appropriately directing rehabilitation and medical services, potentially reducing the burden of communication disabilities.
Speech and prosody-voice profiles for 15 male speakers with High-Functioning Autism (HFA) and 15 male speakers with Asperger syndrome (AS) were compared to one another and to profiles for 53 typically developing male speakers in the same 10- to 50-years age range. Compared to the typically developing speakers, significantly more participants in both the HFA and AS groups had residual articulation distortion errors, uncodable utterances due to discourse constraints, and utterances coded as inappropriate in the domains of phrasing, stress, and resonance. Speakers with AS were significantly more voluble than speakers with HFA, but otherwise there were few statistically significant differences between the two groups of speakers with pervasive developmental disorders. Discussion focuses on perceptual-motor and social sources of differences in the prosody-voice findings for individuals with Pervasive Developmental Disorders as compared with findings for typical speakers, including comment on the grammatical, pragmatic, and affective aspects of prosody.
The articulation errors of 32 spastic and 18 athetoid males, aged 17–55 years, were analyzed using a confusion matrix paradigm. The subjects had a diagnosis of congenital cerebral palsy, and adequate intelligence, hearing, and ability to perform the speech task. Phonetic transcriptions were made of single-word utterances which contained 49 selected phonemes: 22 word-initial consonants, 18 word-final consonants and nine vowels. Errors of substitution, omission and distortion were categorized on confusion matrices such that patterns could be observed. It was found that within-manner errors (place or voicing errors or both) exceeded between-manner errors by a substantial amount, more so on final consonants. The predominant within-manner errors occurred on fricative phonemes for both initial and final positions. Affricate within-manner errors, all of devoicing, were also frequent in final position. The predominant between-manner initial position errors involved liquid-to-glide and affricate-to-stop changes, and for final position, affricate-to-fricative. Phoneme omission occurred three times more frequently on final than on initial consonants. The error data of individual subjects were found to correspond with the identified overall group patterns. Those with markedly reduced speech intelligibility demonstrated the same patterns of error as the overall group. The implications for treatment are discussed.
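The within- versus between-manner tally described above is straightforward to reproduce from transcribed (target, produced) phoneme pairs. The manner-of-articulation map below is a small illustrative subset, not the study's 49-phoneme inventory:

```python
# Toy manner-of-articulation map (illustrative subset only)
MANNER = {"p": "stop", "b": "stop", "t": "stop", "d": "stop",
          "f": "fricative", "v": "fricative", "s": "fricative", "z": "fricative",
          "l": "liquid", "r": "liquid", "w": "glide", "j": "glide"}

def classify_errors(confusions):
    """Split (target, produced) substitution pairs into within-manner
    errors (place or voicing changed, manner kept) and between-manner
    errors (manner changed)."""
    counts = {"within": 0, "between": 0}
    for target, produced in confusions:
        if target == produced:
            continue  # correct production, not an error
        key = "within" if MANNER[target] == MANNER[produced] else "between"
        counts[key] += 1
    return counts
```

Running this over a confusion matrix flattened into pairs gives the within/between split the study reports; omissions and distortions would need extra categories.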
Experimental comparisons are reported between computer-based and human judgments of speech quality for the same sets of utterances. Speech stimuli were recorded from two normal talkers, who intentionally varied the quality of their speech, and from a hearing-impaired child who was receiving speech therapy on the Indiana Speech Training Aid (ISTRA). The tape recordings were submitted for evaluation to a naive jury, an expert jury, and the ISTRA System, a microcomputer equipped with a speaker-dependent speech recognition board that generated scores representing how well an utterance matched a stored template. Correlational analyses of these data indicated that humans were slightly better at judging speech quality than was the computer, but that the computer was much more reliable. These results demonstrate that computer-based speech evaluation may be a reasonable substitute for human judgments for certain types of speech drill.
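The correlational analysis comparing jury and machine scores reduces to Pearson correlation over paired ratings; `pearson_r` below is our own stdlib-only helper, not code from the cited study:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Correlating machine scores against jury means estimates agreement, while correlating two runs of the same judge (or the machine) on repeated utterances estimates the reliability the study contrasts.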