SpokeIt: Building a Mobile Speech Therapy Experience
Jared Duval, Zachary Rubin, Elena Márquez Segura, Natalie Friedman, Milla Zlatanov, Louise
Yang, Sri Kurniawan
University of California Santa Cruz, Santa Cruz, US
{jduval@ucsc.edu, zarubin@ucsc.edu, elena.marquez@ucsc.edu, nvfriedm@ucsc.edu,
mzlatano@ucsc.edu, lbyang@ucsc.edu, skurnia@ucsc.edu}
ABSTRACT
SpokeIt is a mobile serious game for health designed to
support speech articulation therapy. Here, we present
SpokeIt as well as 2 preceding speech therapy prototypes we
built, all of which use a novel offline critical speech
recognition system capable of providing feedback in real-
time. We detail key design motivations behind each of them
and report on their potential to help adults with speech
impairment co-occurring with developmental disabilities.
We conducted a qualitative within-subject comparative study
on 5 adults within this target group, who played all 3
prototypes. This study yielded refined functional
requirements based on user feedback, relevant reward
systems to implement based on user interest, and insights on
the preferred hybrid game structure, which can be useful to
others designing mobile games for speech articulation
therapy for a similar target group.
Author Keywords
Speech Therapy; User-centered design; Serious Game for
Health; SpokeIt; Developmental Disabilities
ACM Classification Keywords
H.5. Information interfaces and presentation (e.g., HCI);
H.5.2. Voice I/O; J.3. Life and Medical Sciences: Health
INTRODUCTION
Speech is a crucial skill for effective communication,
expression, and sense of self-efficacy. Speech impairments
often co-occur with developmental disabilities such as
Autism Spectrum Disorder [44], Cerebral Palsy [9], and
Down Syndrome [8]. The prevalence of speech impairments
in individuals with developmental disabilities has been
reported to be as high as 51% [38]. Each of these
developmental disabilities exhibits symptoms of an
articulation disorder. An articulation disorder is
characterized by difficulty producing the speech sounds that
constitute the fundamental components of a language [54].
Many individuals with speech impairments experience
depression, social isolation, and a lower quality of life [27].
Speech problems can negatively impact a person’s
employment status [30] and their ability to receive proper
healthcare, including receiving a wrong diagnosis or
inappropriate medication and having reduced access to
services [33]. The rate of arrests and convictions has also been
reported to be higher for boys with language
impairments [5]. In 2012, approximately 10% of the U.S.
adult population experienced a speech, language, or voice
problem [30].
Speech is a skill that can often be improved with
individualized therapy and practice [39,45]. Access to
Speech Language Pathologists (SLPs) is crucial to
improving speech, but up to 70% of SLPs have waiting lists,
indicating a shortage in the workforce and disrupted access
to therapy [47]. As a result, many non-professional therapists
are being trained by SLPs to deliver speech therapy outside
of the office [24,51]. This is not an ideal situation because
the SLP must take the time to train the non-professional
speech therapy facilitator, the individual’s therapy schedule
then relies on the facilitator’s schedule, and these facilitators
may not be as effective at delivering a speech curriculum
[24]. Even worse, many untrained facilitators attempt to
deliver speech curricula while reporting a generally low sense
of competence in assisting people with disabilities in their
assigned curricula [33].
Mobile speech therapy games could help people practice
articulation anywhere without the need to be facilitated,
which may potentially expedite their speech therapy progress
[40]. The pervasiveness of mobile hardware makes it an ideal
platform for delivering speech therapy to those who may not
have access to a speech therapist or a facilitator. Many SLPs
design games and activities to engage their clients. Games
and play have been widely recognized as a valid motivator
for otherwise jaded individuals [4]. We expect there are
many benefits to using a mobile speech therapy game,
including the ability to practice anywhere, collect fine-grained
speech data, track the frequency and time individuals
spend practicing, track performance over time, and create
dynamic, custom therapies for each individual. This has
presumably motivated the appearance of many mobile
speech therapy apps with different features and functions.
Yet, they tend to require a facilitator to evaluate speech.
Speech recognition has been successfully used to facilitate
speech therapy [2,3,6,10,31,32,46,49,52], but not in a mobile
context focusing on articulation.
In this paper, we describe key implementation details of the
underlying mobile offline real-time critical speech
recognition system. We present the designs of 2 articulation
therapy game prototypes that culminated in the creation of
SpokeIt. We share results from our comparative within-
subject study, which included 5 adults with disabilities co-
occurring with speech impairment who played each of the 3
designs. Finally, we discuss lessons learned about these
designs, reward systems our participants were interested in,
and promising future work.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. Copyrights for
components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to
post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from Permissions@acm.org.
MobileHCI '18, September 3-6, 2018, Barcelona, Spain
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-5898-9/18/09…$15.00
https://doi.org/10.1145/3229434.3229484
Accessibility and Mobile Health
MobileHCI'18, September 3-6, Barcelona, Spain
50:1
RELATED WORK
We use this section to a) introduce the target group in our
studies and discuss the type of articulation therapy that is
generally appropriate, b) motivate our choice to build a
serious game for health for speech therapy, and c) argue that
a gap exists for mobile articulation therapy games utilizing
critical speech recognition systems.
Articulation Disorders Co-occurring with Developmental
Disabilities
Adults with developmental disabilities co-occurring with
speech impairment would benefit from speech therapy [38].
There is a trend towards helping children with speech
impairments [6,8,12,41,43,49,51], but adults with speech
impairments need support as well [44,45]. This support can
come in the form of a speech recognition system, discussed
below.
Autism Spectrum Disorder (ASD)
ASD is one of the most common developmental disabilities,
affecting approximately 400,000 individuals in the United
States [55]. A follow-up study of children with ASD and
communication problems found that, on reaching early
adulthood, the group continued to show
significant stereotyped behavior patterns, problems in
relationships, troubles with employment, and lack of
independence [17]. A person with ASD may have monotonic
(machine-like) intonation, deficits in pitch, vocal quality,
volume, and articulation distortion errors [44].
Cerebral Palsy
The speech of a person with Cerebral Palsy and dysarthria
(difficult or unclear articulation of speech that is otherwise
linguistically normal) may exhibit anterior lingual place (front
of the tongue) inaccuracy, reduced precision of fricative
(consonant requiring breath through a small opening) and
affricate (plosive followed by a fricative, like the j in
jam) manners, slowness of speech, and indistinctness of
speech [9,35].
Down Syndrome
Many people with Down Syndrome have muscle hypotonia
[8]. Muscle hypotonia may cause abnormal movement of the
cheek, lips, and tongue, resulting in articulation errors. Many
people with Down Syndrome also speak at a fast and
fluctuating rate [48], known as cluttering [16].
Speech Recognition to Support Speech Therapy
Each of the aforementioned developmental disabilities has
one general symptom in common: an articulation disorder.
For this reason, we focused on the development of a critical
speech recognition system capable of distinguishing between
correct and incorrect pronunciations. We discuss these
implementation details thoroughly in the Creating a Novel
Speech Mechanic section. There are many more technologies
that focus on improving speech skills such as fluency, pitch,
rhythm, dialect, and aphasia [3,6,31,31,46,49,52], but we do
not focus on these.
Serious Games for Health
Games can be used as effective educational interventions
[1,28]. Games have the ability to teach while providing a
motivating and interactive environment [50], and can be as
effective as face-to-face instruction [37]. They can create
opportunities to train new skills in a safe and engaging
environment, improving perceived self-efficacy, a key aspect
in health promotion interventions [22].
Multiple serious games for health are documented to be
effective for diverse platforms, health outcomes, and target
populations. Some examples that populate this space range
from an exergame to help blind children with balance [29] to
embodied persuasive games for adults in wheelchairs [15] to
mobile games for motivating tobacco-free life in early
adolescence [34].
Games are such a powerful motivator that non-game
activities are often designed to look like games (i.e. gamified
systems). This attracts user attention and entices engagement
[11], which is particularly useful for tedious and otherwise
non-interesting activities [7]. By adding game design
elements to a pre-existing activity, designers manage to
engage people more with this activity [23]. However,
traditional gamification approaches have been widely
criticized [25]. Ethics aside, simply adding superficial
game-looking elements to an otherwise tedious activity does not
work in the long run. Both Nicholson and McGonigal [25,26]
have pointed out that extrinsic rewards like those typically
used in gamification can decrease intrinsic motivation and
engagement after the initial novelty effect. McGonigal
suggests a more fruitful approach by making the activity
intrinsically motivating and the rewards meaningful to the
player [25]. These relevant considerations prompted us to
study the types of rewards that interest our players.
Speech Therapy Applications
We present a comparison of some existing speech therapy
systems, shown in Figure 1. The survey of speech therapy
applications we present was compiled from interviews with
3 speech language pathologists, a literature review, and
current offerings on the App Store. The features we focus on
are whether the solutions run on mobile hardware and
whether or not they use speech recognition. We choose these
features to illustrate the gap SpokeIt is intended to fill.
Running on mobile hardware is important because of the
pervasiveness and convenience of mobile hardware [36]. A
critical speech recognition system is important for a speech
therapy game because it removes the requirement to be
facilitated by an often unreliable third party, opens up
opportunities to record fine-grained speech data, track
progress, and listen for both correct and incorrect
pronunciations.
Of all the applications surveyed, we include Sayin’ It Sam,
the only other mobile game with an identical underlying
speech recognition library, to illustrate that similar
technology can serve very different purposes based on the
target population. We found no mobile games that use a
critical speech recognition system for articulation therapy.
We include non-mobile applications with speech recognition
systems because they offer novel speech therapy experiences
and features we wish were available in a mobile context. We
include a set of mobile speech therapy games without speech
recognition systems because they have many noteworthy
features and designs, which could be further improved if
they implemented a critical speech recognition system.
Figure 1: Venn Diagram depicting dichotomy between therapy
systems that use speech recognition and those that run on
mobile hardware
Sayin’ It Sam is the only other mobile application featuring
a speech recognition system identical to the library SpokeIt’s
critical recognition is built on. Sayin’ It Sam is primarily
focused on motivating non-verbal children to speak and is
therefore trained to be very forgiving. SpokeIt, however, is
built to listen critically to speech and to be used as an
articulation therapy tool.
Other researchers have integrated speech recognition into
non-mobile interactive game environments to improve
literacy. Project LISTEN is an automated reading tutor
aimed at helping children learn pronunciation and proper
speech when reading aloud. It does this by analyzing various
aspects such as pitch, speed, and pauses. Researchers have
tested the system in India for assisting children learning
English as a second language, as well as in Canada for
children looking to improve their speaking skills [31].
Project ALEX is a non-mobile application that has proposed
a very robust application for language learners of any age.
Project ALEX focuses on a large dictionary with text-to-
speech functionality. Most importantly, Project ALEX
included pronunciation practice and used speech recognition
to check if the user says the word correctly using Microsoft
SAPI [32]. Project ALEX is focused on studying the cultural
differences in speech and dialect.
Articulate it! is a unique multi-player mobile application
created by an SLP specifically to help children improve their
speech sound production. Articulate it! employs over 1000
images selected for working on English consonant sounds at
the word and phrase level and has the ability to store data for
multiple patients. Articulate It! has multiple game modes
such as a phonemes mode where the facilitator can select
target sounds and a mode where the facilitator can focus on
words with a specified number of syllables. Once a mode is
selected, the facilitator has the option to customize the
dictionary of target words by removing unwanted ones.
Modes can be switched without ending a session and speech
can be recorded for later comparisons and for the player to
listen back to their speech. This app requires a facilitator to
score speech.
Articulation Station [43] is a novel mobile speech therapy
app that allows SLPs to customize target sounds and sound
placement for patients of all ages to practice. For example,
SLPs can make the app focus on /k/ sounds that occur at the
beginning, middle, or end of a word, such as cat, pickle, or
tick. The app has three levels of difficulty where users must
say target words, sentences, or full stories. Like Articulate
it!, Articulation Station requires an SLP or facilitator. Here,
they grade speech, which is recorded in the app and available
for reporting. The app has pre-recorded samples of correct
speech for all target words.
Articulation Games is a comprehensive, flexible, and fun
speech-therapy app for iPads that was created by a certified
speech and language pathologist for children to practice the
pronunciation of over forty English phonemes, organized
according to placement of articulation. It includes thousands
of real-life flashcards, accompanied by professional audio
recordings and ability to record audio. Players practice
phonemes through activities like memory games and
flashcards. Articulation Games requires a facilitator to grade
speech. Auditory Workout, Articulation Vacation, and Real
Vocabulary all offer features and experiences similar to those
of Articulation Games, Articulation Station, and Articulate It!
Many apps have more features than what is presented, such
as data collection capabilities for progress reports, the ability
to record players’ voices so they can listen back, and
challenges that progress from single sounds and words to
sentences and finally free natural speech. In the full release
of SpokeIt, we plan to include many of these features.
METHOD
We started our work by conducting 3 semi-structured
interviews with medical speech experts, which led to the
creation of our functional requirements. We used these
functional requirements to inform the development of our
speech recognition system. Following an iterative user-
centered design process, we designed and implemented 3
prototypes that used this speech recognition system. Each of
the designs explored a unique game format and core speech
mechanic, which we detail and motivate in the following
sections. These prototypes were developed in sequence, so design
knowledge carried over from one prototype to the next.
We conducted a within-subject comparative study on the 3
prototypes with 5 adults with developmental disabilities co-
occurring with speech impairment. We were concerned with
participants’ interest in using a speech therapy game to
improve their speech, their opinion on each of the 3
prototypes, their preferred game structure, and relevant
reward systems they would be interested in.
We present here the main insights from this study, which
directly influenced the further development of SpokeIt. We
think they can be useful for others designing mobile games
for speech articulation therapy for a similar target group.
DESIGN
Creating a Novel Speech Mechanic
Our work began with semi-structured interviews with
medical speech experts. Researchers asked what technology
was currently used for speech therapy, what benefits and
drawbacks they see to using technology for speech therapy,
and what functionality must exist. The only technology our
experts used during speech therapy sessions was iPads
displaying images of speech targets for diagnostic purposes.
Experts suspected that their patients were practicing little or
not at all outside of the office, even though they recommended
10 minutes of practice per day. They were hopeful that a
speech therapy game would motivate their patients to
practice outside of the office. They stipulated that the system
must critically listen to pronunciation and were mildly
worried that a speech therapy game could condition bad
speech practices if the recognition was not accurate. The
inclusion of examples of correct speech mitigated these
concerns. Finally, they expressed concerns that many of their
patients from lower socio-economic statuses may not have
access to the internet, so our game must be functional offline.
We budgeted for iPads in our grant so that these populations
can keep the iPads with the software installed after the
development and evaluation of SpokeIt has ended. We chose
iPads because they are categorized as medical devices by the
United States Government.
While there are many novel and interesting mobile
applications available that can improve the speech therapy
experience, we found none that provided a critical speech
recognition system in a game for on-the-go or at-home use.
Before developing any game prototypes, it was necessary to
ensure that it was feasible to listen for both correct and
incorrect speech. Many speech recognition systems exist
today such as those used in personal assistants like Cortana,
Siri, Google Assistant, and Amazon Alexa. We chose not to
use these services for multiple reasons:
• We needed a solution that is offline because we are dealing
with sensitive speech data. In addition, not every home has
access to the internet. Also, online speech recognition
systems often have lag and usage caps that would hinder
real-time game play.
• Digital assistants like the ones listed above are designed to
make a best guess at speech, not listen to it critically. We
needed to be able to fine-tune the recognition to listen for
incorrect speech as well as correct utterances.
• We did not want to discard the possibility of using and
recognizing non-existent words. Having the freedom to
play with silly nonsensical words is a tactic many SLPs use
to target specific sound production.
With these requirements in mind, we began searching online
for mobile speech recognition libraries that are highly
customizable and do not require an internet connection. The
library we chose is Pocketsphinx, an offline speech
recognition system for handheld devices from Carnegie
Mellon [19]. A speech therapy game must be able to listen to
speech critically so that the intervention will promote correct
speech. Pocketsphinx uses customizable dictionaries that
allow developers to specify the targets that can be
recognized [18]. The dictionaries that Pocketsphinx employs
use ARPAbet, a set of phonetic transcription codes, to map
speech sounds to English words [14]. ARPAbet can be used
to map any sequence of phonetic sounds to a word,
even words that do not exist. Any set of sounds that an
English speaker can produce can be mapped to an ARPAbet
representation. We can make new “words” that map to
common mispronunciations of correct words. Providing
both correct ARPAbet codes and ARPAbet codes that
represent mispronunciations gives us the power to
distinguish between correct and incorrect speech. Table 1
shows ARPAbet codes that represent both correct and
incorrect ways to say the word balloon.
Common pronunciations of “Balloon” ARPAbet Code
Balloon
Walloon
Walloo
Bawoon
Balloo
Bawoo
Alloon
Loon
Table 1: Pronunciations of the word "Balloon" including
correct pronunciation (first) followed by common
mispronunciations and their corresponding ARPAbet Code
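As a rough illustration of the idea behind Table 1, the following Python sketch builds a small dictionary in the "WORD PHONE PHONE ..." layout used by CMU-style pronunciation dictionaries, where the correct word and each mispronunciation get their own entry. The ARPAbet transcriptions here are our own illustrative guesses, not the codes from Table 1, and the names `PRONUNCIATIONS`, `to_dictionary_lines`, and `classify` are hypothetical helpers rather than part of Pocketsphinx, OpenEars, or SpokeIt.

```python
# Hypothetical pronunciation table for the target word "balloon".
# Each entry maps a dictionary "word" to (ARPAbet phones, is_correct).
# The phone strings are illustrative assumptions, not the paper's codes.
PRONUNCIATIONS = {
    "balloon": ("B AH L UW N", True),   # correct target
    "walloon": ("W AH L UW N", False),  # /b/ replaced by /w/
    "bawoon": ("B AH W UW N", False),   # /l/ replaced by /w/
    "balloo": ("B AH L UW", False),     # final /n/ dropped
    "loon": ("L UW N", False),          # first syllable dropped
}


def to_dictionary_lines(pronunciations):
    """Render entries as 'word PHONE PHONE ...' lines, one per entry."""
    return [f"{word} {phones}" for word, (phones, _) in pronunciations.items()]


def classify(hypothesis, pronunciations):
    """Label a recognized dictionary entry as correct, incorrect, or unknown."""
    entry = pronunciations.get(hypothesis.lower())
    if entry is None:
        return "unknown"
    return "correct" if entry[1] else "incorrect"


if __name__ == "__main__":
    for line in to_dictionary_lines(PRONUNCIATIONS):
        print(line)
    print(classify("Bawoon", PRONUNCIATIONS))
```

Because mispronunciations are ordinary dictionary entries, the recognizer can report which variant it heard, and the game only needs a lookup to decide whether that variant counts as correct.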
Pocketsphinx uses acoustic models to map sound data to
targets in the dictionary. These acoustic models are hot-
swappable and can be altered for better accuracy [18]. This
feature creates the potential to alter acoustic models for
specific populations, allowing a more accurate model that
can listen to adults with developmental disabilities, or even
one specifically for children with cleft speech. OpenEars is
a free open-source framework that brings the power of
Pocketsphinx to iOS devices in native Objective-C
for speed and reliability. RapidEars is a paid plugin for
OpenEars that gives Pocketsphinx the ability to listen to
speech in real-time, which is important for a responsive
game. The ability to customize acoustic models, to customize
dictionaries, to run offline, and to listen in real-time
motivated our choice to use RapidEars for our speech
therapy game prototypes.
Design and Development of Prototypes
Our primary target population for the final release of SpokeIt
was children, and hence the 3 prototypes were designed with
children in mind. However, our primary IRB and physical
location limited our access to children with speech
impairments at the time. Because of this limited access, we tested
our prototypes on adults with developmental disabilities co-
occurring with speech impairment, who can also benefit
from the kinds of games we were creating. This could be seen
as a limitation of our study, yet as we will further discuss
later, the acceptance and enjoyment of our adult participants
towards our designs indicated they had good potential to
engage a population older than our original target group.
To initiate the design process and after interviews with an
SLP, we drew inspiration from media that successfully elicits
speech from children, such as the popular children’s program
Dora the Explorer. The program inspired us because it
integrates learning and vocal participation in a storybook
style setting with intermittent learning challenges that
children love.
Speech Adventure
The first prototype we developed was Speech
Adventure [42]. Speech Adventure, shown in Figure 2, is a
storybook-style game that employs an off-screen narrator to
give directives on how to help Sam the Slug complete tasks.
Visual cues in the form of glowing blue outlines inform
players which parts of the scene can be interacted with and
touched. Once touched, the corresponding target phrase is
announced by the off-screen narrator.
To make progress in the game, the player must repeat the
target phrase that was just announced. The target words are
displayed at the bottom of the screen and are tightly tied to
the speech recognition system. Words turn green as they are
said correctly. To make progress, all words must be green.
Words can be said out of order and words that were missed
anywhere in the phrase can be repeated.
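The word-by-word progress rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior (words may arrive in any order, missed words can be repeated, and progress requires every word to be marked); the `PhraseTracker` class and its method names are hypothetical and do not come from the Speech Adventure code.

```python
class PhraseTracker:
    """Tracks which words of a target phrase have been said correctly."""

    def __init__(self, phrase):
        self.words = phrase.lower().split()
        # One flag per word; True corresponds to the word turning green.
        self.done = [False] * len(self.words)

    def hear(self, word):
        """Mark the first unmarked occurrence of `word`; return True if marked."""
        word = word.lower()
        for i, target in enumerate(self.words):
            if target == word and not self.done[i]:
                self.done[i] = True
                return True
        return False

    def is_complete(self):
        """Progress is allowed only once every word is green."""
        return all(self.done)


if __name__ == "__main__":
    tracker = PhraseTracker("Pop a balloon")
    for heard in ["balloon", "pop", "a"]:  # out-of-order input is accepted
        tracker.hear(heard)
    print(tracker.is_complete())
```

Keeping one flag per word position, rather than a set of words, lets phrases that repeat a word require it to be said once per occurrence.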
The green ear in the upper left corner of the screen indicates
that the system is actively listening. When the off-screen
narrator is speaking, the green ear turns white to signal the
system is not listening. We found it important to suspend
recognition during these moments so that the in-game audio
didn’t trigger game events.
The story of Speech Adventure starts with dressing Sam in
boots and a hat. Once Sam is dressed, the player must say
“Open the door” to journey outside. Once outside the player
must pop 3 balloons that are blocking a bridge by saying
“Pop a balloon” before continuing on the journey. These
phrases were provided by SLPs.
Figure 2: Speech Adventure cabin scene where player must
help Sam the Slug get dressed before going outside
A challenging design aspect of this game was that each target
phrase had to be carefully crafted to fit into the narrative of
the game. Development of new scenes that incorporated
target words proved to be very time consuming, which could
result in minimal content. We worried that Speech
Adventure would lose its novelty after a first play-through.
From conversations with the SLP, we realized re-playability
was an important feature for a game that should be played
for 10 minutes per day.
The majority of the allotted 10 minutes of daily practice
should be spent producing target words and practicing
speech. Yet, the narrative nature of our storybook-style
design limited the number of utterances that could be
produced in 10 minutes, which then caused concerns within
the design team about the therapeutic value of our game.
The nature of hand-crafted narratives limited our ability to
dynamically swap target words. According to an SLP we
interviewed, a speech therapy game would benefit from the
ability to customize targets dynamically based on the types
of speech therapy each individual player needs.
Although preliminary play tests indicated that players love
the storybook-style Speech Adventure game [40], the
considerations above prompted us to rethink how a speech
therapy game should be structured.
Speech with Sam
To maximize the number of target words produced in a 10-
minute period, we hypothesized that single-word utterances
that had an immediate effect on gameplay would yield more
speaking time. Removing narrative between utterances
reduces the amount of time that the recognition system is
suspended. This resulted in the development of Speech with
Sam, a series of speech-controlled mini-games, shown in
Figure 3. The mini-game structure allowed quicker
development time, which yielded more content to play.
Because targets are not directly linked to a narrative,
implementing a diverse range of dynamic targets would be
more feasible.
Figure 3: Speech with Sam rocket mini-game where rockets
are set off by saying the appropriate target
Speech with Sam plays a series of mini-games ordered
randomly for a specified amount of time. In the rocket mini-
game example above, the player must tap one of the three
rockets. Touching a rocket reveals an in-game prompt that
specifies the rocket’s trigger word. Once the trigger word is
said, the rocket blasts off the screen and is replaced by a new
rocket with a new random trigger word.
For every rocket launched, the player score is increased by 1
and displayed in the green rounded rectangle in the upper-
left corner of the display. This score is recorded at the end of
every mini-game to keep a record of high scores and to track
trends about player scores.
The number in the blue rounded rectangle next to the score
is a countdown timer until the next mini-game is played.
After the timer reaches 0, the score is recorded, and the
player is presented with a different randomly chosen mini-
game. All mini-games run for a standard amount of time. It
is possible for players to play the same mini-game more than
once in one session, but not twice in a row.
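A minimal Python sketch of this selection rule follows: each round picks a mini-game at random, excluding only the one just played, so repeats within a session are possible but never back to back. The function names and the mini-game names other than the rocket game are hypothetical placeholders, not taken from Speech with Sam.

```python
import random


def next_mini_game(games, previous=None, rng=random):
    """Pick a random mini-game that differs from the one just played."""
    candidates = [game for game in games if game != previous]
    if not candidates:
        raise ValueError("need at least two mini-games to avoid immediate repeats")
    return rng.choice(candidates)


def build_session(games, rounds, seed=None):
    """Return the sequence of mini-games for a session of `rounds` rounds."""
    rng = random.Random(seed)
    session = []
    previous = None
    for _ in range(rounds):
        previous = next_mini_game(games, previous, rng)
        session.append(previous)
    return session


if __name__ == "__main__":
    # "rockets" is from the text; the other names are made up for the example.
    print(build_session(["rockets", "bubbles", "cards"], rounds=6, seed=1))
```

Filtering out only the previous game keeps the choice uniform over the remaining games while still allowing a game to reappear later in the same session.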
The green ear in the upper-right corner of the screen works
identically to the one in Speech Adventure in that when it is
green, the speech system is listening, and when white, the
speech system is suspended so that in-game audio does not
trigger events. The written prompt was moved to the top to join
the other heads-up display elements. When players say the target
word, the text turns green and the game immediately
responds. In Speech Adventure, targets were often multi-
word phrases, whereas in Speech with Sam, the targets are
single words.
In a preliminary study [40], we found that Speech with Sam
was successful in increasing the words per minute produced by
participants, meaning Speech with Sam has the potential to
be a more effective speech therapy solution. In that same
study, we found that participants were unenthusiastic when
presented with a mini-game they had played before. Our
hypothesis that mini-games would be highly re-playable did
not necessarily hold.
SpokeIt
SpokeIt, shown in Figure 4, is a storybook-style mini-game
hybrid. By adding a story around the mini-games that fit into
an overarching plot, we have the potential to both produce
high output of words per minute and keep users engaged.
Figure 4: SpokeIt card coloring mini-game
SpokeIt is the first speech therapy game to demonstrate
correct pronunciation with both pre-recorded audio and lip
animations. One of the medical experts we interviewed later
suggested that seeing correct speech is important in addition
to hearing it. We did not want to break
immersion by displaying a realistic mouth in an animated 2D
environment. We found a solution that allowed us to sync the
audio that exemplified correct speech with animated mouth
transitions on our characters. The lip animation effects were
achieved using Adobe Character Animator. Each phoneme
mouth shape was crafted in Adobe Photoshop with three
frames per transition, resulting in smooth transitions between
mouth shapes. Adobe Character Animator’s Lip Sync feature
and our frame transitions automatically map our voice
actor’s speech performances to the appropriate mouth
shapes. The motivation behind this work is to give players
visual cues on how a word is said. Adobe Character
Animator’s abilities to synchronize lip animations and
replicate actors’ facial expressions are shown in Figure 5.
The SpokeIt prototype, shown in Figure 4, is meant to blend
the positive aspects of the Speech Adventure and Speech
with Sam experiences. Therefore, the game begins
with narration from Sam the Slug about their desire to
visit a close friend. To get to the friend, the player
must embark on a long journey filled with new experiences
and challenges. In the example above, Sam meets a friendly
creature named Red who is struggling to learn colors that
start with the letter “B” (or, in the future, colors that start
with any letter that the player needs to practice). Sam knows the
player is great with colors and asks them to teach Red colors
that start with “B” by saying the color on numbered cards.
Unlike Speech with Sam and Speech Adventure, an on-
screen character narrates the game. SpokeIt is the first
prototype to include completely animated characters with
mouth transitions. Unlike Speech with Sam, but similar to
Speech Adventure, SpokeIt demonstrates how each target
should be pronounced. SpokeIt is the only prototype that
automatically moves on if a player is struggling for over ten
seconds.
Accessibility and Mobile Health
MobileHCI'18, September 3-6, Barcelona, Spain
Figure 5. Top: Character demonstrating /V/ sound, 2nd from
Top: Character demonstrating /L/ Sound, Bottom: Characters
showing sad and disgusted expressions
Instead of words lighting up green as they are spoken
correctly, SpokeIt uses a “Heard” element that displays
exactly what speech was recognized, including both correct
and incorrect pronunciations. The target
word or phrase is displayed in the upper-left corner of the
screen. SpokeIt has word, phrase, and sentence targets. The
ear in the upper-right is crossed out when the system pauses.
Unlike both predecessors, SpokeIt can be played completely
touch-free. To simplify game directives and required
interactions, the only form of input that makes progress in
the game is speech. The touch-screen is used only to aid
players: when an element is tapped, Sam says that word aloud
to help players know their targets, which is important for
players who cannot read.
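The input rules described above can be summarized in a short sketch. This is a minimal illustration under stated assumptions, not the SpokeIt implementation: the `SpeechEvent`, `TouchEvent`, and `Scene` types and the `handle_event` function are hypothetical names, and the exact string comparison stands in for whatever matching the speech recognition system actually performs.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechEvent:
    heard: str  # text returned by the (hypothetical) recognizer


@dataclass
class TouchEvent:
    element: str  # label of the tapped on-screen element


@dataclass
class Scene:
    target: str                  # word, phrase, or sentence to say
    heard_display: str = ""      # contents of the "Heard" element
    spoken_hints: list = field(default_factory=list)  # words Sam said aloud
    complete: bool = False


def handle_event(scene, event):
    """Route one input event; only speech can complete the scene."""
    if isinstance(event, SpeechEvent):
        # Show exactly what was recognized, whether correct or not.
        scene.heard_display = event.heard
        if event.heard.strip().lower() == scene.target.lower():
            scene.complete = True
    elif isinstance(event, TouchEvent):
        # Touch never advances the game; it only makes Sam say the word.
        scene.spoken_hints.append(event.element)
    return scene
```

In this sketch a tap can never set `complete`, which mirrors the rule that speech is the only input that makes progress.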
Design Overview
Table 2 summarizes some key differences between the three
prototypes we developed, namely the game structure style
and how the players are prompted on game targets. We
hypothesized players would enjoy the hybrid mini-game
storybook style SpokeIt provides because it includes fast
paced game play surrounded by narrative. We also
hypothesized a main character who demonstrates speech
with mouth animations would increase usability and
therapeutic value.
Prototype | Game Style | Instruction
Speech Adventure | Storybook | Off-Screen Narrator
Speech with Sam | Minigames | Text prompts
SpokeIt | Hybrid | Main character
Table 2: Key characteristics of each prototype
STUDY
After completing development of the third prototype, we
wanted to confirm that the design was moving in the right
direction. We also wanted to ensure our game was usable and
to learn what future features or
rewards would keep players engaged.
Protocol
We began by administering a preliminary survey to collect
demographics, interest in speech therapy games, and general
game use.
We then conducted a within-subject comparative study
where each participant played each of the three prototypes in
a random order. Researchers were present to facilitate,
answer questions, and change prototypes when necessary.
Afterwards, we conducted a post-study interview, where we
collected rankings of each of the designs, usability feedback,
and core mechanic feedback, asking a mixed set of targeted
questions to explore positive and negative characteristics
from each of the three prototypes. We also discussed the
kinds of reward systems our participants would be interested
in. Each participant was asked the same set of questions.
Facilitators wrote down answers to each of these questions
and also jotted down any quotes or observations they had.
Our study was video recorded.
We concluded the study with a 3-question 5-point Likert
survey to receive feedback on how well-received our speech
recognition system and speech mechanics were. We were
interested in the system's perceived accuracy and
responsiveness and in the appeal of the speech mechanic.
Participants were asked whether 1) the game accurately
heard them, 2) the game responded quickly to speech, and
3) they would want to play at home.
Participants
Our research lab has an on-going relationship with a local
day program for adults with developmental disabilities. We
asked the program staff to provide us with facilities to
conduct our study and to provide us with participants with
speech impairments who are legally able to provide consent.
Regarding the demographic information collected, all
individuals who attend this day program are adults. Two
participants were not comfortable sharing their age but
seemed to be similar in age to the other participants. Table 3
below outlines basic demographic information of our 5
participants. One participant had Cerebral Palsy, one had
Down Syndrome, one had ASD, and two were diagnosed
with mental retardation co-occurring with articulation
disorders.
Participant | Age | Sex | Game Play Frequency
P1 | 27 | Male | 5 Days/Week
P2 | 31 | Female | 2 Days/Week
P3 | 24 | Male | 7 Days/Week
P4 | Not shared | Male | 7 Days/Week
P5 | Not shared | Male | 2 Days/Week
Table 3: Participant Demographics
Facility
We were allowed to use two medium-sized rooms. Because
neither room was big enough to accommodate the entire
group and we had to work within the daily schedule, we split
participants between the two rooms and ran the study in
parallel. The situation was not ideal, but to reduce bias as
much as possible, we asked each question to each individual
in a random order, meaning P3 might answer question 1
first, then P2, while P1 might answer question 2 first,
followed by P3. Each participant had the opportunity to
answer each question before the group moved on to the next
question. The facility was also not ideal for the within-
subject play of the games: the speech recognition works best
in a quiet environment, but this was not possible given the
constraints of our facility.
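One way to produce the per-question answer order described above is sketched below. The function name `answer_orders`, its parameters, and the use of a seeded generator are assumptions for illustration; the source does not say how the order was randomized in practice.

```python
import random


def answer_orders(participants, num_questions, seed=None):
    """Return one independently shuffled participant order per question."""
    rng = random.Random(seed)  # seeded only so a session plan can be reproduced
    orders = []
    for _ in range(num_questions):
        order = list(participants)  # copy so the input list is not mutated
        rng.shuffle(order)
        orders.append(order)
    return orders
```

Calling `answer_orders(["P1", "P2", "P3"], num_questions=4)` for one room yields four orderings, each containing every participant exactly once, so everyone answers each question before the group moves on.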
Equipment
We brought enough iPads for each participant and a few
extra in case of technical problems, two laptops with
webcams to record each of the rooms, surveys, scripts,
consent forms, and note-taking materials.
RESULTS
We use handwritten notes from researchers containing
participants’ responses to questions about relevant reward
systems and opinions on each of the 3 prototypes. We use the
results of our 3-question Likert survey and participant quotes
about the speech recognition to report insights about its use.
We use video recordings of the participants playing our 3
game prototypes to identify usability issues and player
reactions.
For our qualitative analysis, two researchers created codes
and themes, and three researchers then independently coded
all videos in BORIS using these codes and themes. The emerging
insights include preferred game styles, reward systems, and
usability concerns.
All participants play games two or more times per week and
would be interested in using games to improve their speech.
Participants report that the games they play most commonly
are car games, racing games, solitaire, bowling, and NFL
sports games. Participants report they have difficulty
speaking loudly, have fluency difficulties, and are sometimes
unsure of what to say when speaking.
In the following sections, we organize our findings by the
specific prototype they relate to. We then report general
insights into reward systems our participants are interested
in and their experience using the speech mechanics.
Speech with Sam
Researchers observed that two participants were laughing
while playing Speech with Sam. This may be because of
humorous phrases that are present in the narrative such as,
“Slugs don’t wear boots!”
Due to software updates, some features of the game became
unresponsive, which understandably frustrated many of our
players. It was not always obvious to the players that they
needed to touch flashing objects to progress in the game. All
instructions in the game were displayed as text, but many of
our participants could not read, so researchers aided players
by dictating the instructions. Players wanted better feedback
when interacting because at times, they were unsure if the
game accepted or rejected their responses. They were also
unsure when they had to repeat themselves. Participants
enjoyed the pace of Speech with Sam.
Speech Adventure
The mini-games in Speech Adventure, particularly in the
rocket scene, gave immediate feedback when the player
correctly interacted with the game. Many players’ visceral
reactions to the sounds and animations were very positive.
They enjoyed that fireworks celebrated their success with a
satisfying pop sound. Speech Adventure kept track of player
scores and displayed them to the participants. Two players
reported that seeing their score was satisfying and marked
their progress.
Players who could not read also struggled with this prototype
because the instructions were written out as text; they needed
the researchers’ help to navigate through the game
objectives. Many of the mini-games rely on the players to
speak in the correct rhythm, but this was extremely
challenging for some. The scenes automatically progress
after an allotted amount of time. Players found this happened
too fast: just as they were beginning to understand the
objectives and mechanics, the next game would be displayed.
Some of the game objectives were too complicated and
several players never learned how to complete objectives in
the allotted time. In general, Speech Adventure needs to be
slowed down drastically and the instructions need to be
clearer.
SpokeIt
Three out of five participants preferred SpokeIt to the rest of
the prototypes. They especially appreciated that all the
instructions were spoken aloud by the main character and
displayed as text on the screen. Participants found the
interaction objectives much simpler because SpokeIt never
requires a player to touch the screen. Many participants
found SpokeIt to be incredibly aesthetically pleasing. They
loved the colors, graphics, and animations. Participants
preferred the highly animated main character in SpokeIt
because it represented a more responsive and lively element.
One participant was very interested in bringing SpokeIt
home with her.
One participant commented that he would like SpokeIt to
repeat the instructions because he did not always remember
what he was supposed to say. Most users seemed most
enthusiastic about playing SpokeIt again, indicating the
hybrid structure may improve re-playability of mini-games
because they fit within an overall narrative.
Speech Recognition and Mechanics
We report that users are neutral about the accuracy of our
speech recognition system (Q1), but found it responded
quickly (Q2). They found the speech mechanic was
rewarding and enjoyable, and they think the games are
suitable systems to promote practicing speech at home (Q3).
Rewards
We are interested in rewarding players for practicing speech
in a way that is meaningful to them. Hence, we brainstormed a few
ideas with our participants and asked them to vote on which
reward would be most interesting to them. Our ideas
included:
• Hats and clothes to accessorize a character or avatar after
completing sessions
• Using scored points to spend in a virtual store. Points could
be spent to buy items for a virtual garden or to furnish the
character’s house.
• Reducing total time needed to practice speech in the future.
If a player does really well in a 10-minute session, then
tomorrow they only need to play for 8 minutes.
• Out-of-game rewards (stickers, candy, or other physical
rewards)
Overwhelmingly, our participants were interested in out-of-
game rewards. They were extremely excited about the idea
of receiving candy when they do well in the game.
Usability Considerations
Watching players use our systems was very informative. We
identified 4 main issues that must be addressed in speech
therapy systems:
• Many participants cannot read: the game must be very
clear and be designed to accommodate players who
cannot read. Objectives should be spoken aloud and be
repeated if necessary. If participants are struggling, the
game should either change the objectives or move forward
with the plot.
• To make progress in the game, players must use their
speech. Touch should be used to support players, provide
clues, or demonstrate correct pronunciation. These
mechanics should not be mixed.
• More feedback for correct and incorrect interactions must
be given to make progress clear. If the player pronounces
a target incorrectly, the game should support that player in
saying it correctly.
• Many users put the iPads to their heads because they had
trouble hearing: the game volume must be louder, or
headphones must be provided, especially in noisy
environments.
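The first recommendation can be read as a simple escalation policy, sketched below. The ten-second limit comes from SpokeIt's behavior of moving on when a player struggles for over ten seconds; the function name `next_action`, the action labels, and the five-second threshold for repeating instructions are hypothetical values chosen for illustration.

```python
def next_action(seconds_without_success, repeat_after=5.0, move_on_after=10.0):
    """Pick a support action based on how long the player has struggled.

    repeat_after is an assumed threshold; move_on_after mirrors the
    ten-second limit described for SpokeIt.
    """
    if seconds_without_success > move_on_after:
        return "move_on"  # advance the plot or change the objective
    if seconds_without_success >= repeat_after:
        return "repeat_instructions"  # speak the objective aloud again
    return "keep_listening"
```

A game loop would call this with the time elapsed since the current target was introduced and act on the returned label.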
Structure
We found that players prefer the hybrid structure because
mini-games were given context in an overarching plot. Mini-
games that are played out of a narrative context seem to lose
their novelty as soon as they are repeated. Our users seemed
to care a lot about aesthetics and animations indicating that
high levels of polish are important.
Storybook-style games require much more work to generate
content and narrative consistency. Speech therapy games
should be available and fresh for as long as the individual
needs to practice speech. Narrative content that surrounds
mini-games is an effective balance of development time, re-
playability, and diversity of speech targets.
DISCUSSION
Speech Recognition Accuracy
In the future, we would like to explore improving the
accuracy of the speech recognition system. This is obviously
an important feature of a speech therapy game, especially for
when we explore the clinical validity of the game. There are
two controllable factors that we consider: the physical
hardware and the acoustic models. We would like to
compare the accuracy of the system when using a noise-
cancelling microphone and when using the built-in
microphone. We would also like to collect speech data from
our specific target populations to train the acoustic model
and compare it to the default model. Collecting speech data
from vulnerable populations is a challenge due to HIPAA
regulations in the United States. In this context, speech is
categorized as medical data and must be approved before it
can be stored safely. Another challenge is the tedious process
of hand-coding the speech data in a way that allows acoustic
models to be trained using machine learning techniques.
Likert Results | Q1 | Q2 | Q3
Values | 4 | 4 | 4
 | 4 | 4 | 4
 | 5 | 4 | 5
 | 2 | 4 | 4
 | 2 | 4 | 4
Average | 3.2 | 4 | 4.2
Table 4: Speech Recognition and Speech Mechanics Likert
Survey Results
Rewards
Our brainstorm session about preferred reward systems
indicated that our users were not interested in badges, stars,
and points; they wanted tangible real-world rewards and
commendation from their mentors. We do not want to
simply gamify [25] speech therapy with banners, badges,
ribbons, and stars. In the future, we are interested in creating
a rewarding experience that integrates into current practices.
Our serious game for health should be one component of a
broader holistic therapy experience. We envision the final
product to have a companion app for speech therapists to
assign individualized curriculum goals and receive reports of
patient progress. If patients meet these goals, an SLP could
reward patients with tangible prizes.
Working with Children with Speech Impairments
As previously stated in the Design section, SpokeIt is
intended for children with speech impairments, but regular
access to this population is difficult, so out of convenience,
we conducted our study with adults with developmental
disabilities co-occurring with speech impairment. Using a
childish design on an adult population is a limitation of this
study, but our participants truly enjoyed the experiences and
never indicated that they felt patronized. Also, it is worth
reiterating that this adult population has the potential to
benefit from this work, as they also have speech goals and
articulation disorders.
Speech and language development is important for
children’s future ability to live independently and to
participate fully in society [20]. In 2012, nearly 8% of
children aged 3-17 in the United States had a communication
disorder and younger children, boys, and non-Hispanic white
children were more likely than other children to receive an
intervention service for their disorder [53]. Children with
speech impairments such as cleft lip and/or palate have high risks
of behavioral problems and increased symptoms of
depression [20]. They show more deficits in social and
academic competencies, score higher for social problems
[13], and are more likely to be teased in social settings [20].
Even those who undergo a corrective surgery tend to display
a delay in scholarship, have a lower income, marry later in
life and become independent from their parents significantly
later [21]. Clearly, using SpokeIt to help children improve
their speech is worth exploring further.
Methodological considerations
Our users struggled with Likert style questions, so we needed
to adapt how we conducted them onsite, which we detail
here. This may be useful to other researchers working with a
similar target group (adults with developmental disabilities).
We first asked whether they agreed with the statement,
disagreed with the statement, or did not know. If they did not
know, we marked down a 3. If they said they agreed, we
asked if they agreed a lot or a little. If they said they agreed
a lot, we put a 5. If they said they agreed a little, we marked
down a 4. We followed the same process if they disagreed.
If facilitators felt a participant was answering just to please
us, we asked the same question phrased in the opposite way
and reminded participants that we wanted them to answer
honestly. Some participants changed their answers, which led
us to believe our results may not be completely
representative of our population. Asking Likert-style
questions in this way was also cumbersome.
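The two-step elicitation described above maps onto a 5-point scale as sketched below. The function name `likert_score` and the string labels are hypothetical; the values for disagreement (1 for "a lot", 2 for "a little") are an assumption based on the statement that the same process was followed when participants disagreed.

```python
def likert_score(stance, strength=None):
    """Map a two-step spoken answer onto a 5-point Likert value.

    stance: "agree", "disagree", or "unknown"
    strength: "a lot" or "a little" (ignored when stance is "unknown")
    """
    if stance == "unknown":
        return 3  # "did not know" was recorded as the midpoint
    if stance not in ("agree", "disagree"):
        raise ValueError("stance must be 'agree', 'disagree', or 'unknown'")
    if strength not in ("a lot", "a little"):
        raise ValueError("strength must be 'a lot' or 'a little'")
    if stance == "agree":
        return 5 if strength == "a lot" else 4
    # Assumed mirror image of the agreement branch.
    return 1 if strength == "a lot" else 2
```

For example, a participant who agrees "a little" is recorded as 4, and one who does not know is recorded as 3 without a follow-up question.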
ACKNOWLEDGMENTS
This material is based in part upon work supported by the
National Science Foundation under Grant number #1617253.
We also thank Doctor Travis Tollefson and SLP Christina
Roth for their aid in conducting user evaluations. Any
opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and do not
necessarily reflect the views of the National Science
Foundation.
REFERENCES
1. Clark C Abt. 1987. Serious games. University press of
America.
2. Frank R Adams, Hubert Crepy, David Jameson, and J
Thatcher. 1989. IBM products for persons with
disabilities. In Global Telecommunications Conference
and Exhibition ‘Communications Technology for the
1990s and Beyond’ (GLOBECOM), 1989. IEEE, 980–984.
3. Olle Bälter, Olov Engwall, Anne-Marie Öster, and
Hedvig Kjellström. 2005. Wizard-of-Oz test of
ARTUR: a computer-based speech training system with
articulation correction. In Proceedings of the 7th
international ACM SIGACCESS conference on
Computers and accessibility, 36–43.
4. Elizabeth Boyle, Thomas M Connolly, and Thomas
Hainey. 2011. The role of psychology in understanding
the impact of computer games. Entertainment
Computing 2, 2: 69–74.
5. EB Brownlie, Joseph H Beitchman, Michael Escobar,
Arlene Young, Leslie Atkinson, Carla Johnson, Beth
Wilson, and Lori Douglas. 2004. Early language
impairment and young adult delinquent and aggressive
behavior. Journal of abnormal child psychology 32, 4:
453–467.
6. H Timothy Bunnell, Debra M Yarrington, and James B
Polikoff. 2000. STAR: articulation training for young
children. In Sixth International Conference on Spoken
Language Processing.
7. Brian Burke. 2016. Gamify: How gamification
motivates people to do extraordinary things. Routledge.
8. Kerstin Carlstedt, Gunilla Henningsson, and Göran
Dahllöf. 2003. A four-year longitudinal study of palatal
plate therapy in children with Down syndrome: effects
on oral motor function, articulation and communication
preferences. Acta Odontologica Scandinavica 61, 1:
39–46.
9. Mary Clement and Thomas E Twitchell. 1959.
Dysarthria in cerebral palsy. Journal of Speech and
Hearing Disorders 24, 2: 118–122.
10. Colette Coleman and Lawrence Meyers. 1991.
Computer recognition of the speech of adults with
cerebral palsy and dysarthria. Augmentative and
Alternative Communication 7, 1: 34–42.
11. Sebastian Deterding, Staffan L. Björk, Lennart E.
Nacke, Dan Dixon, and Elizabeth Lawley. 2013.
Designing gamification: creating gameful and playful
experiences. In CHI’13 Extended Abstracts on Human
Factors in Computing Systems, 3263–3266.
12. Jared Duval, Zachary Rubin, Elizabeth Goldman, Nick
Antrilli, Yu Zhang, Su-Hua Wang, and Sri Kurniawan.
2017. Designing Towards Maximum Motivation and
Engagement in an Interactive Speech Therapy Game. In
Proceedings of the 2017 Conference on Interaction
Design and Children, 589–594.
13. Kristin Billaud Feragen, Ingela L Kvalem, Nichola
Rumsey, and Anne IH Borge. 2010. Adolescents with
and without a facial difference: the role of friendships
and social acceptance in perceptions of appearance and
emotional resilience. Body Image 7, 4: 271–279.
14. Javier Franco-Pedroso and Joaquin Gonzalez-
Rodriguez. 2016. Linguistically-constrained formant-
based i-vectors for automatic speaker recognition.
Speech Communication 76: 61–81.
https://doi.org/10.1016/j.specom.2015.11.002
15. Kathrin Maria Gerling, Regan L Mandryk, Max
Valentin Birk, Matthew Miller, and Rita Orji. 2014. The
effects of embodied persuasive games on player
attitudes toward people using wheelchairs. In
Proceedings of the 32nd annual ACM conference on
Human factors in computing systems, 3413–3422.
16. Donna M Hanson, Alfred W Jackson, Randi J
Hagerman, John M Opitz, and James F Reynolds. 1986.
Speech disturbances (cluttering) in mildly impaired
males with the Martin-Bell/fragile X syndrome.
American Journal of Medical Genetics Part A 23, 1–2:
195–206.
17. Patricia Howlin, Lynn Mawhood, and Michael Rutter.
2000. Autism and developmental receptive language
disorder: A follow-up comparison in early adult life.
II: Social, behavioural, and psychiatric outcomes. The
Journal of Child Psychology and Psychiatry and Allied
Disciplines 41, 5: 561–578.
18. D. Huggins-Daines, M. Kumar, A. Chan, A. W. Black,
M. Ravishankar, and A. I. Rudnicky. 2006.
Pocketsphinx: A Free, Real-Time Continuous Speech
Recognition System for Hand-Held Devices. In 2006
IEEE International Conference on Acoustics Speech
and Signal Processing Proceedings, II.
https://doi.org/10.1109/ICASSP.2006.1659988
19. David Huggins-Daines, Mohit Kumar, Arthur Chan,
Alan W Black, Mosur Ravishankar, and Alexander I
Rudnicky. 2006. Pocketsphinx: A free, real-time
continuous speech recognition system for hand-held
devices. In Acoustics, Speech and Signal Processing,
2006. ICASSP 2006 Proceedings. 2006 IEEE
International Conference on, I—-I.
20. Dr Orlagh Hunt, Dr Donald Burden, Dr Peter Hepper,
Dr Mike Stevenson, and Dr Chris Johnston. 2007.
Parent Reports of the Psychosocial Functioning of
Children with Cleft Lip and/or Palate. The Cleft Palate-
Craniofacial Journal 44, 3: 304–311.
https://doi.org/10.1597/05-205
21. Mercy Larnyoh. 2015. Determining social challenges of
children with cleft lip and or palate as perceived by
parents or caretakers at Komfo Anokye Teaching
Hospital in Kumasi Metropolis in Ashanti region,
Ghana.
22. Debra A Lieberman. 1997. Interactive video games for
health promotion: Effects on knowledge, self-efficacy,
social support, and health. Health promotion and
interactive technology: Theoretical applications and
future directions: 103–120.
23. Elena Márquez Segura, Annika Waern, Luis Márquez
Segura, and David López Recio. 2016. Playification:
The PhySeEar case. 376–388.
https://doi.org/10.1145/2967934.2968099
24. Robert C Marshall, Robert T Wertz, David G Weiss,
James L Aten, Robert H Brookshire, Luis Garcia-
Bunuel, Audrey L Holland, John F Kurtzke, Leonard L
LaPointe, Franklin J Milianti, and others. 1989. Home
treatment for aphasic patients by trained
nonprofessionals. Journal of Speech and Hearing
Disorders 54, 3: 462–470.
25. Jane McGonigal. 2011. We Don’t Need No Stinkin’
Badges: How to Re-invent Reality Without
Gamification. Retrieved December 18, 2015 from
http://www.gdcvault.com/play/1014576/We-Don-t-Need-No
26. Jane McGonigal. 2011. Reality is broken: Why games
make us better and how they can change the world.
Penguin.
27. Ray M Merrill, Nelson Roy, and Jessica Lowe. 2013.
Voice-related symptoms and their effects on quality of
life. Annals of Otology, Rhinology & Laryngology 122,
6: 404–411.
28. David R Michael and Sandra L Chen. 2005. Serious
games: Games that educate, train, and inform. Muska
& Lipman/Premier-Trade.
29. Tony Morelli, Lauren Lieberman, John Foley, and
Eelke Folmer. 2014. An exergame to improve balance
in children who are blind. In FDG.
30. Megan A Morris, Sarah K Meier, Joan M Griffin,
Megan E Branda, and Sean M Phelan. 2016. Prevalence
and etiologies of adult communication disabilities in the
United States: Results from the 2012 National Health
Interview Survey. Disability and health journal 9, 1:
140–144.
31. Jack Mostow and others. 2001. Evaluating tutors that
listen: An overview of Project LISTEN. In Smart
machines in education, 169–234.
32. Cosmin Munteanu, Joanna Lumsden, Hélène Fournier,
Rock Leung, Danny D’Amours, Daniel McDonald, and
Julie Maitland. 2010. ALEX: mobile language assistant
for low-literacy adults. In Proceedings of the 12th
international conference on Human computer
interaction with mobile devices and services, 427–430.
33. Joan Murphy. 2006. Perceptions of communication
between people with communication disability and
general practice staff. Health Expectations 9, 1: 49–59.
34. Heidi Parisod, Anni Pakarinen, Anna Axelin, Riitta
Danielsson-Ojala, Jouni Smed, and Sanna Salanterä.
2017. Designing a Health-Game Intervention
Supporting Health Literacy and a Tobacco-Free Life in
Early Adolescence. Games for Health Journal.
35. Larry J Platt, Gavin Andrews, Margrette Young, and
Peter T Quinn. 1980. Dysarthria of adult cerebral palsy:
I. Intelligibility and articulatory impairment. Journal of
Speech, Language, and Hearing Research 23, 1: 28–40.
36. Alessandra Preziosa, Alessandra Grassi, Andrea
Gaggioli, and Giuseppe Riva. 2009. Therapeutic
applications of the mobile phone. British Journal of
Guidance & Counselling 37, 3: 313–325.
37. Josephine M Randel, Barbara A Morris, C Douglas
Wetzel, and Betty V Whitehill. 1992. The effectiveness
of games for educational purposes: A review of recent
research. Simulation & gaming 23, 3: 261–276.
38. William M Reynolds and Susan Reynolds. 1979.
Prevalence of speech and hearing impairment of
noninstitutionalized mentally retarded adults. American
journal of mental deficiency.
39. John C Rosenbek, Margaret L Lemme, Margery B
Ahern, Elizabeth H Harris, and Robert T Wertz. 1973.
A treatment for apraxia of speech in adults. Journal of
Speech and Hearing Disorders 38, 4: 462–472.
40. Zachary Rubin. 2017. Development and evaluation of
software tools for speech therapy. University of
California, Santa Cruz.
41. Zachary Rubin, Sri Kurniawan, and Travis Tollefson.
2014. Results from using automatic speech recognition
in cleft speech therapy with children. In International
Conference on Computers for Handicapped Persons,
283–286.
42. Zak Rubin and Sri Kurniawan. 2013. Speech
Adventure: Using Speech Recognition for Cleft Speech
Therapy. In Proceedings of the 6th International
Conference on PErvasive Technologies Related to
Assistive Environments (PETRA ’13), 35:1–35:4.
https://doi.org/10.1145/2504335.2504373
43. Ellen Sciuto. 2013. The iPad: Using new technology for
teaching reading, language, and speech for children with
hearing loss.
44. Lawrence D Shriberg, Rhea Paul, Jane L McSweeny,
Ami Klin, Donald J Cohen, and Fred R Volkmar. 2001.
Speech and prosody characteristics of adolescents and
adults with high-functioning autism and Asperger
syndrome. Journal of Speech, Language, and Hearing
Research 44, 5: 1097–1115.
45. Ann Bosma Smit. 2004. Articulation and phonology
resource guide for school-age children and adults.
Cengage Learning.
46. Ali JA Soleymani, Martin J McCutcheon, and MH
Southwood. 1997. Design of speech illumina mentor
(SIM) for teaching speech to the hearing impaired. In
Biomedical Engineering Conference, 1997.,
Proceedings of the 1997 Sixteenth Southern, 425–428.
47. Jonathan M Sykes and Travis T Tollefson. 2005.
Management of the cleft lip deformity. Facial plastic
surgery clinics of North America 13, 1: 157–167.
48. John Van Borsel and An Vandermeulen. 2008.
Cluttering in Down syndrome. Folia Phoniatrica et
Logopaedica 60, 6: 312–317.
49. Klara Vicsi, Peter Roach, A Öster, Zdravko Kacic, Peter
Barczikay, Andras Tantos, Ferenc Csatári, Zs Bakcsi,
and Anna Sfakianaki. 2000. A multimedia, multilingual
teaching and training system for children with speech
disorders. International Journal of speech technology 3,
3–4: 289–300.
50. Maria Virvou, George Katsionis, and Konstantinos
Manos. 2005. Combining software games with
education: Evaluation of its educational effectiveness.
Educational Technology & Society 8, 2: 54–65.
51. Laurie A Vismara, Costanza Colombi, and Sally J
Rogers. 2009. Can one hour per week of therapy lead to
lasting changes in young children with autism? Autism
13, 1: 93–115.
52. Charles S Watson, Daniel J Reed, Diane Kewley-Port,
and Daniel Maki. 1989. The Indiana Speech Training
Aid (ISTRA) I: Comparisons between human and
computer-based evaluation of speech quality. Journal of
Speech, Language, and Hearing Research 32, 2: 245–251.
53. 2015. National Center for Health Statistics. Centers for
Disease Control and Prevention. Retrieved from
https://www.cdc.gov/nchs/products/databriefs/db205.htm
54. 2016. Statistics on Voice, Speech, and Language. U.S.
Department of Health and Human Services. Retrieved
from
https://www.nidcd.nih.gov/health/statistics/statistics-voice-speech-and-language#5
55. Quick Statistics About Voice, Speech, Language.
National Institute on Deafness and Other
Communication Disorders. Retrieved from
https://www.nidcd.nih.gov/health/statistics/quick-
statistics-voice-speech-language
Accessibility and Mobile Health
MobileHCI'18, September 3-6, Barcelona, Spain
50:12
... In addressing such speech impairments, Speech-Language Pathologists (SLPs) play a significant role in the screening, assessment, diagnosis, and treatment of persons with SSD. Personalized speech therapy and practice monitored by SLPs can improve the acquisition of speech skills [7]. However, the accessibility of SLPs is crucial for such intervention. ...
... However, the accessibility of SLPs is crucial for such intervention. A report suggests that up to 70 % of SLPs have waiting lists, which indicates a shortage in the workforce [7,8]. Furthermore, according to United Nations Children's Fund (UNICEF), there are not adequate speech-language therapy services for children with communication disorders and disabilities [9]. ...
... Researchers have also specifically worked and devised AI-based tools for persons with hearing impairment [23,24]. A novel tongue-based Human-Computer interaction tool [25] and gamified AI-based tool [7] for persons with motor speech disorder have been proposed. [28,29]. ...
Preprint
This paper presents a systematic literature review of published studies on AI-based automated speech therapy tools for persons with speech sound disorders (SSD). The COVID-19 pandemic has created a need for automated speech therapy tools that make speech therapy accessible and affordable for persons with SSD. However, there are no guidelines for designing such automated tools and their required degree of automation compared to human experts. In this systematic review, we followed the PRISMA framework to address four research questions: 1) what types of SSD do AI-based automated speech therapy tools address, 2) what is the level of autonomy achieved by such tools, 3) what are the different modes of intervention, and 4) how effective are such tools in comparison with human experts. An extensive search was conducted on digital libraries to find research papers relevant to our study from 2007 to 2022. The results show that AI-based automated speech therapy tools for persons with SSD are increasingly gaining attention among researchers. Articulation disorders were the most frequently addressed SSD based on the reviewed papers. Further, our analysis shows that most researchers proposed fully automated tools without considering the role of other stakeholders. Our review indicates that mobile-based and gamified applications were the most frequent mode of intervention. The results further show that only a few studies compared the effectiveness of such tools with that of expert Speech-Language Pathologists (SLPs). Our paper presents the state-of-the-art in the field, contributes significant insights based on the research questions, and provides suggestions for future research directions.
... Ambient and environmental noise that affected the game performance [30, 32, 33, 49]
3. Contradiction between game levels and the needs of target groups (the game was very difficult or too easy) [14, 21, 52]
4. The game was challenging because it required two hands to play [14, 21]
5. Children could not easily read words or phrases due to inadequate instruction [14, 31]
6. Not all participants were willing to wear the headset microphone [14, 42]
7. Delays in speech recognition [40, 43]
8. The game did not recognize low tune voices, and children had to speak loudly [30, 31]
9. The designed game did not provide feedback on accepting or rejecting children's voices [31]
10. One of the challenges at design phase was that each target phrase or word had to be carefully crafted to fit into the narrative of the game and this was very time-consuming, which could result in minimal content [31]
11. ...
Article
Introduction: Treatment of speech disorders during childhood is essential. Many technologies can help speech and language pathologists (SLPs) to practice speech skills, one of which is digital games. This study aimed to systematically investigate the games developed to treat speech disorders and their challenges in children. Methods: A comprehensive search was conducted in four databases, including Medline (through PubMed), Scopus, Web of Science, and IEEE Xplore, to retrieve English articles published by July 14, 2021. The articles in which a digital game was developed to treat speech disorders in children were included in the study. Then, the features of the designed games and their challenges were extracted from the studies. Results: After reviewing the full texts of 69 articles and assessing them in terms of inclusion and exclusion criteria, 27 articles were included in the systematic review. In these articles, 59.25% of the games had been developed in English language and children with hearing impairments had received much attention from researchers compared to other patients. Also, the Mel-Frequency Cepstral Coefficients (MFCC) algorithm and the PocketSphinx speech recognition engine had been used more than any other speech recognition algorithm and tool. In terms of the games, 48.15% had been designed in a way that children could practice with the help of their parents. The evaluation of games showed a positive effect on children's satisfaction, motivation, and attention during speech therapy exercises. The biggest barriers and challenges mentioned in the studies included sense of frustration, low self-esteem after several failures in playing games, environmental noise, contradiction between games levels and the target group's needs, and problems related to speech recognition. Conclusion: The results of this study showed that the games positively affect children's motivation to continue speech therapy, and they can also be used as the SLPs' aids. 
Before designing these tools, the obstacles and challenges should be considered, and solutions to them should be proposed.
... The design and implementation of mobile apps for use by children with communication disorders is a research area that draws attention from both clinical researchers and human-computer interaction researchers. In recent years, human-computer interaction scholars have designed apps for children with autism, cleft palate, speech sound disorders, cochlear implants, and other communication disorders [8][9][10][11][12]. Given the variety of needs among children with communication disorders, developers and designers may encounter difficulties obtaining verbal or written user feedback on app content and features while creating and revising these apps; consequently, they must rely on reports from key stakeholders that surround the circle of care of children with communication disorders [13,14]. ...
Article
Background: With the plethora of mobile apps available on the Apple App Store, more speech-language pathologists (SLPs) have adopted apps for speech-language therapy services, especially for pediatric clients. App Store reviews are publicly available data sources that can not only create avenues for communication between technology developers and consumers but also enable stakeholders such as parents and clinicians to share their opinions and view opinions about the app content and quality based on user experiences. Objective: This study examines the Apple App Store reviews from multiple key stakeholders (eg, parents, educators, and SLPs) to identify and understand user needs and challenges of using speech-language therapy apps (including augmentative and alternative communication [AAC] apps) for pediatric clients who receive speech-language therapy services. Methods: We selected 16 apps from a prior interview study with SLPs that covered multiple American Speech-Language-Hearing Association Big Nine competencies, including articulation, receptive and expressive language, fluency, voice, social communication, and communication modalities. Using an automatic Python (Python Software Foundation) crawler developed by our research team and a Really Simple Syndication feed generator provided by Apple, we extracted a total of 721 app reviews from 2009 to 2020. Using qualitative coding to identify emerging themes, we conducted a content analysis of 57.9% (418/721) reviews and synthesized user feedback related to app features and content, usability issues, recommendations for improvement, and multiple influential factors related to app design and use. Results: Our analyses revealed that key stakeholders such as family members, educators, and individuals with communication disorders have used App Store reviews as a platform to share their experiences with AAC and speech-language apps. 
User reviews for AAC apps were primarily written by parents who indicated that AAC apps consistently exhibited more usability issues owing to violations of design guidelines in areas of aesthetics, user errors, controls, and customization. Reviews for speech-language apps were primarily written by SLPs and educators who requested and recommended specific app features (eg, customization of visuals, recorded feedback within the app, and culturally diverse character roles) based on their experiences working with a diverse group of pediatric clients with a variety of communication disorders. Conclusions: To our knowledge, this is the first study to compile and analyze publicly available App Store reviews to identify areas for improvement within mobile apps for pediatric speech-language therapy apps from children with communication disorders and different stakeholders (eg, clinicians, parents, and educators). The findings contribute to the understanding of apps for children with communication disorders regarding content and features, app usability and accessibility issues, and influential factors that impact both AAC apps and speech-language apps for children with communication disorders who need speech therapy.
... Speech Adventure is a game for dyslexic children that is intended to be played for about ten minutes a day. Players are encouraged to give a slug commands for daily tasks by reading them out loud [66]. This is a game predominantly focused on tasks that could be understood as a game environment to the purpose with comparatively fewer 'game' elements. ...
Article
Play presents a popular pastime for all humans, though not all humans play alike. Subsequently, Human–Computer Interaction Games research is increasingly concerned with the development of games that serve neurodivergent players. In a critical review of 66 publications informed by Disability Studies and Self-Determination Theory, we analyse which populations, research methods, kinds of play and overall purpose goals existing games address. We find that games are largely developed for children, in a top-down approach. They tend to focus on educational and medical settings and are driven by factors extrinsic to neurodivergent interests. Existing work predominantly follows a medical model of disability, which fails to support self-determination of neurodivergent players and marginalises their opportunities for immersion. Our contribution comprises a large-scale investigation into a budding area of research gaining traction with the intent to capture a status quo and identify opportunities for future work attending to differences without articulating them as deficit.
... It focuses on the development of language skills, especially verbal tenses. In [12] authors offer a comparison of different technologies focused on speech therapies, which take into account whether the technology has speech recognition or compatibility with mobile devices. Based on this study, and with the collaboration of experts, they created a speech recognition system and two prototypes that use it: Speech Adventure and Speech with Sam. ...
Article
SATReLO is a tool for the creation of customized applications that support language therapy for children with hearing disabilities. These applications consist of video games that replicate therapeutic activities. Video games can motivate children to embrace therapy positively, increasing the time they dedicate towards this therapy, especially at home. SATReLO allows therapists to customize video games according to the needs of each patient and it keeps a record of his or her progress over time. SATReLO contains a software product line, which makes it possible to derive new video games in real time. The process of testing the system, both in terms of functionality and usability, was thorough and allowed many details of its operation to be fine-tuned. Preliminary tests about the impact of the video games in therapeutic process have been very positive.
Chapter
Speech therapy games present a relevant application of business intelligence to real-world problems. However many such models are only studied in a research environment and lack the discussion on the practical issues related to their deployment. In this article, we depict the main aspects that are critical to the deployment of a real-time sound recognition neural model. We have previously presented a classifier of a serious game for mobile platforms that allows children to practice their isolated sibilants exercises at home to correct sibilant distortions, which was further motivated by the Covid-19 pandemic present at the time this article is posted. Since the current classifier reached an accuracy of over 95%, we conducted a study on the ongoing issues for deploying the game. Such issues include pruning and optimization of the current classifier to ensure near real-time classifications and silence detection to prevent sending silence segment requests to the classifier. To analyze if the classification is done in a tolerable amount of time, several requests were done to the server with pre-defined time intervals and the interval of time between the request and response was recorded. Deploying a program presents new obstacles, from choosing host providers to ensuring everything runs smoothly and on time. This paper proposes a guide to deploying an application containing a neural network classifier to free- and controlled-cost cloud servers to motivate further deployment research.
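The silence-detection step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how silence is detected, so the RMS-energy threshold used here, the function names (`rms`, `is_silent`, `frames_to_send`) and the default threshold of 0.01 are all assumptions chosen for illustration, not the authors' method.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a frame of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(samples, threshold=0.01):
    """Treat a frame as silence when its RMS energy is below the threshold.

    The threshold is a hypothetical value; a real deployment would tune it
    to the microphone and the ambient noise level.
    """
    return rms(samples) < threshold


def frames_to_send(frames, threshold=0.01):
    """Keep only the frames worth sending to the remote classifier."""
    return [frame for frame in frames if not is_silent(frame, threshold)]


# Example: one near-silent frame and one 440 Hz tone sampled at 16 kHz.
quiet = [0.001] * 160
tone = [0.2 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
kept = frames_to_send([quiet, tone, quiet])
print(len(kept))
```

Gating on energy before issuing a request is one simple way to avoid classifier calls for silent segments; more robust voice-activity detectors exist and may be preferable in noisy rooms.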
Conference Paper
Children with speech impairments often find speech curriculums tedious, limiting how often children are motivated to practice. A speech therapy game has the potential to make practice fun, may help facilitate increased time and quality of at-home speech therapy, and lead to improved speech. We explore using conversational real-time speech recognition, game methodologies theorized to improve immersion and flow, and user centered approaches to design an immersive interactive speech therapy solution. Our preliminary user evaluation showed that compared to traditional methods, children were more motivated to practice speech using our system.
Conference Paper
The concept of playification has recently been proposed as an extension of, or alternative to, gamification. We present a playification design project targeting the re-design of physiotherapy rehabilitative sessions for elderly inpatients. The menial and repetitive nature of the physical exercises targeted for design might seem ideal for shallow widespread gamification approaches that add external rewards to entice usage. In the PhySeEar project, we introduced a "third agent" instead, in the form of technology that would take over some of the work typically carried out by the physiotherapist. This technological intervention triggered the emergence of playfulness, when inpatients and the therapist re-signified the ongoing activity by engaging in playful role-taking, such as blaming the technology for mistakes, or for sensitivity to the inpatient's inaccurate movements. Based on the experiences from this project, we discuss some of the major differences between playification and gamification
Article
Objective: The purpose of this study was to explore the design of a health game that aims to both support tobacco-related health literacy and a tobacco-free life in early adolescence and to meet adolescents' expectations. Materials and methods: Data were collected from adolescents using an open-ended questionnaire (n = 83) and focus groups (n = 39) to obtain their view of a health game used for tobacco-related health education. The data were analyzed using thematic analysis. A group of experts combined the adolescents' views with theoretical information on health literacy and designed and produced the first version of the game. Adolescents (session 1, n = 16; session 3, n = 10; and session 4, n = 44) and health promotion professionals (session 2, n = 3) participated in testing the game. Feedback from testing sessions 3 and 4 was analyzed using descriptive statistics. Results: Adolescents pointed out that the health game needs to approach the topic of tobacco delicately and focus on the adolescents' perspective and on the positive sides of a tobacco-free life rather than only on the negative consequences of tobacco. The adolescents expected the game to be of high quality, stimulating, and intellectually challenging and to offer possibilities for individualization. Elements from the adolescents' view and theoretical modelling were embedded into the design of a game called Fume. Feedback on the game was promising, but some points were highlighted for further development. Conclusion: Investing especially in high-quality design features, such as graphics and versatile content, using humoristic or otherwise stimulating elements, and maintaining sufficiently challenging gameplay would promote the acceptability of theory-based health games among adolescents.
Conference Paper
Most children with cleft are required to undertake speech therapy after undergoing surgery to repair their craniofacial defect. However, the untrained ear of a parent can lead to incorrect practice resulting in the development of compensatory structures. Even worse, the boring nature of the cleft speech therapy often causes children to abandon home exercises and therapy altogether. We have developed a simple recognition system capable of detecting impairments on the phoneme level with high accuracy. We embed this into a game environment and provide it to a cleft palate specialist team for pilot testing with children 2 to 5 years of age being evaluated for speech therapy. The system consistently detected cleft speech in high-pressure consonants in 3 out of our 5 sentences. Doctors agreed that this would improve the quality of therapy outside of the office. Children enjoyed the game overall, but grew bored due to the delays of phrase-based speech recognition.
Article
This paper presents a large-scale study of the discriminative abilities of formant frequencies for automatic speaker recognition. Exploiting both the static and dynamic information in formant frequencies, we present linguistically-constrained formant-based i-vector systems providing well calibrated likelihood ratios per comparison of the occurrences of the same isolated linguistic units in two given utterances. As a first result, the reported analysis on the discriminative and calibration properties of the different linguistic units provide useful insights, for instance, to forensic phonetic practitioners. Furthermore, it is shown that the set of units which are more discriminative for every speaker vary from speaker to speaker. Secondly, linguistically-constrained systems are combined at score-level through average and logistic regression speaker-independent fusion rules exploiting the different speaker-distinguishing information spread among the different linguistic units. Testing on the English-only trials of the core condition of the NIST 2006 SRE (24,000 voice comparisons of 5 minutes telephone conversations from 517 speakers: 219 male and 298 female), we report equal error rates of 9.57 and 12.89% for male and female speakers respectively, using only formant frequencies as speaker discriminative information. Additionally, when the formant-based system is fused with a cepstral i-vector system, we obtain relative improvements of ∼6% in EER (from 6.54 to 6.13%) and ∼15% in minDCF (from 0.0327 to 0.0279), compared to the cepstral system alone.
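The relative improvements quoted in the abstract above follow from the usual definition of relative error reduction, (baseline − improved) / baseline. The short check below reproduces the arithmetic from the four figures given in the abstract; the function name `relative_improvement` is ours, not the paper's.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Fractional reduction of an error metric relative to its baseline value."""
    return (baseline - improved) / baseline


# Figures taken from the abstract: cepstral system alone vs. fused system.
eer_gain = relative_improvement(6.54, 6.13)      # EER in percent
dcf_gain = relative_improvement(0.0327, 0.0279)  # minDCF

print(f"EER: {eer_gain:.1%}, minDCF: {dcf_gain:.1%}")  # EER: 6.3%, minDCF: 14.7%
```

Both values agree with the rounded "∼6%" and "∼15%" reported in the abstract.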
Article
Communication disabilities, including speech, language and voice disabilities, can significantly impact a person's quality of life, employment and health status. Despite this, little is known about the prevalence and etiology of communication disabilities in the general adult population. To assess the prevalence and etiology of communication disabilities in a nationally representative adult sample. We conducted a cross-sectional study and analyzed the responses of non-institutionalized adults to the Sample Adult Core questionnaire within the 2012 National Health Interview Survey. We used respondents' self-report of having a speech, language or voice disability within the past year and receiving a diagnosis for one of these communication disabilities, as well as the etiology of their communication disability. We additionally examined the responses by subgroups, including sex, age, race and ethnicity, and geographical area. In 2012 approximately 10% of the US adult population reported a communication disability, while only 2% of adults reported receiving a diagnosis. The rates of speech, language and voice disabilities and diagnoses varied across gender, race/ethnicity and geographic groups. The most common response for the etiology of a communication disability was "something else." Improved understanding of population prevalence and etiologies of communication disabilities will assist in appropriately directing rehabilitation and medical services; potentially reducing the burden of communication disabilities.
Article
Speech and prosody-voice profiles for 15 male speakers with High-Functioning Autism (HFA) and 15 male speakers with Asperger syndrome (AS) were compared to one another and to profiles for 53 typically developing male speakers in the same 10- to 50-years age range. Compared to the typically developing speakers, significantly more participants in both the HFA and AS groups had residual articulation distortion errors, uncodable utterances due to discourse constraints, and utterances coded as inappropriate in the domains of phrasing, stress, and resonance. Speakers with AS were significantly more voluble than speakers with HFA, but otherwise there were few statistically significant differences between the two groups of speakers with pervasive developmental disorders. Discussion focuses on perceptual-motor and social sources of differences in the prosody-voice findings for individuals with Pervasive Developmental Disorders as compared with findings for typical speakers, including comment on the grammatical, pragmatic, and affective aspects of prosody.
Article
The articulation errors of 32 spastic and 18 athetoid males, aged 17–55 years, were analyzed using a confusion matrix paradigm. The subjects had a diagnosis of congenital cerebral palsy, and adequate intelligence, hearing, and ability to perform the speech task. Phonetic transcriptions were made of single-word utterances which contained 49 selected phonemes: 22 word-initial consonants, 18 word-final consonants and nine vowels. Errors of substitution, omission and distortion were categorized on confusion matrices such that patterns could be observed. It was found that within-manner errors (place or voicing errors or both) exceeded between-manner errors by a substantial amount, more so on final consonants. The predominant within-manner errors occurred on fricative phonemes for both initial and final positions. Affricate within-manner errors, all of devoicing, were also frequent in final position. The predominant between-manner initial position errors involved liquid-to-glide and affricate-to-stop changes, and for final position, affricate-to-fricative. Phoneme omission occurred three times more frequently on final than on initial consonants. The error data of individual subjects were found to correspond with the identified overall group patterns. Those with markedly reduced speech intelligibility demonstrated the same patterns of error as the overall group. The implications for treatment are discussed.
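The confusion-matrix paradigm described in the abstract above can be sketched as a tally of (target phoneme, produced phoneme) pairs, with each error labelled as within-manner, between-manner or omission. Everything in this sketch is illustrative: the `MANNER` table covers only a handful of English consonants written in ASCII, `EXAMPLE_PAIRS` is made-up data and not taken from the study, and distortions are not modelled.

```python
from collections import Counter

# Manner of articulation for a few English consonants (ASCII labels).
MANNER = {
    "t": "stop", "d": "stop",
    "s": "fricative", "z": "fricative", "sh": "fricative",
    "ch": "affricate", "j": "affricate",
    "l": "liquid", "w": "glide",
}

# Hypothetical transcriptions: (target, produced); None marks an omission.
EXAMPLE_PAIRS = [
    ("z", "s"), ("z", "s"),  # devoicing within the fricative class
    ("j", "ch"),             # devoicing within the affricate class
    ("l", "w"),              # liquid-to-glide change
    ("ch", "t"),             # affricate-to-stop change
    ("s", "s"),              # correct production
    ("t", None),             # omitted phoneme
]


def tally(pairs):
    """Return the confusion counts and a summary of error kinds."""
    confusion = Counter(pairs)
    kinds = Counter()
    for (target, produced), count in confusion.items():
        if produced is None:
            kinds["omission"] += count
        elif produced == target:
            kinds["correct"] += count
        elif MANNER[produced] == MANNER[target]:
            kinds["within-manner"] += count
        else:
            kinds["between-manner"] += count
    return confusion, kinds


confusion, kinds = tally(EXAMPLE_PAIRS)
print(dict(kinds))
```

Keeping the full `confusion` counter alongside the summary makes it possible to inspect individual substitution patterns, such as which fricatives are most often confused with each other.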
Article
Experimental comparisons are reported between computer-based and human judgments of speech quality for the same sets of utterances. Speech stimuli were recorded from two normal talkers, who intentionally varied the quality of their speech, and from a hearing-impaired child who was receiving speech therapy on the Indiana Speech Training Aid (ISTRA). The tape recordings were submitted for evaluation to a naive jury, an expert jury, and the ISTRA System, a microcomputer equipped with a speaker-dependent speech recognition board that generated scores representing how well utterance matched a stored template. Correlational analyses of these data indicated that humans were slightly better at judging speech quality than was the computer, but that the computer was much more reliable. These results demonstrate that computer-based speech evaluation may be a reasonable substitute for human judgments for certain types of speech drill.