Conference Paper

Application of speech technology in a home based assessment kiosk for early detection of Alzheimer's disease

Rachel Coulston, Esther Klabbers, Jacques de Villiers and John-Paul Hosom
Center for Spoken Language Understanding
OGI School of Science & Engineering at OHSU
20000 NW Walker Road, Beaverton, OR 97006, USA
Alzheimer’s disease, a degenerative disease that affects an estimated 4.5 million people in the U.S., can be treated far more effectively when it is detected early. There are numerous challenges to early detection. One is objectivity, since caretakers are often emotionally invested in the health of the patients, who may be their family members. Consistency of administration can also be an issue, especially where longitudinal results from different examiners are compared. Finally, the frequency of testing can be adversely affected by scheduling or cost constraints for in-home psychometrician visits. The kiosk system described in this paper, currently deployed in homes around the country, uses speech technology to address these challenges.
Index Terms: spoken language systems, in-home psychometric testing, Alzheimer’s disease.
1. Introduction
Alzheimer’s Disease (AD) is a widespread and growing concern in an increasingly aging population. It affects an estimated 4.5 million people in the U.S., and its prevalence increases dramatically with age, reaching approximately 40-50% of people age 85 and older. There is no cure, but progression of the disease can be slowed by currently available therapies, which are more successful when intervention is early. Early detection of a gradual, degenerative disease such as this, however, is difficult. In most individuals, decline begins to be noticeable only after devastating brain loss has already occurred. Challenges central to this problem are the frequency of monitoring, the objectivity of caregivers, and the subtlety of the earliest signs.
Seniors are at the highest risk for developing AD, but they are also the most likely to have problems with mobility, disorientation, and other health complications. In-home testing is therefore the best option, but regularly sending a psychometrician to each patient’s home for preventative monitoring can be extremely costly. Consistency of administration across sessions is another issue, especially given the longitudinal nature of monitoring for early detection of AD. Caregivers have the most consistent contact with patients, but they may lack objectivity, or may adapt to changes before noticing them and reporting them to physicians. A computerized in-home monitoring system addresses these issues [1].
To achieve the goals of a longitudinal AD study, subjects must continue to participate. Drop-out rates are estimated at 40 percent over the course of a typical 4-year study [2]. There are several reasons the kiosk system is expected to reduce this projected subject loss. The physical presence of the kiosk in the home not only serves as a reminder, but also makes study compliance quite easy for the subject once he is enrolled. Reminders and data collection sessions are initiated by the kiosk system itself, rather than relying on the subject to complete items at his leisure. These factors are expected to improve subject retention. Even small improvements in retention would be valuable, especially for studies that aim to track subjects over very long periods of time.
Figure 1: Kiosk and user during a session.
One of the practical end goals of a kiosk system such as the one we describe in this paper is to facilitate the collection of efficacy data for AD drug treatment trials, since computer administration of these tests is standardized across all participants and visits, unlike conventional in-person testing. Analysis is also fully objective, which can be difficult to achieve for people scoring in situations where they have contact with subjects.
2. The home based assessment study
The kiosk system described in this paper is part of a larger study in which three methods of testing are being compared: conventional, telephone, and computer testing. Conventional testing consists of an in-person administration of the psychometric battery, paired with mail-in questionnaires; research study coordinators can call to remind participants to mail in their questionnaires. Telephone-based testing is done via an interactive voice response (IVR) system [3]. Participants are provided with a large-button, hearing-aid compatible telephone. The study coordinator schedules an appointment for the IVR system to call the participant at a specific date and time, and questionnaires and psychometric tests are administered and recorded via telephone. Computerized testing using a kiosk is described in the remainder of this paper.
A pilot study is currently underway to compare the three testing methods. For the pilot study, a total of 45 participants have been recruited and randomly assigned to the conventional, IVR, or kiosk test (15 each). By the time of the conference, the main study will be in progress, with data being collected in 600 homes, 200 of which will be outfitted with kiosks such as the one we describe here.
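The paper does not say how the 45 pilot participants were randomized into three equal arms. One simple scheme that guarantees 15 per arm is to shuffle a list containing each arm label 15 times; the sketch below assumes that scheme, and the participant ID format and seed are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence

ARMS = ("conventional", "IVR", "kiosk")

def assign_balanced(participant_ids: List[str], arms: Sequence[str] = ARMS,
                    seed: Optional[int] = None) -> Dict[str, str]:
    """Randomly assign participants so that every arm receives the same number.

    Builds a list holding each arm label len(ids) / len(arms) times,
    shuffles it, and pairs it with the participant IDs in order.
    """
    n, k = len(participant_ids), len(arms)
    if n % k != 0:
        raise ValueError("participant count must be a multiple of the number of arms")
    labels = [arm for arm in arms for _ in range(n // k)]
    random.Random(seed).shuffle(labels)  # seeded RNG makes the allocation reproducible
    return dict(zip(participant_ids, labels))

# 45 pilot participants, as in the text; the ID format is hypothetical.
assignment = assign_balanced([f"P{i:02d}" for i in range(1, 46)], seed=2007)
```

Blocked or stratified randomization (e.g. by age) would be common alternatives; the shuffle above only guarantees equal arm sizes.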
3. The kiosk
The kiosk system is designed to be used in the home, with tests completed monthly. A site visit by the research study coordinator is required to set up the kiosk, schedule appointment times, and walk the user through his initial training. A short video shows the participant an example of the test they are to take. The actual testing occurs unattended in the participant’s own home, which reduces the number of costly and intrusive site visits needed to monitor memory function and also aims to increase the amount and quality of data gathered.
3.1. Physical requirements
The physical footprint of the kiosk is small, so it does not take up much space in the participant’s home. The participant receives only a compact PC and a flat-panel monitor with a telephone receiver on one side. No keyboard or mouse is permanently attached to the system; the study coordinator uses both briefly during setup and removes them from the premises when she leaves. The subject interacts with the system solely via speech through the handset or via the touchscreen. All hardware for the kiosk is commercially available:
Dell OptiPlex 745 Ultra Small Form Factor
Celeron 346/3.06GHz processor
512MB memory
Planar PT1700MU 17-inch USB Touchscreen
Audio Advantage Micro USB Audio Adapter
Handset (requires modification): TeleVoIP PC Handset
Internet connection
All components are used as they arrive off the shelf, with one exception: we alter the handset wiring so that the system can sense the state of the hook switch, which is free when the user is holding the handset to his ear, and depressed when the handset is resting in its cradle (or set down on a flat surface, such as a countertop).
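The rewired hook switch gives the software a single binary signal. A minimal sketch of how that signal might be consumed is shown here; the audio routing rule follows Section 3.3, but the way hook state and speech detection are combined in the hypothetical `has_answered` is our assumption (the paper says only that both are used), and how the switch is physically read is not described and is left out.

```python
from enum import Enum

class Handset(Enum):
    LIFTED = "lifted"    # hook switch free: handset presumably held to the ear
    RESTING = "resting"  # hook switch depressed: in the cradle or set down flat

class AudioOut(Enum):
    HANDSET = "handset"
    USB_ADAPTER = "usb_adapter"

def handset_state(switch_depressed: bool) -> Handset:
    """Interpret a raw reading from the rewired hook switch."""
    return Handset.RESTING if switch_depressed else Handset.LIFTED

def audio_route(state: Handset) -> AudioOut:
    """Send prompts to the handset when it is lifted, else to the USB audio adapter."""
    return AudioOut.HANDSET if state is Handset.LIFTED else AudioOut.USB_ADAPTER

def has_answered(state: Handset, speech_detected: bool) -> bool:
    """Assumed policy: either lifting the handset or speaking answers the ring."""
    return state is Handset.LIFTED or speech_detected
```

Because a handset set down on a countertop also depresses the switch, it reads as `RESTING`; speech detection is what would catch a user who answers without picking it up.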
3.2. Software
The client software runs under Windows XP, enabling the kiosk to be installed on readily available commercial systems. As the kiosk computer is not intended to be used for any other purpose, such as email or web browsing, the software runs automatically and continuously and prevents the machine from being used for other activities. After a power failure, it starts up automatically when the machine reboots, and it can be monitored remotely by study personnel. Figure 2 shows a diagram of the software architecture. Data is cached on the local file system of the client machine and uploaded to the server at regular intervals. Appointments are scheduled either in person or by phone, and the selected times are entered by the study coordinators via a web interface. The kiosk contacts the server every 5 minutes to check for newly scheduled testing appointments. If the network connection becomes unavailable, any tests the local machine has already been informed of will proceed without interruption or data loss. The test data is stored locally until internet service is restored, at which point the kiosk resumes its normal polling and software updates and uploads the data it has stored. On the server, data is stored in a MySQL database running on Red Hat Enterprise Linux.
Figure 2: The overall architecture of the kiosk data collection
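The caching and polling behaviour described above is an offline-first loop: write results to disk first, and only delete a local copy once the server has accepted it. The sketch below assumes that design; `fetch_appointments` and `upload` are hypothetical stand-ins for the kiosk's server API, which the paper does not describe, and are assumed to raise `OSError` when the network is down.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Callable, List

class KioskClient:
    """Offline-first client loop: cache results locally, poll and upload when online."""

    POLL_INTERVAL_S = 5 * 60  # the paper: the kiosk checks the server every 5 minutes

    def __init__(self, cache_dir: Path,
                 fetch_appointments: Callable[[], List[str]],
                 upload: Callable[[dict], None]) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch_appointments = fetch_appointments  # hypothetical server call
        self.upload = upload                          # hypothetical server call
        self.known_appointments: List[str] = []

    def record_result(self, result: dict) -> None:
        # Write to local disk first so a dropped connection never loses data.
        # Timestamp prefix keeps files in recording order; uuid avoids name clashes.
        name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
        (self.cache_dir / name).write_text(json.dumps(result))

    def sync_once(self) -> None:
        try:
            self.known_appointments = self.fetch_appointments()
        except OSError:
            return  # offline: keep running the appointments already known
        for path in sorted(self.cache_dir.glob("*.json")):
            try:
                self.upload(json.loads(path.read_text()))
            except OSError:
                return  # connection lost mid-upload; retry on the next cycle
            path.unlink()  # delete the local copy only after a successful upload

    def run_forever(self) -> None:
        while True:
            self.sync_once()
            time.sleep(self.POLL_INTERVAL_S)
```

Deleting only after a successful upload gives at-least-once delivery, so in this sketch the server side would need to tolerate an occasional duplicate result.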
3.3. Interface
The kiosk system rings and displays a visual reminder of the appointment time 72 hours before a scheduled test. Two days before the test, a silent message is displayed on the screen for one hour. Twenty-four hours before the test, a silent text message is displayed for 5 minutes every hour, on the hour, between 07:00 and 20:00 local time. The kiosk monitor is turned off the rest of the time.
A session begins with the kiosk ringing at the scheduled time. Hook switch state and speech detection are used to determine when the participant has answered, at which point the video experimenter begins to guide the user through his session. Each instruction to the user is played as a video clip of the experimenter, so the subject sees and hears just what would be presented during an in-person visit. The audio is played either through the handset (if it is lifted) or through the USB audio adapter attached to the computer (if the handset is resting in its cradle). Wherever the tests allow it, summary instructions are also shown as text at the top of the screen. This redundancy eases the burden on hearing- and vision-impaired users by providing context in other modalities with which to interpret the limited amount they can hear or see.
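The reminder timeline at the start of Section 3.3 can be expressed as a schedule computed from the appointment time. This is a sketch under stated assumptions: the 72-hour reminder's display length (5 minutes here) is not given in the paper, 20:00 is treated as inclusive, and naive local datetimes are used, so daylight-saving transitions are ignored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass(frozen=True)
class Reminder:
    start: datetime
    duration: timedelta
    audible: bool  # True: the kiosk rings; False: silent on-screen text

def reminder_schedule(appt: datetime) -> List[Reminder]:
    """Reminders preceding a test at local time `appt`, following Section 3.3."""
    events = [
        # 72 h before: ring plus visual reminder (display length is assumed).
        Reminder(appt - timedelta(hours=72), timedelta(minutes=5), audible=True),
        # 48 h before: silent message shown for one hour.
        Reminder(appt - timedelta(hours=48), timedelta(hours=1), audible=False),
    ]
    # Final 24 h: silent 5-minute message on the hour, 07:00-20:00 inclusive.
    window_start = appt - timedelta(hours=24)
    t = window_start.replace(minute=0, second=0, microsecond=0)
    if t < window_start:
        t += timedelta(hours=1)  # first full hour inside the window
    while t < appt:
        if 7 <= t.hour <= 20:
            events.append(Reminder(t, timedelta(minutes=5), audible=False))
        t += timedelta(hours=1)
    return events
```

For a 10:00 appointment this yields the two early reminders plus hourly messages from 10:00 to 20:00 on the previous day and 07:00 to 09:00 on the day itself.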
The use of a finger-sensitive touchscreen obviates the need for a stylus, which can be lost or misplaced (and is sometimes difficult to extract from its storage compartment). Using a fingertip to point or trace is also a highly natural action, even for those who have never used touchscreen technology before. Additionally, speech recognition is enabled for simple navigation through the interface. For example, the yes/no question in Fig. 3 can be answered either by pressing one of the buttons on the touchscreen or by speaking the word yes or no into the telephone-style handset.
Figure 3: One possible state of the interactive touchscreen, showing the video experimenter.
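Accepting the yes/no answer from either modality amounts to normalizing two input sources into one result. A minimal sketch follows; the accepted words and the rule that a button press takes priority over a recognizer hypothesis are assumptions, not details from the paper.

```python
from typing import Optional

YES_WORDS = {"yes"}
NO_WORDS = {"no"}

def resolve_yes_no(button: Optional[str] = None,
                   recognized: Optional[str] = None) -> Optional[bool]:
    """Accept a yes/no answer from the touchscreen or the speech recognizer.

    `button` is the label of a pressed on-screen button; `recognized` is the
    recognizer's top hypothesis. The button is checked first (assumed
    priority). Returns None when neither input is usable, so the interface
    can re-prompt.
    """
    for source in (button, recognized):
        if source is None:
            continue
        word = source.strip().lower()
        if word in YES_WORDS:
            return True
        if word in NO_WORDS:
            return False
    return None
```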
4. Overview of the test battery
For all three methods of testing, the assessment includes a diverse battery of questionnaires and cognitive tests. With the kiosk, an average session takes approximately 30 minutes of the participant’s time. In this paper, we limit our discussion to the tasks that currently use speech technology and those for which speech technology is under development. Questionnaires and tasks not discussed here include self-report questions, also delivered by the kiosk, about quality of life, medication adherence, how well participants are able to complete activities of daily life, and so on. Another non-speech task in the cognitive battery requires participants to connect labeled dots in the correct sequence using their finger on the touchscreen. The following sections discuss in more detail four tasks that use speech technology.
4.1. Word list recall
For the word list recall task, ten words are presented visually and the subject reads each one aloud as it appears. At the end of the presentation, the subject is asked to recall as many of the words as he can, in any order. The same stimuli are then presented two more times, in a different order each time. After a delay during which separate tasks are completed, the subject is again asked to recall the words (although this time they are not presented again immediately beforehand).
Speech detection is used to determine whether the subject is following the instruction to read each word aloud. If there is silence, the experimenter reminds the subject to read each word aloud. During recall, speech and silence detection are used to determine when to prompt subjects to try to think of more words, and ultimately when to ask if they are finished.
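The paper does not specify its speech detector or the silence durations that trigger prompts. The sketch below substitutes a crude RMS-energy detector and hypothetical thresholds (5 s of silence to encourage, 15 s to ask if finished) to show how per-frame speech decisions could drive the experimenter's interventions during recall.

```python
import math
from typing import List, Sequence, Tuple

def frame_is_speech(samples: Sequence[float], threshold: float = 0.02) -> bool:
    """Crude energy detector: a frame whose RMS exceeds `threshold` counts as speech."""
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > threshold

def recall_prompts(speech_flags: Sequence[bool], frame_s: float = 0.1,
                   encourage_after_s: float = 5.0,
                   ask_done_after_s: float = 15.0) -> List[Tuple[int, str]]:
    """Decide when the video experimenter should intervene during free recall.

    After `encourage_after_s` of continuous silence, encourage the subject to
    think of more words; once silence (continuing or recurring) reaches
    `ask_done_after_s`, ask whether they are finished. Returns
    (frame_count, action) pairs; frame_count * frame_s is the elapsed time.
    """
    # Work in whole frames to avoid accumulating floating-point error.
    encourage_n = round(encourage_after_s / frame_s)
    ask_done_n = round(ask_done_after_s / frame_s)
    actions: List[Tuple[int, str]] = []
    silent_n = 0
    encouraged = False
    for i, speech in enumerate(speech_flags):
        if speech:
            silent_n = 0  # any speech resets the silence timer
            continue
        silent_n += 1
        if not encouraged and silent_n >= encourage_n:
            actions.append((i + 1, "encourage"))
            encouraged = True
        elif encouraged and silent_n >= ask_done_n:
            actions.append((i + 1, "ask_if_done"))
            break
    return actions
```

A fixed energy threshold is fragile across handsets and rooms; an adaptive noise floor or a trained voice-activity detector would be a more robust choice.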
Speech recognition is being implemented to automatically score this closed class (limited set) of words. Since the subject is expected to say one of ten words that are known in advance and do not change, the recognizer can be well trained to identify them. Currently, the subjects’ responses are recorded and uploaded to a server for manual scoring, just as would occur during an in-person assessment. Collecting these data in the pilot study means that the manual scores will be available as the “gold standard” for developing and testing the speech recognition software.
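Once the recognizer emits word hypotheses, scoring a closed-set recall trial reduces to counting distinct target words. The sketch below also tallies repeats and non-target intrusions for the manual scorer; those extra counts, and the ten-word list, are hypothetical illustrations rather than the study's stimuli or scoring rules.

```python
from typing import Dict, Iterable, List

def score_word_recall(recognized: Iterable[str], targets: Iterable[str]) -> Dict[str, object]:
    """Score one recall trial from recognizer output (words in any order).

    Each target counts once; repeats of an already-recalled target and
    non-target words (intrusions) are tallied separately for review.
    """
    target_set = {w.lower() for w in targets}
    correct: List[str] = []
    intrusions: List[str] = []
    repeats = 0
    for word in (w.lower() for w in recognized):
        if word not in target_set:
            intrusions.append(word)
        elif word in correct:
            repeats += 1
        else:
            correct.append(word)
    return {"score": len(correct), "correct": correct,
            "repeats": repeats, "intrusions": intrusions}

# Hypothetical ten-word list; the study's actual stimuli are not given here.
TARGETS = ["apple", "river", "candle", "garden", "button",
           "horse", "ladder", "window", "pencil", "blanket"]
result = score_word_recall(["river", "apple", "river", "chair"], TARGETS)
```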
4.2. Backward digit span
For the backward digit span test, strings of digits are said aloud by the video experimenter, and subjects are asked to repeat each string in reverse order. For instance, if the stimulus is “2-8-3”, the correct response is “3-8-2”. The task begins with strings two digits in length and continues until the subject makes errors on two consecutive strings of the same difficulty level (length), or until the last string, which is eight digits long, is reached.
Participants’ responses are recognized and scored on the fly. Because digit recognition is very accurate, the flow of this task is controlled by the speech recognition software [4], [5]: automatic scoring determines whether to discontinue the task or to let it become more difficult. Audio of the subject’s responses is also recorded for manual verification of automatic recognition accuracy.
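The discontinue rule above can be sketched as a loop over increasing string lengths. This is an illustration, not the kiosk's implementation: the number of strings per length is an assumption (two per length in the example), and `respond` is a hypothetical callback standing in for playing the stimulus and returning the recognizer's digit hypothesis.

```python
from typing import Callable, Dict, List, Optional, Sequence

def is_correct_backward(stimulus: Sequence[int], response: Sequence[int]) -> bool:
    """A response is correct when it is exactly the stimulus reversed."""
    return list(response) == list(reversed(stimulus))

def run_backward_span(trials_by_length: Dict[int, List[List[int]]],
                      respond: Callable[[List[int]], List[int]]) -> Dict[str, Optional[int]]:
    """Administer strings in order of increasing length (2 up to 8 digits).

    Stops after errors on two consecutive strings of the same length, or
    after the last string.
    """
    correct, longest = 0, 0
    for length in sorted(trials_by_length):
        consecutive_errors = 0
        for stimulus in trials_by_length[length]:
            if is_correct_backward(stimulus, respond(stimulus)):
                correct += 1
                longest = max(longest, length)
                consecutive_errors = 0
            else:
                consecutive_errors += 1
                if consecutive_errors == 2:  # discontinue rule from the text
                    return {"correct": correct, "longest": longest,
                            "discontinued_at": length}
    return {"correct": correct, "longest": longest, "discontinued_at": None}
```

In this sketch a subject who reverses strings correctly up to three digits and fails both four-digit strings ends with 4 correct, a longest span of 3, and discontinuation at length 4.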
4.3. Category fluency
For the category fluency task, subjects are asked to name as many members of a category (here, animals) as they can think of in one minute. Currently, responses to this task are audio-recorded and scored manually. The interface uses speech detection in this task as well, to determine when the experimenter should intervene: either by prompting the participant to try to think of more members of the category or, eventually, by asking if he is finished.
Because this is a fairly limited domain, some automatic scoring (or at least preprocessing) will be possible. Pilot data will be used to build a dictionary covering the animals most commonly named during the task. The next iteration of the kiosk will then be able to produce a transcript and a preliminary score, and the manual scorer will be able to check the recognizer’s word-spotting accuracy and make additions or corrections if needed.
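A preliminary fluency score of the kind described could be computed from time-stamped recognizer output and the pilot-derived dictionary. The sketch assumes the recognizer supplies `(seconds_from_task_start, word)` pairs; the small lexicon is a hypothetical placeholder for that dictionary, and out-of-dictionary words are passed to the manual scorer rather than rejected.

```python
from typing import Dict, Iterable, List, Set, Tuple

def score_fluency(words: Iterable[Tuple[float, str]], lexicon: Set[str],
                  time_limit_s: float = 60.0) -> Dict[str, object]:
    """Preliminary category-fluency score from time-stamped word hypotheses.

    Distinct lexicon entries named within the time limit are scored; repeats
    and words not in the lexicon are returned separately for review.
    """
    valid: List[str] = []
    repeats: List[str] = []
    to_review: List[str] = []
    for t, word in sorted(words):
        if t > time_limit_s:
            break  # sorted by time, so everything after this is too late
        w = word.lower()
        if w not in lexicon:
            to_review.append(w)
        elif w in valid:
            repeats.append(w)
        else:
            valid.append(w)
    return {"score": len(valid), "valid": valid,
            "repeats": repeats, "to_review": to_review}

# Hypothetical stand-in for the dictionary built from pilot data.
LEXICON = {"dog", "cat", "lion", "tiger", "cow"}
```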
Data from this task will also be used to investigate potential features for enhanced detection of impairment. For instance, the data can be analyzed for clusters of related animals and for temporal characteristics of the generated lists, such as latency within clusters versus between clusters. This will be a novel use of data that are already being collected for the cognitive test battery.
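One way to compute the within- versus between-cluster latencies mentioned above is to label each animal with a subcategory and compare the gaps between consecutive words. In this sketch the subcategory map (e.g. "farm", "pet") is hypothetical, since the paper does not specify a clustering scheme, and the word timestamps are assumed to come from the recognizer.

```python
from statistics import mean
from typing import Dict, List, Tuple

def cluster_latencies(words: List[Tuple[float, str]],
                      subcategory: Dict[str, str]) -> Dict[str, object]:
    """Mean gap between consecutive words within vs. between semantic clusters.

    A cluster is a run of consecutive words sharing a subcategory label.
    Words missing from `subcategory` never join a cluster, so gaps next to
    them count as between-cluster.
    """
    within: List[float] = []
    between: List[float] = []
    ordered = sorted(words)
    for (t0, w0), (t1, w1) in zip(ordered, ordered[1:]):
        label = subcategory.get(w0)
        same = label is not None and label == subcategory.get(w1)
        (within if same else between).append(t1 - t0)
    return {
        "mean_within_s": mean(within) if within else None,
        "mean_between_s": mean(between) if between else None,
        "switches": len(between),
    }

# Hypothetical subcategory map for illustration.
SUBCATEGORY = {"cow": "farm", "pig": "farm", "horse": "farm", "dog": "pet", "cat": "pet"}
```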
4.4. East Boston story recall
For the story recall test, a story is told by the video experimenter to the subject, who is subsequently asked to retell it at two different points during the session. The story contains 12 elements that must be remembered to obtain a perfect score. The first retelling occurs immediately after the presentation and the second twenty minutes later, after several intervening tasks have taken place. The participant is instructed to use the same words that were used in the original story as far as possible, and to tell as much of the story as they can remember.
“Three children were alone at home, and the house caught on fire. A brave fireman managed to climb in a back window and carry them to safety. Aside from minor cuts and bruises, all were well.”
Audio is recorded as the participant responds, and speech detection determines when the experimenter should intervene, prompting the subject to try to recall more and then asking if he is done. Because the story is fixed, recognition will be used to spot the words that are supposed to be retold. Since participants may use their own words, however, automated recognition-based scoring will require supervision. In subsequent iterations of the kiosk system, a manual scorer will be able to check the recognizer’s word-spotting results and make additions or corrections if needed.
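Element-level word spotting can be approximated on a text transcript by matching each story element against a list of acceptable keywords and paraphrases. This is a text-level stand-in for the acoustic word spotter the paper describes; the element labels and keyword lists below are hypothetical, drawn from the story text rather than from the official 12 scoring units.

```python
import re
from typing import Dict, List, Set

def spot_story_elements(transcript: str, elements: Dict[str, List[str]]) -> Set[str]:
    """Return the story elements whose keywords appear in the transcript.

    `elements` maps an element label to acceptable keywords or phrases,
    including plausible paraphrases, since subjects may use their own words.
    Matching is on whole words, so "fire" does not match "firefighter".
    """
    text = " " + re.sub(r"[^a-z' ]+", " ", transcript.lower()) + " "
    text = re.sub(r"\s+", " ", text)  # collapse runs of spaces
    return {label for label, keywords in elements.items()
            if any(f" {kw.lower()} " in text for kw in keywords)}

# Hypothetical subset of elements and paraphrases, for illustration only.
STORY_ELEMENTS = {
    "three children": ["three children", "three kids"],
    "alone at home": ["alone", "by themselves"],
    "house caught fire": ["fire", "burning"],
    "brave fireman": ["fireman", "firefighter"],
    "back window": ["window"],
    "carried to safety": ["safety", "safe"],
}
```

As the text notes, output like this would be a starting point that a manual scorer checks and corrects, not a final score.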
In addition, future work is planned not only to enhance automatic scoring, but also to expand the investigation of speech markers of AD. This will involve leveraging the recordings to extract more information about a subject’s performance than current scoring techniques capture. One example would be weighting the 12 story elements differently when determining overall recall quality, rather than simply counting the number of elements recalled. Retellings can be further analyzed for patterns, such as the temporal ordering of events, or particular story elements that are typically omitted by particular groups.
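The two analyses proposed above, weighted element scoring and the temporal ordering of retold events, could be sketched as follows. The weights are hypothetical, and the ordering measure (the share of recalled element pairs retold in the story's original order, a simple pairwise concordance) is our choice of metric, not one specified by the paper.

```python
from itertools import combinations
from typing import Dict, List, Optional

def weighted_recall(recalled: List[str], weights: Dict[str, float]) -> float:
    """Share of total element weight recalled; equal weights give count / total."""
    total = sum(weights.values())
    return sum(weights[e] for e in set(recalled) if e in weights) / total

def order_agreement(recalled: List[str], story_order: List[str]) -> Optional[float]:
    """Fraction of recalled element pairs retold in the story's original order.

    Returns None when fewer than two distinct elements were recalled.
    """
    pos = {e: i for i, e in enumerate(story_order)}
    seq = list(dict.fromkeys(e for e in recalled if e in pos))  # drop repeats, keep first
    pairs = list(combinations(seq, 2))
    if not pairs:
        return None
    in_order = sum(pos[a] < pos[b] for a, b in pairs)
    return in_order / len(pairs)
```

With equal weights of 1 for all 12 elements, `weighted_recall` multiplied by 12 reproduces the conventional element count, so the weighted score generalizes the current one.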
5. Conclusions
The kiosk is designed to be used as a prevention instrument for early detection of Alzheimer’s Disease. It can facilitate the collection of data from high-risk elderly populations, which can aid diagnosis of early cognitive decline or onset of dementia. Speech processing, audio recording for subsequent human scoring, and multi-modal redundancy in stimulus presentation together help to streamline data collection and scoring. All hardware is commercially available, and only a minor modification to the handset is needed, enabling this software to be used for large data collections. The invariant delivery of instructions and stimuli is valuable for keeping stimulus presentation consistent over the course of longitudinal studies. Additionally, unlike human psychometricians, the kiosk is not subject to the influence of any prejudice about whether or not cognitive decline is present (e.g., assumptions about cognitive function that might be made unconsciously based on a person’s apparent physical health). Thus, the kiosk system presented in this paper meets the consistency and objectivity demands discussed above. It should become a convenient, accurate, cost-saving system for early diagnosis and longitudinal monitoring of Alzheimer’s Disease.
6. Acknowledgements
We are grateful to the funders of the Alzheimer’s Disease Cooperative Study (ADCS), the National Institutes of Health’s (NIH) National Institute on Aging (NIA), as well as to Intel’s Behavioral Assessment and Intervention Commons (BAIC) program for providing additional funding. The views expressed in this paper do not necessarily represent the views of the NIH or of Intel. We would also like to acknowledge Tamara Hayes and Jeff Kaye and their teams for their work on this project. Jessica Payne-Murphy’s generous contribution of her voice and likeness for use as our video experimenter is particularly appreciated.
7. References
[1] T. L. Hayes, M. Pavel, and J. A. Kaye, (2004), “An unobtrusive in-home monitoring system for detection of key motor changes preceding cognitive decline”, In Proceedings of the 26th Annual International Conference of the IEEE EMBS, San Francisco, CA.
[2] J. C. Mundt and M. King, (2003), “Predicting early treatment drop out using interactive voice response (IVR)”, Alcoholism: Clinical and Experimental Research, 27, 28A.
[3] J. C. Mundt, K. L. Ferber, M. Rizzo, and J. H. Greist, (2001), “Computer-automated dementia screening using a touch-tone telephone”, Arch Intern Med, 161:2481-2487.
[4] P. Cosi, J. P. Hosom, J. Schalkwyk, S. Sutton, and R. A. Cole, (1998), “Connected digit recognition experiments with the OGI Toolkit’s neural network and HMM-based recognizers”, In Proceedings of IVTTA’98, 4th IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, Torino, Italy, pp. 135-140.
[5] J. P. Hosom, (2000), “Automatic Time Alignment of Phonemes Using Acoustic-Phonetic Information”, Ph.D. thesis, Oregon Graduate Institute of Science and Technology (now OGI School of Science & Engineering at Oregon Health & Science University).
... Alzheimer's Disease (AD) is a neurodegenerative disease which represents 60 to 70% of the dementia cases in Portugal [1,2,3]. However, its first signs can go unnoticed [1,2,4,5]. Typically, AD is known to cause alterations of memory and of spacial and temporal orientation [3,4,6]. Furthermore, AD increases dramatically with age and it has no cure. ...
... Furthermore, AD increases dramatically with age and it has no cure. Nevertheless, an early diagnosis may slow down its progression by enabling a more effective treatment [5]. For this purpose, several neuropsychological tests exist in the literature, each targeting different cognitive domains and capabilities. ...
... Up to our knowledge, there are few works in the literature that exploit SLT to automate certain types of neuropsychological tests. Some of the most relevant are the kiosk system designed to use at home as a prevention instrument for early detection of AD described in [5], the end-to-end system for automatically scoring the Logic Memory test of the WAIS-III presented in [9], and the system that implements a modified version of the MMSE based on the IBM ViaVoice recognition engine of [10]. These works show the recent increasing interest on this area, but also the long road ahead to support the large variety of existing neuropsychological tests (e.g., some of them are not fully automated). ...
Conference Paper
Full-text available
The diagnosis and monitoring of Alzheimer’s Disease (AD), which is the most common form of dementia, has been the motivation for the development of several screening tests such as Mini-Mental State Examination (MMSE), AD Assessment Scale (ADAS-Cog), and others. This work aims to develop an automatic web-based tool that may help patients and therapists to perform screening tests. The tool was implemented by adapting an existing platform for aphasia treatment, known as Virtual Therapist for Aphasia Treatment (VITHEA). The tool includes the type of speech-related exercises one can find in the most common screening tests, totalling over 180 stimuli, as well as the Animal Naming test. Its great flexibility allows for the creation of different exercises of the same type (repetition, calculation, naming, orientation, evocation, ...). The tool was evaluated with both healthy subjects and others diagnosed with cognitive impairment, using a representative subset of exercises, with satisfactory results.
... However, in their study, postoperative instructions have been considered as an essential part, which has not been considered necessary in the current study. In another study, Coulston et al., stated that, the infrastructure of health information kiosks include computers, monitors, handsets, USB, printer, Internet connection and Window XP software (16). The results of their study are consistent with the findings of current research, because in the current research, the most important infrastructure for health information kiosks include the use of printers alongside health information kiosks, the use of accessories such as headphones and touch screen, the use of video ads in software used in the kiosk, and other items with a score of 5. ...
Full-text available
Introduction Health kiosks are an innovative and cost-effective solution that organizations can easily implement to help educate people. Aim To determine the data requirements and basis for designing health information kiosks as a new technology to maintain the health of society. Methods By reviewing the literature, a list of information requirements was provided in 4 sections (demographic information, general information, diagnostic information and medical history), and questions related to the objectives, data elements, stakeholders, requirements, infrastructures and the applications of health information kiosks were provided. In order to determine the content validity of the designed set, the opinions of 2 physicians and 2 specialists in medical informatics were obtained. The test-retest method was used to measure its reliability. Data were analyzed using SPSS software. Results In the proposed model for Iran, 170 data elements in 6 sections were presented for experts’ opinion, which ultimately, on 106 elements, a collective agreement was reached. Conclusion To provide a model of health information kiosk, creating a standard data set is a critical point. According to a survey conducted on the various literature review studies related to the health information kiosk, the most important components of a health information kiosk include six categories; information needs, data elements, applications, stakeholders, requirements and infrastructure of health information kiosks that need to be considered when designing a health information kiosk.
... These problems are present from the very early stages of the disease, and progress and worsen at the same time that the cognitive decline [276,283]. Due to the ease with which voice recordings can be obtained, in the recent years, speech features have been used in works related to automatic MCI and AD diagnosis systems [27,40,[284][285][286][287][288]. Nonetheless, few studies report speech based classification results that suggest that they have potential to become part of a multimodal diagnosis system. ...
Full-text available
Introduction: The number of Alzheimer's Disease (AD) patients is increasing with increased life expectancy and 115.4 million people are expected to be affected in 2050. Unfortunately, AD is commonly diagnosed too late, when irreversible damages have been caused in the patient. Objective: An automatic, continuous and unobtrusive early AD detection method would be required to improve patients' life quality and avoid big healthcare costs. Thus, the objective of this survey is to review the multimodal signals that could be used in the development of such a system, emphasizing on the accuracy that they have shown up to date for AD detection. Some useful tools and specific issues towards this goal will also have to be reviewed. Methods: An extensive literature review was performed following a specific search strategy, inclusion criteria, data extraction and quality assessment in the Inspec, Compendex and PubMed databases. Results: This work reviews the extensive list of psychological, physiological, behavioural and cognitive measurements that could be used for AD detection. The most promising measurements seem to be magnetic resonance imaging (MRI) for AD vs control (CTL) discrimination with an 98,95% accuracy, while electroencephalogram (EEG) shows the best results for mild cognitive impairment (MCI) vs CTL (97,88%) and MCI vs AD distinction (94,05%). Available physiological and behavioural AD datasets are listed, as well as medical imaging analysis steps and neuroimaging processing toolboxes. Some issues such as "label noise" and multi-site data are discussed. Conclusions: The development of an unobtrusive and transparent AD detection system should be based on a multimodal system in order to take full advantage of all kinds of symptoms, detect even the smallest changes and combine them, so as to detect AD as early as possible. 
Such a multimodal system might probably be based on physiological monitoring of MRI or EEG, as well as behavioural measurements like the ones proposed along the article. The mentioned AD datasets and image processing toolboxes are available for their use towards this goal. Issues like "label noise" and multi-site neuroimaging incompatibilities may also have to be overcome, but methods for this purpose are already available.
... Existing work includes design patterns, 16 databases of interactions between older people and interactive voice response systems, 17 and voice interfaces for specific applications. 10,18,19 The timing and wording of prompts can be influenced by a number of factors, including the effectiveness of the prompts at facilitating the user to complete the task, the current state of the system, and the likely emotional response of the user. By weighting these factors in different ways, a range of strategies for helping people with dementia can be implemented. ...
Full-text available
Intelligent cognitive assistants support people who need help performing everyday tasks by detecting when problems occur and providing tailored and context-sensitive assistance. Spoken dialogue interfaces allow users to interact with intelligent cognitive assistants while focusing on the task at hand. In order to establish requirements for voice interfaces to intelligent cognitive assistants, we conducted three focus groups with people with dementia, carers, and older people without a diagnosis of dementia. Analysis of the focus group data showed that voice and interaction style should be chosen based on the preferences of the user, not those of the carer. For people with dementia, the intelligent cognitive assistant should act like a patient, encouraging guide, while for older people without dementia, assistance should be to the point and not patronising. The intelligent cognitive assistant should be able to adapt to cognitive decline. © The Author(s) 2015.
Although speech is a highly natural mode of communication, building robust and usable speech-based interfaces is still a challenge, even if the target user group is restricted to younger users. When designing for older users, there are added complications due to cognitive, physiological, and anatomical ageing. Users may also find it difficult to adapt to the interaction style required by the speech interface. In this chapter, we summarise the work on spoken dialogue interfaces that was carried out during the MATCH project. After a brief overview of relevant aspects of ageing and previous work on spoken dialogue interfaces for older people, we summarise our work on managing spoken interactions (dialogue management), understanding older people's speech (speech recognition), and generating spoken messages that older people can understand (speech synthesis). We conclude with suggestions for design guidelines that have emerged from our work and suggest directions for future research.
Full-text available
The existing paradigm of ongoing or posttreatment monitoring of patients through periodic but infrequent office visits has many limitations. Relying on self-report by the patient or their family is equally unreliable. We propose an alternative paradigm in which continuous, unobtrusive monitoring is used to observe changes in physical behavior over time. We highlight the use of this technique for monitoring motor activity that may be predictive of early cognitive changes in the elderly. Initial results using a system of low-cost wireless sensors are presented, together with a discussion of appropriate analyses and interpretation of such data. Using low-cost wireless sensor network coupled with algorithms to detect changes in relevant patterns of behavior, we are able to detect both acute and gradual changes that may indicate a need for medical intervention.
This paper describes a series of experiments that compare different approaches to training a speaker-independent continuous-speech digit recognizer using the CSLU Toolkit. Comparisons are made between the hidden Markov model (HMM) and neural network (NN) approaches. In addition, a description of the CSLU Toolkit research environment is given. The CSLU Toolkit is a research and development software environment that provides a powerful and flexible tool for creating and using spoken language systems for telephone and PC applications. In particular, the CSLU-HMM, the CSLU-NN, and the CSLU-FBNN development environments, with which our experiments were implemented, are described in detail and recognition results are compared. Our speech corpus is OGI 30K-Numbers, which is a collection of spontaneous ordinal and cardinal numbers, continuous digit strings and isolated digit strings. The utterances were recorded by having a large number of people recite their ZIP code, street address, or other numeric information over the telephone. This corpus represents a very noisy and difficult recognition task. Our best results (98% word recognition, 92% sentence recognition), obtained with the FBNN architecture, suggest the effectiveness of the CSLU Toolkit in building real-life speech recognition systems.
Ph.D., Computer Science and Engineering.

One requirement for researching and building spoken language systems is the availability of speech data that have been labeled and time-aligned at the phonetic level. Although manual phonetic alignment is considered more accurate than automatic methods, it is too time consuming to be commonly used for aligning large corpora. One reason for the greater accuracy of human labeling is that humans are better able to locate distinct events in the speech signal that correspond to specific phonetic characteristics. The development of the proposed method was motivated by the belief that if an automatic alignment method were to use such acoustic-phonetic information, its accuracy would become closer to that of human performance. Our hypothesis is that the integration of acoustic-phonetic information into a state-of-the-art automatic phonetic alignment system will significantly improve its accuracy and robustness. In developing an alignment system that uses acoustic-phonetic information, we use a measure of intensity discrimination in detecting voicing, glottalization, and burst-related impulses. We propose and implement a method of voicing determination that has average accuracy of 97.25% (which is an average 58% reduction in error over a baseline system), a fundamental-frequency extraction method with average absolute error of 3.12 Hz (representing a 45% reduction in error), and a method for detecting burst-related impulses with accuracy of 86.8% on the TIMIT corpus (which is a 45% reduction in error compared to reported results). In addition to these features, we propose a means of using acoustics-dependent transition information in the HMM framework. One aspect of successful implementation of this method is the use of distinctive phonetic features. To evaluate the proposed and baseline phonetic alignment systems, we measure agreement with manual alignments and robustness.
On the TIMIT corpus, the proposed method has 92.57% agreement within 20 msec. The average agreement of the proposed method represents a 28% reduction in error over our state-of-the-art baseline system. In measuring robustness, the proposed method has 14% less standard deviation when evaluated on 12 versions of the TIMIT corpus.
This study investigated the sensitivity and specificity of a computer-automated telephone system to evaluate cognitive impairment in elderly callers to identify signs of early dementia. The Clinical Dementia Rating Scale was used to assess 155 subjects aged 56 to 93 years (n = 74, 27, 42, and 12, with a Clinical Dementia Rating Scale score of 0, 0.5, 1, and 2, respectively). These subjects performed a battery of tests administered by an interactive voice response system using standard Touch-Tone telephones. Seventy-four collateral informants also completed an interactive voice response version of the Symptoms of Dementia Screener. Sixteen cognitively impaired subjects were unable to complete the telephone call. Performances on 6 of 8 tasks were significantly influenced by Clinical Dementia Rating Scale status. The mean (SD) call length was 12 minutes 27 seconds (2 minutes 32 seconds). A subsample (n = 116) was analyzed using machine-learning methods, producing a scoring algorithm that combined performances across 4 tasks. Results indicated a potential sensitivity of 82.0% and specificity of 85.5%. The scoring model generalized to a validation subsample (n = 39), producing 85.0% sensitivity and 78.9% specificity. The kappa agreement between predicted and actual group membership was 0.64 (P<.001). Of the 16 subjects unable to complete the call, 11 provided sufficient information to permit us to classify them as impaired. Standard scoring of the interactive voice response-administered Symptoms of Dementia Screener (completed by informants) produced a screening sensitivity of 63.5% and 100% specificity. A lower criterion found a 90.4% sensitivity, without lowering specificity. Computer-automated telephone screening for early dementia using either informant or direct assessment is feasible. Such systems could provide wide-scale, cost-effective screening, education, and referral services to patients and caregivers.
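The screening results above are reported as sensitivity, specificity, and kappa agreement between predicted and actual group membership. As a reading aid, the following is a minimal sketch of how these three quantities are conventionally computed from a 2x2 confusion matrix, assuming "positive" means classified as impaired and using Cohen's kappa as the agreement statistic. The `ConfusionMatrix` class and the counts in the usage example are hypothetical and illustrative only; they are not the study's data or its scoring algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical 2x2 screening outcome counts; 'positive' = classified as impaired."""
    tp: int  # impaired subjects flagged as impaired
    fn: int  # impaired subjects missed by the screen
    tn: int  # unimpaired subjects correctly passed
    fp: int  # unimpaired subjects wrongly flagged

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def sensitivity(self) -> float:
        # Fraction of truly impaired subjects that the screen flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of unimpaired subjects that the screen correctly passes.
        return self.tn / (self.tn + self.fp)

    def cohen_kappa(self) -> float:
        # Observed agreement between predicted and actual group membership.
        p_o = (self.tp + self.tn) / self.n
        # Agreement expected by chance, from predicted and actual marginal totals.
        p_e = ((self.tp + self.fp) * (self.tp + self.fn)
               + (self.fn + self.tn) * (self.fp + self.tn)) / self.n ** 2
        return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the study.
    cm = ConfusionMatrix(tp=40, fn=10, tn=45, fp=5)
    print(f"sensitivity={cm.sensitivity():.3f}")  # sensitivity=0.800
    print(f"specificity={cm.specificity():.3f}")  # specificity=0.900
    print(f"kappa={cm.cohen_kappa():.3f}")        # kappa=0.700
```

Kappa corrects raw agreement for the agreement expected by chance given how often each group is predicted and how often it actually occurs, which is why it can be much lower than the raw percentage of correct classifications.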
J. C. Mundt and M. King (2003). "Predicting early treatment drop out using interactive voice response (IVR)." Alcoholism: Clinical and Experimental Research, 27, 28A.