A Voice-Input Voice-Output Communication Aid for People With Severe Speech Impairment


A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment, the voice-input voice-output communication aid (VIVOCA), is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small-vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors, including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria, which confirmed that they can use the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues that limit the performance and usability of the device in real usage situations, where mean recognition accuracy fell to 67%. These limitations will be addressed in future work.
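
As a minimal sketch of the pipeline the abstract describes (disordered-speech input, small-vocabulary speaker-dependent recognition, message building, synthetic speech output), the following toy code illustrates the flow. All class and method names here are hypothetical illustrations, not the authors' implementation; the recognizer is a stub standing in for per-user acoustic models.

```python
# Hypothetical sketch of a VIVOCA-style pipeline: recognize a token from
# the user's disordered speech, map it to a stored phrase, and hand the
# built message to a TTS engine. Names are illustrative assumptions.

from dataclasses import dataclass, field


@dataclass
class Vivoca:
    # Speaker-dependent vocabulary: recognized token -> output phrase.
    phrase_map: dict = field(default_factory=dict)
    message: list = field(default_factory=list)

    def recognize(self, audio_features) -> str:
        """Stand-in for the small-vocabulary, speaker-dependent ASR.
        A real system would score the audio against per-word acoustic
        models trained on a reduced amount of the user's speech."""
        raise NotImplementedError("plug in a trained recognizer here")

    def add_token(self, token: str) -> None:
        # Message building: each recognized token selects a stored phrase.
        if token in self.phrase_map:
            self.message.append(self.phrase_map[token])

    def speak(self) -> str:
        # On the device this string would be passed to a TTS engine.
        text = " ".join(self.message)
        self.message.clear()
        return text


aid = Vivoca(phrase_map={"drink": "Could I have a drink, please?"})
aid.add_token("drink")
print(aid.speak())
```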



    • "According to a report by [8], more than 70% of dysarthric population with Parkinson's disease or motor neuron disease and around 20% with cerebral palsy or stroke could benefit from some implementation of an augmentative or alternative communication (AAC) device. The benefits of such a setup has proved effective for dysarthric people using speech as an interface for natural communication [9] or enabling them to control physical devices through speech commands [7] "
    ABSTRACT: Dysarthria is a neurological speech disorder which exhibits multi-fold disturbances in the speech production system of an individual and can have a detrimental effect on the speech output. In addition to data sparseness problems, dysarthric speech is characterised by inconsistencies in the acoustic space, making it extremely challenging to model. This paper investigates a variety of baseline speaker-independent (SI) systems and their suitability for adaptation. The study also explores the usefulness of speaker adaptive training (SAT) for implicitly annihilating inter-speaker variations in a dysarthric corpus. The paper implements a hybrid MLLR-MAP based approach to adapt the SI and SAT systems. All reported results use the UASPEECH dysarthric data. Our best adapted systems gave a significant absolute gain of 11.05% (20.42% relative) over the last published best result in the literature. A statistical analysis performed across the various systems, and their application to modelling different dysarthric severity sub-groups, showed that SAT-adapted systems were more suitable for handling the disfluencies of more severe speech, while SI systems prepared from typical speech were more apt for modelling speech with a low level of severity.
    SLPAT 2015; 09/2015
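
The entry above adapts speaker-independent models with a hybrid MLLR-MAP scheme. As a hedged illustration of the MAP half only, the snippet below implements the standard MAP update of a Gaussian mean, interpolating a prior (speaker-independent) mean toward the adaptation data; the variable names and the relevance factor value are illustrative assumptions, not taken from the paper.

```python
# Toy MAP mean update: mu_hat = (tau*mu_prior + sum_t gamma_t x_t)
#                               / (tau + sum_t gamma_t)

import numpy as np


def map_adapt_mean(mu_prior, frames, posteriors, tau=16.0):
    """Interpolate a speaker-independent Gaussian mean toward the
    speaker's adaptation frames, weighted by occupation posteriors."""
    gamma = np.asarray(posteriors)      # occupation counts, shape (T,)
    x = np.asarray(frames)              # feature frames, shape (T, D)
    occ = gamma.sum()
    weighted_sum = gamma @ x            # sum_t gamma_t * x_t
    return (tau * np.asarray(mu_prior) + weighted_sum) / (tau + occ)


rng = np.random.default_rng(0)
mu_si = np.zeros(13)                           # prior (SI) mean, 13-dim
frames = rng.normal(1.0, 0.5, size=(50, 13))   # adaptation frames
gammas = np.ones(50)                           # pretend hard alignment
print(map_adapt_mean(mu_si, frames, gammas))
```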
    • "The conventional ASR front-end, referred to as ASR in the experimental results, employs left-to-right HMMs with 7 states per word, which yielded slightly better results than the 9 (non-emitting) states employed in [5]. Lower state counts, down to 3 states per word, were explored as well, but those lead to only very small improvements with few training samples, at the cost of a large performance decrease with more data. "
    ABSTRACT: Over the past decade, several speech-based electronic assistive technologies (EATs) have been developed that target users with dysarthric speech. These EATs include vocal command & control systems, but also voice-input voice-output communication aids (VIVOCAs). In these systems, the vocal interfaces are based on automatic speech recognition (ASR) systems, but this approach requires a large amount of training data and detailed annotation. In this work we evaluate an alternative approach, which works by mining utterance-based representations of speech for recurrent acoustic patterns, with the goal of achieving usable recognition accuracies with less speaker-specific training data. Comparisons with a conventional ASR system on dysarthric speech databases show that the proposed approach offers a substantial reduction in the amount of training data needed to achieve the same recognition accuracies. Index Terms: vocal user interface, dysarthric speech, non-negative matrix factorisation
    IEEE Spoken Language Technology Workshop; 12/2014
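
The entry above mines utterance-level representations for recurrent acoustic patterns via non-negative matrix factorisation. The generic sketch below shows the underlying idea: factorize a non-negative data matrix V (features x utterances) into recurrent patterns W and activations H, so that V is approximated by W @ H. This is a plain Lee-Seung multiplicative-update NMF on toy data, not the authors' code or features.

```python
# Generic NMF with multiplicative updates (Frobenius objective):
# V (non-negative) ~ W @ H, where columns of W act as recurrent patterns.

import numpy as np


def nmf(V, rank, iters=200, eps=1e-9):
    rng = np.random.default_rng(1)
    W = rng.random((V.shape[0], rank))
    H = rng.random((rank, V.shape[1]))
    for _ in range(iters):
        # Lee-Seung multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


V = np.random.default_rng(2).random((40, 100))  # toy non-negative data
W, H = nmf(V, rank=5)
print("reconstruction error:", np.linalg.norm(V - W @ H))
```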
    • "The selection of the suitable topologies and number of Gaussian mixture components for each phoneme in the dysarthric speaker's language was performed with a Genetic Algorithm (GA) which is an important tool used in the field of optimization [46]. The performance of the ASR with the resulting GA-HMMs were compared with the approach of developing a speaker-dependent (SD) system, where training of HMMs is performed with the speech data of the target speaker [17, 27, 32, 35, 39]. The proposed approach achieved statistically significant gains on ASR accuracy when tested with the SD approach on a well-known database of dysarthric speech (Nemours [41]). "
    ABSTRACT: Dysarthria is a frequently occurring motor speech disorder which can be caused by neurological trauma, cerebral palsy, or degenerative neurological diseases. Because dysarthria affects phonation, articulation, and prosody, the spoken communication of dysarthric speakers becomes seriously restricted, affecting their quality of life and confidence. Assistive technology has led to the development of speech applications to improve the spoken communication of dysarthric speakers. In this field, this paper presents an approach to improve the accuracy of HMM-based speech recognition systems. Because phonatory dysfunction is a main characteristic of dysarthric speech, the phonemes of a dysarthric speaker are affected at different levels. Thus, the approach consists in finding the most suitable type of HMM topology (Bakis, Ergodic) for each phoneme in the speaker's phonetic repertoire. The topology is further refined with a suitable number of states and Gaussian mixture components for acoustic modelling. This differs from studies where a single topology is assumed for all phonemes. Finding the suitable parameters (topology and mixture components) is performed with a Genetic Algorithm (GA). Experiments with a well-known dysarthric speech database showed statistically significant improvements of the proposed approach when compared with the single-topology approach, even for speakers with severe dysarthria.
    Computational and Mathematical Methods in Medicine 10/2013; 2013(2):297860. DOI:10.1155/2013/297860 · 0.77 Impact Factor
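
The entry above searches per-phoneme HMM configurations (topology, state count, mixture count) with a genetic algorithm. The toy GA below mirrors that search space under stated assumptions: the fitness function is a deliberate placeholder, whereas the real objective would be recognition accuracy obtained by training and decoding with each candidate configuration.

```python
# Toy GA over (topology, number of states, number of mixture components).
# The fitness function is a stand-in, not the paper's objective.

import random

random.seed(0)
TOPOLOGIES = ["bakis", "ergodic"]
STATES = list(range(3, 8))
MIXES = list(range(1, 9))


def random_config():
    return (random.choice(TOPOLOGIES), random.choice(STATES),
            random.choice(MIXES))


def fitness(cfg):
    # Placeholder: pretend Bakis with ~5 states and ~4 mixtures is best.
    topo, states, mixes = cfg
    return -(abs(states - 5) + abs(mixes - 4)) + (1 if topo == "bakis" else 0)


def evolve(pop_size=20, generations=30, mutation_rate=0.2):
    pop = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            # Uniform crossover: pick each gene from one parent.
            child = tuple(random.choice(pair) for pair in zip(a, b))
            if random.random() < mutation_rate:
                child = random_config()          # mutation: resample
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


print(evolve())  # e.g. ('bakis', 5, 4)
```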