A speech-controlled environmental control system for people with severe dysarthria

Department of Medical Physics and Clinical Engineering, Barnsley Hospital NHS Foundation Trust, UK.
Medical Engineering & Physics (Impact Factor: 1.84). 06/2007; 29(5):586-93. DOI: 10.1016/j.medengphy.2006.06.009
Source: PubMed

ABSTRACT: Automatic speech recognition (ASR) can provide a rapid means of controlling electronic assistive technology. Off-the-shelf ASR systems function poorly for users with severe dysarthria because of the increased variability of their articulations. We have developed a limited-vocabulary, speaker-dependent speech recognition application with greater tolerance to variability of speech, coupled with a computerised training package which assists dysarthric speakers to improve the consistency of their vocalisations and provides more data for recogniser training. These applications, and their implementation as the interface for a speech-controlled environmental control system (ECS), are described. The results of field trials to evaluate the training program and the speech-controlled ECS are presented. The user-training phase increased the recognition rate from 88.5% to 95.4% (p<0.001). Recognition rates were good in everyday usage in the home, even for people with the most severe dysarthria (mean word recognition rate 86.9%). The speech-controlled ECS was less accurate (mean task completion accuracy 78.6% versus 94.8%) but faster to use than switch-scanning systems, even taking into account the need to repeat unsuccessful operations (mean task completion time 7.7 s versus 16.9 s, p<0.001). It is concluded that a speech-controlled ECS is a viable alternative to switch-scanning systems for some people with severe dysarthria and would lead, in many cases, to more efficient control of the home.
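The abstract does not specify the recogniser at implementation level, but the core idea of a limited-vocabulary, speaker-dependent isolated-word recogniser can be sketched with a simple dynamic time warping (DTW) template matcher. This is an illustrative stand-in, not the authors' method; all names (`dtw_distance`, `recognise`) and the 1-D "feature" sequences are hypothetical.

```python
# Sketch of a limited-vocabulary, speaker-dependent isolated-word
# recogniser: each vocabulary word has a stored template sequence, and an
# incoming utterance is assigned to the word whose template it matches
# best under dynamic time warping (DTW).

def dtw_distance(a, b):
    """DTW alignment cost between two feature sequences (lists of floats)."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])           # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],       # insertion
                                 cost[i][j - 1],       # deletion
                                 cost[i - 1][j - 1])   # match
    return cost[n][m]

def recognise(utterance, templates):
    """Return the vocabulary word whose template best matches the utterance."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

# Tiny demonstration: scalar per-frame "features" stand in for real
# acoustic feature vectors (e.g. MFCCs).
templates = {"lights": [1.0, 2.0, 3.0, 2.0], "radio": [3.0, 1.0, 1.0, 3.0]}
print(recognise([1.0, 2.1, 2.9, 2.9, 2.0], templates))  # prints "lights"
```

DTW's elastic alignment gives some tolerance to the timing variability typical of dysarthric speech, which is the property the paper's recogniser design is addressing; a production system would operate on multidimensional acoustic features rather than scalars.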

    • "The second and third datasets are based on data collected in the STARDUST project [3]. The second dataset is an isolated word recognition task using the same (sil $word sil) grammar as the VIVOCA data. "
    ABSTRACT: Over the past decade, several speech-based electronic assistive technologies (EATs) have been developed that target users with dysarthric speech. These EATs include vocal command & control systems, but also voice-input voice-output communication aids (VIVOCAs). In these systems, the vocal interfaces are based on automatic speech recognition (ASR) systems, but this approach requires a large amount of training data and detailed annotation. In this work we evaluate an alternative approach, which works by mining utterance-based representations of speech for recurrent acoustic patterns, with the goal of achieving usable recognition accuracies with less speaker-specific training data. Comparisons with a conventional ASR system on dysarthric speech databases show that the proposed approach offers a substantial reduction in the amount of training data needed to achieve the same recognition accuracies. Index Terms: vocal user interface, dysarthric speech, non-negative matrix factorisation
    IEEE Spoken Language Technology Workshop; 12/2014
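The pattern-mining alternative described in that abstract rests on non-negative matrix factorisation (NMF): a non-negative utterance matrix V is approximated as W @ H, so that columns of W act as recurrent acoustic patterns and H as their per-utterance activations. The toy below is a pure-Python illustration of the decomposition itself (multiplicative updates for the Euclidean objective), not the cited authors' implementation; the matrix values are invented.

```python
# Illustrative NMF by multiplicative updates: V (features x utterances)
# is factored into W (features x patterns) @ H (patterns x utterances),
# with all entries kept non-negative.
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def nmf(V, r, iters=500, seed=1, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x r) and H (r x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        Wt = transpose(W)                      # H <- H * (W'V) / (W'WH)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(r)]
        Ht = transpose(H)                      # W <- W * (VH') / (WHH')
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(n)]
    return W, H

# Toy "utterance matrix": two additive patterns mixed across four utterances.
V = [[2.0, 0.0, 2.0, 4.0],
     [2.0, 0.0, 2.0, 4.0],
     [0.0, 3.0, 3.0, 0.0]]
W, H = nmf(V, r=2)
approx = matmul(W, H)
err = sum(abs(V[i][j] - approx[i][j]) for i in range(3) for j in range(4))
print(round(err, 2))  # small reconstruction error
```

Because the factorisation only needs enough utterances to expose recurring patterns, rather than annotated training data for every word model, this is the mechanism behind the cited reduction in speaker-specific training data.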
    • "In [25, 26] ASR systems built with Hidden Markov Models (HMMs) [29] achieved significant performances for Dutch and Japanese dysarthric speakers, respectively. In [27] a HMM-based ASR system was able to achieve recognition accuracies over 80% for British speakers with severe dysarthria and a restricted vocabulary (7–10 words) to control electronic devices (e.g., radio, TV). In [30], a hybrid approach that integrated HMMs and ANNs was presented to improve recognition of disordered speech. "
    ABSTRACT: Dysarthria is a frequently occurring motor speech disorder which can be caused by neurological trauma, cerebral palsy, or degenerative neurological diseases. Because dysarthria affects phonation, articulation, and prosody, spoken communication of dysarthric speakers becomes seriously restricted, affecting their quality of life and confidence. Assistive technology has led to the development of speech applications to improve the spoken communication of dysarthric speakers. In this field, this paper presents an approach to improve the accuracy of HMM-based speech recognition systems. Because phonatory dysfunction is a main characteristic of dysarthric speech, the phonemes of a dysarthric speaker are affected at different levels. Thus, the approach consists of finding the most suitable type of HMM topology (Bakis, Ergodic) for each phoneme in the speaker's phonetic repertoire. The topology is further refined with a suitable number of states and Gaussian mixture components for acoustic modelling. This represents a difference when compared with studies where a single topology is assumed for all phonemes. Finding the suitable parameters (topology and mixture components) is performed with a Genetic Algorithm (GA). Experiments with a well-known dysarthric speech database showed statistically significant improvements of the proposed approach when compared with the single-topology approach, even for speakers with severe dysarthria.
    Computational and Mathematical Methods in Medicine 10/2013; 2013:297860. DOI:10.1155/2013/297860 · 1.02 Impact Factor
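The GA search that abstract describes, over per-phoneme HMM topology, state count, and mixture count, can be sketched as follows. The fitness function here is a hypothetical stand-in that peaks at a fictitious best configuration; in the cited work, fitness would be measured recognition accuracy of an HMM trained with that configuration on held-out dysarthric speech. All ranges and parameters are illustrative assumptions.

```python
# Toy genetic algorithm over per-phoneme HMM configurations:
# genome = (topology, number of states, number of Gaussian mixture components).
import random

TOPOLOGIES = ["bakis", "ergodic"]   # candidate HMM transition structures
STATES = list(range(3, 8))          # candidate state counts (assumed range)
MIXES = list(range(1, 9))           # candidate mixture counts (assumed range)

def fitness(genome):
    """Hypothetical stand-in for recognition accuracy: peaks at a
    fictitious optimum (bakis, 5 states, 4 mixtures); higher is better."""
    topo, states, mixes = genome
    return -(abs(states - 5) + abs(mixes - 4) + (0 if topo == "bakis" else 1))

def random_genome(rng):
    return (rng.choice(TOPOLOGIES), rng.choice(STATES), rng.choice(MIXES))

def mutate(genome, rng):
    """Resample one randomly chosen field of the genome."""
    g = list(genome)
    field = rng.randrange(3)
    g[field] = rng.choice([TOPOLOGIES, STATES, MIXES][field])
    return tuple(g)

def crossover(a, b, rng):
    """Uniform crossover: each field comes from either parent."""
    return tuple(a[i] if rng.random() < 0.5 else b[i] for i in range(3))

def ga(pop_size=20, generations=30, seed=7):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        pop = [best]                       # elitism: keep the best found
        while len(pop) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < 0.3:
                child = mutate(child, rng)
            pop.append(child)
        best = max(pop, key=fitness)
    return best

best = ga()
print(best, fitness(best))
```

In the cited approach this search is run per phoneme, so different phonemes can end up with different topologies, which is the paper's departure from assuming a single topology for the whole repertoire.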
    • "This additional head movement and degraded speech exacerbates the problem of not being able to use close-talking microphones. Previous work with speech-input home control systems had similar practical issues [13] and demonstrated similar performance degradation. Whereas for home control applications some users found the level of performance acceptable, for control of a VOCA, feedback from users has confirmed that such performance is not acceptable and that higher accuracy at larger vocabulary sizes is essential. "
    ABSTRACT: A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment, the voice-input voice-output communication aid (VIVOCA), is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small-vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors, including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria, which confirmed that they can make use of the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues which limit the performance and usability of the device when applied in real usage situations, with mean recognition accuracy of 67% in these circumstances. These limitations will be addressed in future work.
    IEEE Transactions on Neural Systems and Rehabilitation Engineering 08/2012; 21(1). DOI:10.1109/TNSRE.2012.2209678 · 2.82 Impact Factor