
Andrew Morris
- PhD relating to speech recognition in humans and machines.
- Audio-Visual Machine Perception
About
- Publications: 38
- Reads: 1,986
- Citations: 488
Current institution
Audio-Visual Machine Perception
Additional affiliations
April 2019 - November 2019
Liopa
Position
- Researcher
Description
- Visual speech recognition.
July 2017 - September 2017
TAINA Technology
Position
- Developer
Description
- Tested various techniques in offline handwriting recognition.
Education
January 1989 - December 1992
January 1984 - December 1988
Publications (38)
There are many situations in data classification where the data vector to be classified is partially corrupted, or otherwise incomplete. In this case the optimal estimate for each class probability output, for any given set of missing data components, can be obtained by calculating its expected value. However, this means that classifiers whose expe...
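As a hedged illustration of computing an expected classifier output over missing components (not the paper's own classifier), the sketch below fits a toy logistic-regression model, conditions a Gaussian model of the inputs on the observed components, and averages the classifier's posteriors over samples of the missing ones; all names and the Gaussian fill-in model are assumptions.

```python
# Hypothetical sketch: expected class posteriors under missing inputs,
# approximated by Monte Carlo over the missing components. The classifier,
# the Gaussian input model and all values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: 2 classes, 4-dimensional features.
X = rng.normal(size=(500, 4)) + np.repeat([[0, 0, 0, 0], [1, 1, 1, 1]], 250, axis=0)
y = np.repeat([0, 1], 250)
clf = LogisticRegression().fit(X, y)

# Simple model of the input distribution, used to sample missing components.
mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)

def expected_posterior(x_obs, missing_idx, n_samples=2000):
    """E[P(class | x)] over the conditional distribution of the missing components."""
    obs_idx = [i for i in range(len(x_obs)) if i not in missing_idx]
    # Condition the Gaussian on the observed components (standard formulas).
    S_oo = cov[np.ix_(obs_idx, obs_idx)]
    S_mo = cov[np.ix_(missing_idx, obs_idx)]
    S_mm = cov[np.ix_(missing_idx, missing_idx)]
    mu_cond = mu[missing_idx] + S_mo @ np.linalg.solve(S_oo, x_obs[obs_idx] - mu[obs_idx])
    cov_cond = S_mm - S_mo @ np.linalg.solve(S_oo, S_mo.T)
    # Average the classifier output over samples of the missing part.
    samples = rng.multivariate_normal(mu_cond, cov_cond, size=n_samples)
    x_full = np.tile(x_obs, (n_samples, 1))
    x_full[:, missing_idx] = samples
    return clf.predict_proba(x_full).mean(axis=0)

x = np.array([0.9, 1.1, 0.0, 0.0])          # last two components corrupted
print(expected_posterior(x, missing_idx=[2, 3]))
```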
In this article we review several successful extensions to the standard hidden-Markov-model/artificial neural network (HMM/ANN) hybrid, which have recently made important contributions to the field of noise robust automatic speech recognition. The first extension to the standard hybrid was the “multi-band hybrid”, in which a separate ANN is trained...
Speaker identification performance in noise is compared with that for clean speech. A multi-layer perceptron (MLP) is used to project standard MFCCs onto an internal representation which enhances speaker discrimination. The MLP-enhanced features thus obtained, which have previously been shown to increase speaker discrimination in clean speech, are now app...
An MLP classifier outputs a posterior probability for each class. With noisy data, classification becomes less certain, and the entropy of the posteriors distribution tends to increase, providing a measure of classification confidence. However, at high noise levels, entropy can give a misleading indication of classification certainty. Very noisy dat...
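A minimal sketch of the confidence measure being discussed: the entropy of a classifier's posterior distribution. The figures are made-up posteriors, and the third case illustrates the stated caveat that heavy noise can produce peaked (low-entropy) but misleading outputs.

```python
# Illustrative only: entropy of a classifier's posterior distribution as a
# confidence score. Low entropy = peaked posteriors; the abstract's caveat is
# that very noisy inputs can still yield peaked but wrong outputs.
import numpy as np

def posterior_entropy(posteriors, eps=1e-12):
    """Shannon entropy (nats) of one posterior distribution per row."""
    p = np.clip(posteriors, eps, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

clean = np.array([[0.90, 0.05, 0.05]])   # confident, low entropy
noisy = np.array([[0.40, 0.35, 0.25]])   # uncertain, high entropy
misled = np.array([[0.97, 0.02, 0.01]])  # heavy noise: peaked but possibly wrong

for name, p in [("clean", clean), ("noisy", noisy), ("misled", misled)]:
    print(name, posterior_entropy(p))
```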
Traditional microphone array speech recognition systems simply recognise the enhanced output of the array. As the level of signal enhancement depends on the number of microphones, such systems do not achieve acceptable speech recognition performance for arrays having only a few microphones. For small microphone arrays, we instead propose using the...
An MLP classifier outputs a posterior probability for each class. With noisy data classification becomes less certain and the entropy of the posteriors distribution tends to increase, therefore providing a measure of classification confidence. However, at high noise levels entropy can give a misleading indication of classification certainty because...
Traditional microphone array speech recognition systems simply recognise the enhanced output of the array. As the level of signal enhancement depends on the number of microphones, such systems do not achieve acceptable speech recognition performance for arrays having only a few microphones. For small microphone arrays, we instead propose using th...
State transition matrices as used in standard HMM decoders have two widely perceived limitations. One is that the implicit Geometric state duration distributions which they model do not accurately reflect true duration distributions. The other is that they impose no hard limit on maximum duration with the result that state transition probabilities...
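For reference, a self-transition probability a implies the geometric duration distribution P(d) = a^(d-1)(1-a); the short sketch below (illustrative values only) shows its fixed mode at d = 1 and the absence of any hard upper limit on duration, the two limitations the abstract refers to.

```python
# Sketch of the point made here: a state with self-transition probability a
# implies a geometric duration distribution P(d) = a**(d-1) * (1 - a), whose
# mode is always d = 1 and which places mass on arbitrarily long durations.
import numpy as np

a = 0.9                                  # illustrative self-transition probability
d = np.arange(1, 51)
p = a ** (d - 1) * (1 - a)

print("mean duration:", 1 / (1 - a))     # 10 frames for a = 0.9
print("P(d > 50):", a ** 50)             # no hard upper limit on duration
print("mode:", d[np.argmax(p)])          # always 1, unlike real speech durations
```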
State transition matrices as used in standard HMM decoders have two widely perceived limitations. One is that the implicit Geometric state duration distributions which they model do not accurately reflect true duration distributions. The other is that they impose no hard limit on maximum duration with the result that state transition probabilities...
An MLP classifier outputs a posterior probability for each class. With noisy data classification becomes less certain and the entropy of the posteriors distribution tends to increase, therefore providing a measure of classification confidence. However, at high noise levels entropy can give a misleading indication of classification certainty because...
Traditional microphone array speech recognition systems simply recognise the enhanced output of the array. As the level of signal enhancement depends on the number of microphones, such systems do not achieve acceptable speech recognition performance for arrays having only a few microphones. For small microphone arrays, we instead propose using the...
EEG recordings provide an important means of brain-computer communication, but their classification accuracy is limited by unforeseeable variations in the signal due to artefacts or recogniser-subject feedback. A number of techniques were recently developed to address a related problem of recogniser robustness to uncontrollable signal variation whi...
In the "missing data" (MD) approach to noise robust automatic speech recognition (ASR), speech models are trained on clean data, and during recognition sections of spectral data dominated by noise are detected and treated as "missing". However, this all-or-nothing hard decision about which data is missing does not accurately reflect the probabilist...
Automatic speech recognition (ASR) performance falls dramatically with the level of mismatch between training and test data. The human ability to recognise speech when a large proportion of frequencies are dominated by noise has inspired the "missing data" and "multi-band" approaches to noise robust ASR. "Missing data" ASR identifies low SNR spectr...
Automatic speech recognition (ASR) performance falls dramatically with the level of mismatch between training and test data. The human ability to recognise speech when a large proportion of frequencies are dominated by noise has inspired the "missing data" and "multi-band" approaches to noise robust ASR. "Missing data" ASR identifies low SNR spectr...
Much research has been focused on the problem of achieving automatic speech recognition (ASR) which approaches human recognition performance in its level of robustness to noise and channel distortion. We present here a new approach to data modelling which has the potential to combine complementary existing state-of-the-art techniques for speech enh...
In this paper, we develop different mathematical models in the framework of the multi-stream paradigm for noise robust automatic speech recognition (ASR), and discuss their close relationship with human speech perception. Largely inspired by Fletcher's "product-of-errors" rule (PoE rule) in psychoacoustics, multi-band ASR aims for robustness to da...
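Fletcher's product-of-errors rule, as cited here, states that the full-band error rate is approximately the product of independent per-band error rates, so recognition succeeds whenever at least one sub-band is recognised correctly; a short illustration with made-up per-band error rates:

```python
# Fletcher's product-of-errors rule (as cited here): the full-band error rate is
# approximately the product of the independent per-band error rates, so overall
# recognition is right whenever at least one sub-band is recognised correctly.
band_errors = [0.4, 0.5, 0.3, 0.6]        # illustrative per-band error rates

total_error = 1.0
for e in band_errors:
    total_error *= e

print(total_error)   # 0.036: far lower than any single band's error rate
```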
Multi-band speech recognition is powerful in band-limited noise, where the recognizer for the noisy, less reliable band can be given less weight in the recombination process. An accurate decision on which bands can be considered reliable and which are less reliable due to corruption by noise is usually hard to make. We investigate...
In the "missing data" (MD) approach to noise robust automatic speech recognition (ASR), speech models are trained on clean data, and during recognition sections of spectral data dominated by noise are detected and treated as "missing". However, this all-or-nothing hard decision about which data is missing does not accurately reflect the probabilist...
If the data vector for input to an automatic classifier is incomplete, the optimal estimate for each class probability must be calculated as the expected value of the classifier output. We identify a form of Radial Basis Function (RBF) classifier whose expected outputs can easily be evaluated in terms of the original function parameters. Two ways a...
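As a hedged sketch of the kind of closed form referred to (not necessarily the paper's exact construction): for normalised diagonal-Gaussian basis functions, integrating out the missing input components leaves a Gaussian over the observed components only, so the expected activation simply drops the missing dimensions; all names and values below are illustrative.

```python
# Hedged sketch: for normalised diagonal-Gaussian basis functions, integrating
# out the missing input components leaves a Gaussian over the observed
# components only, so the "expected" activation just drops the missing
# dimensions. Illustrative of the kind of closed form the abstract refers to.
import numpy as np

def gauss_basis(x, centre, sigma, observed):
    """Diagonal Gaussian basis value using only the observed components."""
    x, centre, sigma = x[observed], centre[observed], sigma[observed]
    z = (x - centre) / sigma
    return np.exp(-0.5 * np.sum(z * z)) / np.prod(sigma * np.sqrt(2 * np.pi))

x = np.array([0.2, 1.5, np.nan, np.nan])         # last two components missing
observed = ~np.isnan(x)
centres = [np.array([0.0, 1.0, 2.0, 3.0]), np.array([1.0, 0.0, 1.0, 0.0])]
sigma = np.full(4, 0.8)

acts = np.array([gauss_basis(x, c, sigma, observed) for c in centres])
print(acts / acts.sum())     # normalised activations as crude class posteriors
```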
In this paper we apply the Full Combination (FC) multi-band approach, which has originally been introduced in the framework of posterior-based HMM/ANN (Hidden Markov Model/Artificial Neural Network) hybrid systems, to systems in which the ANN (or Multilayer Perceptron (MLP)) is itself replaced by a Multi Gaussian HMM (MGM). Both systems represent t...
The multi-band processing paradigm for noise robust ASR was originally motivated by the observation that human recognition appears to be based on independent processing of separate frequency sub-bands, and also by "missing data" results which have shown that ASR can be made significantly more robust to band-limited noise if noisy sub-bands can be de...
Multi-band ASR was largely inspired by the extremely high level of redundancy in the spectral signal representation which can be inferred from Fletcher's product-of-errors rule for human speech perception. Indeed, the main aim of the multi-band approach is to exploit this redundancy in order to overcome the problem of data mismatch (while making no...
Current technology for automatic speech recognition (ASR) uses hidden Markov models (HMMs) that recognize spoken speech using the acoustic signal. However, no use is made of the causes of the acoustic signal: the articulators. We present here a dynamic Bayesian network (DBN) model that utilizes an additional variable for representing the state of t...
Latent variable decomposition permits factorisation of posterior-probability-based, or likelihood-based, speech unit discriminant functions into a composition of simpler functions which can be analysed separately and evaluated more accurately in the presence of band-limited noise, or other sources of data mismatch. See [2,7] for a more self-containe...
This paper addresses the problem of speech recognition in the presence of additive noise. To deal with this problem, it is possible to estimate the noise characteristics using methods which have previously been developed for speech enhancement techniques. Spectral subtraction can then be used to reduce the effect of additive noise on speech in the...
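A minimal spectral-subtraction sketch, assuming the noise magnitude spectrum is estimated from leading non-speech frames and subtracted with a spectral floor; frame length, hop, and floor values are illustrative assumptions, not the paper's settings.

```python
# Minimal spectral-subtraction sketch (illustrative, not the paper's system):
# estimate the noise magnitude spectrum from leading non-speech frames, then
# subtract it from every frame and floor the result to avoid negative energy.
import numpy as np

def spectral_subtraction(x, frame_len=256, hop=128, noise_frames=10, floor=0.01):
    n = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])
    spec = np.fft.rfft(frames * np.hanning(frame_len), axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)           # noise estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)  # subtract and floor
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), axis=1)
    # Overlap-add the enhanced frames back into a single signal.
    out = np.zeros(len(x))
    for i in range(n):
        out[i * hop:i * hop + frame_len] += clean[i]
    return out

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
noisy = np.sin(2 * np.pi * 440 * t) * (t > 0.2) + 0.3 * rng.normal(size=t.size)
enhanced = spectral_subtraction(noisy)
print(enhanced.shape)
```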
In this paper, we present and investigate a new method for subband-based Automatic Speech Recognition (ASR) which approximates the ideal "full combination" approach, which is itself often not practical to realize. The "full combination" approach consists of explicitly considering all possible combinations of subbands (Hermansky, 1996), avoiding t...
In this report, we investigate and compare different subband-based Automatic Speech Recognition (ASR) approaches, including an original approach, referred to as the "full combination approach", based on an estimate of the (noise-) weighted sum of posterior probabilities for all possible subband combinations. We show that the proposed estimate is...
The performance of most ASR systems degrades rapidly with data mismatch relative to the data used in training. Under many realistic noise conditions a significant proportion of the spectral representation of a speech signal, which is highly redundant, remains uncorrupted. In the "missing feature" approach to this problem mismatching data is simply...
In this report, we investigate and compare different subband-based Automatic Speech Recognition (ASR) approaches, including an original approach, referred to as the "full combination approach", based on an estimate of the (noise-) weighted sum of posterior probabilities for all possible subband combinations. We show that the proposed estimate is...
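A toy sketch of the full-combination idea under stated assumptions: with B sub-bands there are 2^B subsets of possibly-clean bands, one posterior estimator per subset (here a random stand-in function), and the final posterior is their sum weighted by the estimated probability that exactly that subset is reliable.

```python
# Hedged toy sketch of the "full combination" multi-band idea: with B sub-bands
# there are 2**B possible subsets of clean bands; a posterior estimator exists
# for each subset, and the final posterior is their sum weighted by the
# estimated probability that exactly that subset is reliable. Illustrative only.
import itertools
import numpy as np

B = 3                                            # number of sub-bands
classes = 4
rng = np.random.default_rng(0)

def subset_posterior(subset, frame):
    """Stand-in for an expert trained on the sub-bands in `subset`."""
    logits = rng.normal(size=classes) + 0.1 * len(subset)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def full_combination(frame, band_reliability):
    """Weight each subset's posterior by P(that subset is the reliable one)."""
    total = np.zeros(classes)
    for subset in itertools.chain.from_iterable(
            itertools.combinations(range(B), r) for r in range(B + 1)):
        w = np.prod([band_reliability[b] if b in subset else 1 - band_reliability[b]
                     for b in range(B)])
        total += w * subset_posterior(subset, frame)
    return total

frame = rng.normal(size=24)                       # toy spectral frame
reliab = np.array([0.9, 0.6, 0.2])                # per-band reliability estimates
posterior = full_combination(frame, reliab)
print(posterior, posterior.sum())                 # subset weights sum to 1
```

Because the subset weights sum to one, the combined output remains a proper posterior distribution.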
In order to achieve real-time performance, the spatio-temporal resolution of preprocessed data entering typical speech recognition systems is limited to a level which is approximately three orders of magnitude less than that required to avoid significant loss of speech information. The problem of reducing high-resolution data to a manageable level...
Speaker recognition on the 630-speaker TIMIT speech database, using maximum probability selection with a simple Gaussian Mixture Model (GMM) for the data distribution for each speaker, gives above 99% correct recognition. In contrast, a powerful classifier such as a Multi-Layer Perceptron (MLP), trained to estimate speaker probabilities, ev...
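A hedged sketch of the GMM baseline described here: one mixture model per speaker, maximum-likelihood selection at test time. Toy random features stand in for TIMIT MFCCs; speaker count, component count, and dimensionality are illustrative.

```python
# Hedged sketch of GMM-based speaker identification as described: one Gaussian
# mixture model per speaker, maximum-likelihood selection at test time.
# Toy random "MFCC" data stands in for TIMIT features.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_speakers, dim = 5, 12

# Toy training features: each speaker gets a different mean offset.
train = {s: rng.normal(loc=s, scale=1.0, size=(300, dim)) for s in range(n_speakers)}
models = {s: GaussianMixture(n_components=4, random_state=0).fit(X)
          for s, X in train.items()}

def identify(features):
    """Pick the speaker whose GMM gives the highest average log-likelihood."""
    scores = {s: m.score(features) for s, m in models.items()}
    return max(scores, key=scores.get)

test = rng.normal(loc=3, scale=1.0, size=(100, dim))   # utterance from speaker 3
print(identify(test))                                  # expected: 3
```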