John S. Bridle’s research while affiliated with Nokia Bell Labs and other places


Publications (15)


Neural Networks or Hidden Markov Models for Automatic Speech Recognition: Is there a Choice?
  • Chapter

January 1992 · 9 Reads · 12 Citations

John S. Bridle

Various algorithms based on “neural network” (NN) ideas have been proposed as alternatives to hidden Markov models (HMMs) for automatic speech recognition. We first consider the conceptual differences and relative strengths of NN and HMM approaches, then examine a recurrent computation, motivated by HMMs, that can be regarded as a new kind of neural network specially suitable for dealing with patterns with sequential structure. This “alphanet” exposes interesting relationships between NNs and discriminative training of HMMs, and suggests methods for properly integrating the training of non-linear feed-forward data transformations with the rest of an HMM-style speech recognition system. We conclude that NNs and HMMs are not distinct, so there is no simple choice of one or the other. However, there are many detailed choices to be made, and many experiments to be done.


A speaker verification system using alpha-nets

January 1991 · 16 Reads · 59 Citations

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

Speaker verification is performed by comparing the output probabilities of two Markov models of the same phonetic unit. One of these Markov models is speaker-specific, being built from utterances from the speaker whose identity is to be verified. The second model is built from utterances from a large population of speakers. The performance of the system is improved by treating the pair of models as a connectionist network, an alpha-net, which then allows discriminative training to be carried out. Experimental results show that adapting the spectral observation probabilities of each state of the model by the back propagation of errors can correct misclassification errors. The real-time implementation of the system produced an average digit error rate of 4.5% and only one misclassification in 600 trials using a five-digit sequence.
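As a rough illustration of the decision rule described above, here is a minimal Python sketch: the utterance is scored under the speaker-specific and world models with the forward (alpha) recursion, and the claimed identity is accepted if the log-likelihood ratio clears a threshold. The start-in-state-0 convention and the function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def forward_log_likelihood(log_A, log_b):
    """Log-likelihood of an observation sequence under an HMM via the
    forward (alpha) recursion in log space.

    log_A : (S, S) log transitions, log_A[i, j] = log P(j | i)
    log_b : (T, S) log observation probabilities per frame and state
    Assumes (illustratively) that the model starts in state 0.
    """
    T, S = log_b.shape
    log_alpha = np.full(S, -np.inf)
    log_alpha[0] = log_b[0, 0]
    for t in range(1, T):
        # log-sum-exp over predecessor states, then add the emission term
        m = log_alpha[:, None] + log_A
        log_alpha = np.logaddexp.reduce(m, axis=0) + log_b[t]
    return np.logaddexp.reduce(log_alpha)       # sum over final states

def verify(log_A_spk, log_b_spk, log_A_world, log_b_world, threshold=0.0):
    """Accept the claimed identity if the speaker model explains the
    utterance better than the world (background) model."""
    llr = (forward_log_likelihood(log_A_spk, log_b_spk)
           - forward_log_likelihood(log_A_world, log_b_world))
    return llr > threshold
```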


An Alphanet approach to optimising input transformations for continuous speech recognition

January 1991 · 25 Reads · 34 Citations

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

The authors extend to continuous speech recognition (CSR) the Alphanet approach to integrating backprop networks and HMM (hidden Markov model)-based isolated word recognition. They present the theory of a method for discriminative training of components of a CSR system, using training data in the form of complete sentences. The derivatives of the discriminative score with respect to the parameters are expressed in terms of the posterior probabilities of state occupancies (gammas) under two conditions called 'clamped' and 'free' because they correspond to the two conditions in Boltzmann machine training. The authors compute these clamped and free gammas using the forward-backward algorithm twice, and use the differences to drive the adaptation of a preprocessing data transformation, which can be thought of as replacing the linear transformation which yields MFCCs, or which normalizes a grand covariance matrix.
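The clamped/free machinery lends itself to a short sketch. Assuming Gaussian state models with identity covariances and a linear front-end transform W (simplifying assumptions for illustration, not details from the paper), the gradient of the discriminative score with respect to W is driven by the per-frame difference between the two sets of gammas:

```python
import numpy as np

def occupancies(log_alpha, log_beta):
    """Per-frame posterior state occupancies (gammas) from the log
    scores of one forward-backward pass."""
    g = log_alpha + log_beta
    g = g - np.logaddexp.reduce(g, axis=1, keepdims=True)  # normalise per frame
    return np.exp(g)                                        # (T, S)

def transform_gradient(x, W, mu, gamma_clamped, gamma_free):
    """Gradient of the discriminative score w.r.t. a linear front-end
    transform W (a stand-in for the MFCC-producing transform), under
    the assumed Gaussian, identity-covariance state models.

    x : (T, D) raw frames;  W : (K, D);  mu : (S, K) state means
    """
    y = x @ W.T                                  # transformed frames (T, K)
    dgamma = gamma_clamped - gamma_free          # "clamped minus free" (T, S)
    # d(score)/d(y_t) = sum_s dgamma[t, s] * (mu_s - y_t)
    dy = dgamma @ mu - dgamma.sum(axis=1, keepdims=True) * y
    return dy.T @ x                              # accumulate over frames (K, D)
```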


Unsupervised Classifiers, Mutual Information and 'Phantom Targets'

January 1991 · 59 Reads · 107 Citations

We derive criteria for training adaptive classifier networks to perform unsupervised data analysis. The first criterion turns a simple Gaussian classifier into a simple Gaussian mixture analyser. The second criterion, which is much more generally applicable, is based on mutual information. It simplifies to an intuitively reasonable difference between two entropy functions, one encouraging 'decisiveness,' the other 'fairness' to the alternative interpretations of the input. This 'firm but fair' criterion can be applied to any network that produces probability-type outputs, but it does not necessarily lead to useful behavior.
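A minimal sketch of the 'firm but fair' criterion, assuming only that the network emits probability-type outputs for a batch of inputs: the mutual-information estimate is the entropy of the mean output (rewarding 'fairness', i.e. all classes used over the batch) minus the mean per-example entropy (rewarding 'decisiveness'). The eps guard is an implementation choice.

```python
import numpy as np

def firm_but_fair(p, eps=1e-12):
    """Mutual-information criterion for a batch of probability-type
    outputs p with shape (N examples, C classes); maximise it."""
    def entropy(q):
        return -np.sum(q * np.log(q + eps), axis=-1)
    fairness = entropy(p.mean(axis=0))       # entropy of the mean output
    decisiveness = entropy(p).mean()         # mean per-example entropy
    return fairness - decisiveness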


The ARM continuous speech recognition system

May 1990 · 24 Reads · 38 Citations

ICASSP-88: 1988 International Conference on Acoustics, Speech, and Signal Processing

M.J. Russell · K.M. Ponting · S.M. Peeling · [...]

Research on continuous-speech recognition using phoneme-level hidden Markov models (HMMs) is described. The aim of the project is automatic recognition of spoken airborne reconnaissance mission (ARM) reports. The evolution of the ARM system from a simple baseline system to its current configuration is described, and a considerable number of experimental results are included. Work on alternative approaches to modeling contextual effects and on improved duration modeling is described.


Simultaneous speaker normalisation and utterance labelling using Bayesian/neural net techniques
  • Conference Paper
  • Full-text available

May 1990 · 25 Reads · 13 Citations

ICASSP-88: 1988 International Conference on Acoustics, Speech, and Signal Processing

A particular form of neural network is described which has terminals for acoustic patterns, class labels, and speaker parameters. A method of training this network to tune in the speaker parameters to a new speaker is outlined. This process can also be viewed from a Bayesian perspective as maximizing the likelihood of the speaker's data by optimizing the model and speaker parameters. A method for doing this when the data are labeled is described. Results of using this technique with whole-word hidden Markov models (HMMs) indicate an improvement over speaker-independent performance and, for unlabeled data, a performance close to that achieved on labeled data.
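The Bayesian reading suggests a compact adaptation loop: hold the word models fixed and hill-climb the likelihood of the new speaker's data over the speaker parameters alone. A hedged sketch follows; the callable log_lik and the finite-difference gradient are illustrative stand-ins, not the paper's procedure.

```python
import numpy as np

def adapt_speaker(frames, log_lik, s0, lr=0.05, steps=200, h=1e-4):
    """Tune speaker parameters s by likelihood ascent, models fixed.

    log_lik(frames, s) is an assumed callable returning the scalar
    log-likelihood of the data under the fixed word models given
    speaker parameters s; the gradient is taken numerically here
    purely for simplicity of the sketch.
    """
    s = np.asarray(s0, dtype=float).copy()
    for _ in range(steps):
        grad = np.zeros_like(s)
        for i in range(s.size):                  # central differences
            e = np.zeros_like(s)
            e[i] = h
            grad[i] = (log_lik(frames, s + e) - log_lik(frames, s - e)) / (2 * h)
        s += lr * grad                           # hill-climb on likelihood
    return s
```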


Alpha-nets: A recurrent ‘neural’ network architecture with a hidden Markov model interpretation

February 1990 · 123 Reads · 126 Citations

Speech Communication

A hidden Markov model isolated word recogniser using full likelihood scoring for each word model can be treated as a recurrent 'neural' network. The units in the recurrent loop are linear, but the observations enter the loop via a multiplication. Training can use back-propagation of partial derivatives to hill-climb on a measure of discriminability between words. The back-propagation has exactly the same form as the backward pass of the Baum-Welch (EM) algorithm for maximum-likelihood HMM training. The use of a particular error criterion based on relative entropy (equivalent to the so-called Mutual Information criterion which has been used for discriminative training of HMMs) can have derivatives which are interestingly related to the Baum-Welch re-estimates and to Corrective Training.
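The recurrence itself is compact enough to state directly. A minimal sketch, assuming a single left-to-right word model that starts in state 0: the recurrent units are linear (a matrix product with the transition matrix), and the per-frame observation likelihoods enter the loop by multiplication, exactly as described above. Putting a softmax across the scores of several such recursions, one per word, gives a discriminatively trainable recogniser.

```python
import numpy as np

def alpha_net_forward(A, B):
    """HMM forward (alpha) recursion viewed as a recurrent network.

    A : (S, S) transitions, A[i, j] = P(state j | state i)
    B : (T, S) observation likelihoods b_j(o_t)
    Returns the full-likelihood word score sum_j alpha_T(j).
    Assumes (illustratively) the model starts in state 0.
    """
    T, S = B.shape
    alpha = np.zeros(S)
    alpha[0] = B[0, 0]
    for t in range(1, T):
        # linear recurrent loop, multiplicative observation input
        alpha = (alpha @ A) * B[t]
    return alpha.sum()
```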


RecNorm: Simultaneous Normalisation and Classification Applied to Speech Recognition.

[Figure 1: Feedforward network implementing a simple Gaussian classifier]

January 1990 · 59 Reads · 55 Citations


Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition

January 1990 · 701 Reads · 1,214 Citations

We are concerned with feed-forward non-linear networks (multi-layer perceptrons, or MLPs) with multiple outputs. We wish to treat the outputs of the network as probabilities of alternatives (e.g. pattern classes), conditioned on the inputs. We look for appropriate output non-linearities and for appropriate criteria for adaptation of the parameters of the network (e.g. weights). We explain two modifications: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non-linearity. The two modifications together result in quite simple arithmetic, and hardware implementation is not difficult either. The use of radial units (squared distance instead of dot product) immediately before the softmax output stage produces a network which computes posterior distributions over class labels based on an assumption of Gaussian within-class distributions. However the training, which uses cross-class information, can result in better performance at class discrimination than the usual within-class training method, unless the within-class distribution assumptions are actually correct.
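Both modifications are simple enough to sketch in a few lines of Python (array shapes and the eps guard are implementation choices, not from the paper): softmax as the normalised-exponential output stage, and probability scoring as the negative log of the output assigned to the correct class.

```python
import numpy as np

def softmax(q):
    """Normalised exponential: maps net inputs q (N, C) to positive
    outputs that sum to one across the classes."""
    e = np.exp(q - q.max(axis=-1, keepdims=True))   # shift for stability
    return e / e.sum(axis=-1, keepdims=True)

def probability_score(p, targets, eps=1e-12):
    """'Probability scoring' (log-loss) in place of squared error:
    mean negative log-probability assigned to the correct class."""
    return -np.mean(np.log(p[np.arange(len(targets)), targets] + eps))
```

Feeding the softmax with negative squared distances to class centres (radial units) makes the outputs the Gaussian within-class posteriors the abstract refers to.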



Citations (15)


... Thus, when low-level representations of patterns are used, the number of features may be large. In this context, classifiers based on neural networks are an interesting alternative to Bayes classification, because they can provide a direct estimate of the posterior probabilities of the classes and likelihoods [3,10,22]. ...

Reference:

Neural Network Configurations Analysis for Identification of Speech Pattern with Low Order Parameters
Neural Networks or Hidden Markov Models for Automatic Speech Recognition: Is there a Choice?
  • Citing Chapter
  • January 1992

... the current state-of-the-art on almost every benchmark [Chan and Lane, 2015, Han et al., 2017, Tóth, 2015]. So-called "connectionist" approaches had proposed using neural networks for speech recognition for decades [Bedworth et al., 1989, Huckvale, 1990, Rabiner, 1989, Renals et al., 1994, Waibel et al., 1989], but it was not until the late 2000s that they toppled Gaussian mixtures. ...

Comparison of neural and conventional classifiers on a speech recognition problem
  • Citing Conference Paper
  • November 1989

... There is a rich literature on the applications of globally normalized models [5,23,32,21], as well as detailed studies on the importance of global normalization in addressing the label bias problem [21,2,11]. In the context of ASR, there is a lot of research on applying globally normalized models [3,7,6,15,22,40,17]. Among these, MMI [3,7] is the globally normalized criterion most relevant to our work. ...

An Alphanet approach to optimising input transformations for continuous speech recognition
  • Citing Article
  • January 1991

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

... Therefore instead of using moving windows and comparing signal powers across the whole original recording, our cropping algorithm simply cuts the recording into pieces according to the detected frequency. iv) Behavioural biometrics for user authentication: Acoustic features from user speech have been used to design systems for speaker recognition [19,57,52,10,17,8]. However, voice can be easily spoofed, especially using replay [39] (27.3% and 70.0% ...

A speaker verification system using alpha-nets
  • Citing Article
  • January 1991

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

... Automatic speech recognition. Early ASR systems [15,16,17,18,19] were mainly based on combinations of hidden Markov models (HMMs) with Gaussian mixture models or DNNs, and often contain multiple modules (e.g., an acoustic model, a language model, and a lexicon model) trained separately. Recent works process raw audio sequences end-to-end, including CTC [20]-based models [21,22,23,24,25], recurrent neural network (RNN) transducers [26,27,28,29], sequence-to-sequence models [30,31,32,33,34,35], and transformer-based models [36,37,38]. ...

Alpha-nets: A recurrent ‘neural’ network architecture with a hidden Markov model interpretation
  • Citing Article
  • February 1990

Speech Communication

... Following the Law of Conservation of Generalization [4], it is not even clear which method, or at least which family of methods, is preferred for a given practical problem. To overcome the challenge of successfully applying machine learning to practical problems, a plethora of areas in machine learning have emerged - AutoML [5], meta-learning [6,7,8,9,10,11,12,13,14], transfer learning (inductive transfer) [15], learning to learn [16], life-long learning, continual learning [17], multi-task learning [18], domain adaptation [19] and domain generalization [20] - with substantial overlaps among them. Common to these areas is that they utilize a set of problems emerging from various domains (referred to as base-learning problems) or different realizations of the same problem (referred to as varieties/tasks). ...

RecNorm: Simultaneous Normalisation and Classification Applied to Speech Recognition.

... In this case the output values for any input pattern sum to unity. This case is of special interest since it relates to the idea from pattern recognition that the outputs of networks operating as classifiers ought to reflect the likelihood of a given pattern belonging to a particular class [4, 5]. Unfortunately, to achieve this sum rule it is generally true that some of the output components have to assume negative values, thus invalidating any interpretation in terms of classical probability theory. ...

Speech Recognition: Statistical and Neural Information Processing Approaches.
  • Citing Conference Paper
  • January 1988

... logits are divided by their corresponding temperature parameters before applying the softmax function (Bridle, 1989), thereby aligning the predicted probabilities with the empirical likelihoods. ...

Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters.
  • Citing Conference Paper
  • January 1989