Ralf Schlüter's research while affiliated with RWTH Aachen University and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (8)
In this work, new acoustic features for continuous speech recognition based on the short-term Fourier phase spectrum are introduced for mono (telephone) recordings. The new phase-based features were combined with standard Mel Frequency Cepstral Coefficients (MFCC), and results were produced with and without using additional linear discriminant anal...
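The abstract breaks off before defining the phase features, so the sketch below does not compute them. It takes a matrix of per-frame phase features as given (the name `phase_feats` is hypothetical) and only illustrates the combination step: frame-wise concatenation with MFCCs, optionally followed by a linear discriminant analysis projection. The LDA shown is the textbook Fisher formulation (top eigenvectors of the inverse within-class scatter times the between-class scatter), an assumption about the variant actually used.

```python
import numpy as np

def combine_features(mfcc, phase_feats):
    """Frame-wise concatenation of MFCCs with (hypothetical) phase features."""
    return np.hstack([mfcc, phase_feats])

def lda_projection(features, labels, out_dim):
    """Textbook Fisher LDA: top eigenvectors of Sw^{-1} Sb as projection columns."""
    labels = np.asarray(labels)
    mean_all = features.mean(axis=0)
    dim = features.shape[1]
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for c in np.unique(labels):
        x_c = features[labels == c]
        mean_c = x_c.mean(axis=0)
        centered = x_c - mean_c
        s_within += centered.T @ centered            # scatter around the class mean
        diff = (mean_c - mean_all)[:, None]
        s_between += len(x_c) * (diff @ diff.T)      # scatter of class means
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    order = np.argsort(eigvals.real)[::-1]           # most discriminative directions first
    return eigvecs[:, order[:out_dim]].real

# Usage: projected = combine_features(mfcc, phase_feats) @ lda_projection(feats, labels, d)
```

In a recognizer the LDA classes would typically be acoustic units such as HMM states, and the projection would be estimated on training data only; both are assumptions here.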
We show that vocal tract normalization (VTN) frequency warping results in a linear transformation in the cepstral domain. For the special case of a piece-wise linear warping function, the transformation matrix is analytically calculated. This approach enables us to compute the Jacobian determinant of the transformation matrix, which allows the norm...
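A numerical sketch of the linearity claim, under simplifying assumptions: the cepstrum is taken as the cosine-series coefficients of a log spectrum on [0, π] (no Mel warping, no filterbank), the break point of the piecewise-linear warp is placed at 7π/8 (an arbitrary choice), and the matrix is obtained by quadrature rather than by the paper's analytic expression. Because the warped log spectrum S(φ(ω)) is linear in the coefficients c_k, re-expanding it in the cosine basis yields a matrix A with warped cepstrum = A c, whose log-determinant can then be evaluated. All function names are illustrative.

```python
import numpy as np

def piecewise_linear_warp(omega, alpha, omega0=7 * np.pi / 8):
    """Hypothetical piecewise-linear VTN warping phi(omega) on [0, pi].

    Slope alpha below the break point omega0; above it, a straight line that
    maps pi to pi so the warped axis keeps the same range.
    """
    upper = alpha * omega0 + (np.pi - alpha * omega0) * (omega - omega0) / (np.pi - omega0)
    return np.where(omega <= omega0, alpha * omega, upper)

def cepstral_warp_matrix(alpha, n_ceps=13, n_grid=4096):
    """Matrix A such that warped cepstrum = A @ cepstrum.

    Assumes the log spectrum is S(w) = sum_k c_k cos(k w) on [0, pi], so
    A[m, k] = (w_m / pi) * integral cos(m w) cos(k phi(w)) dw with w_0 = 1 and
    w_m = 2 otherwise; the integral is approximated by the midpoint rule.
    """
    omega = (np.arange(n_grid) + 0.5) * np.pi / n_grid   # midpoint grid on [0, pi]
    phi = piecewise_linear_warp(omega, alpha)
    idx = np.arange(n_ceps)
    basis = np.cos(np.outer(idx, omega))    # cos(m w), shape (n_ceps, n_grid)
    warped = np.cos(np.outer(idx, phi))     # cos(k phi(w)), shape (n_ceps, n_grid)
    norm = np.where(idx == 0, 1.0, 2.0)     # cosine-series projection weights w_m
    return norm[:, None] * (basis @ warped.T) / n_grid

def log_abs_jacobian(alpha, n_ceps=13):
    """log |det A| of the truncated transformation matrix."""
    _, logdet = np.linalg.slogdet(cepstral_warp_matrix(alpha, n_ceps))
    return logdet

if __name__ == "__main__":
    c = np.random.default_rng(0).standard_normal(13)
    print(cepstral_warp_matrix(1.1) @ c, log_abs_jacobian(1.1))
```

With α = 1 the warp is the identity and A reduces to the identity matrix, a quick sanity check. Because A is truncated to the cepstral coefficients actually used, its determinant only approximates the Jacobian of the untruncated transformation.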
In this paper we present a method to derive Mel-frequency cepstral coefficients directly from the power spectrum of a speech signal. We show that omitting the filterbank in signal analysis does not affect the word error rate. The presented approach simplifies the speech recognizer's front end by merging subsequent signal analysis steps into a singl...
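The abstract is truncated before the exact formula, so the following is only a sketch of the general idea under stated assumptions. The log power spectrum is treated as constant within each FFT bin, the bin edges are mapped onto a normalised Mel axis u in [0, 1] with the common 2595 log10(1 + f/700) formula, and the cosine basis cos(π m u) is integrated exactly over each warped bin. The Mel warping and the cosine transform then collapse into one matrix applied to the log power spectrum, with no filterbank in between. Scaling conventions differ from a standard MFCC front end, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """Common Mel-scale formula (an assumption; the paper's variant may differ)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def integrated_mel_cosine_matrix(n_fft, sample_rate, n_ceps=13):
    """Matrix T with cepstrum = T @ log(power spectrum), no filterbank.

    Each FFT bin is a constant stretch of the log spectrum; its edges are mapped
    to a normalised Mel axis u in [0, 1], and cos(pi m u) is integrated exactly
    over that warped stretch.
    """
    n_bins = n_fft // 2 + 1
    bin_hz = sample_rate / n_fft
    nyquist = sample_rate / 2.0
    edges_hz = np.concatenate(([0.0], (np.arange(n_bins - 1) + 0.5) * bin_hz, [nyquist]))
    u = hz_to_mel(edges_hz) / hz_to_mel(nyquist)       # warped bin edges, 0 ... 1
    m = np.arange(1, n_ceps)[:, None]
    T = np.empty((n_ceps, n_bins))
    T[0] = np.diff(u)                                   # integral of cos(0) = bin width in u
    T[1:] = np.diff(np.sin(np.pi * m * u), axis=1) / (np.pi * m)
    return T

def cepstra_from_power(power, T, floor=1e-10):
    """Apply the merged Mel-warping + cosine transform to one frame."""
    return T @ np.log(np.maximum(power, floor))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400) * np.hamming(400)   # one 25 ms frame at 16 kHz
    power = np.abs(np.fft.rfft(frame, n=512)) ** 2        # 257 power-spectrum bins
    print(cepstra_from_power(power, integrated_mel_cosine_matrix(512, 16000)))
```

Because the integrals telescope, a flat log spectrum yields a nonzero coefficient only at m = 0, which is a useful check on the construction.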
In this paper two new scoring schemes for large vocabulary continuous speech recognition are compared. Instead of using the joint probability of a word sequence and a sequence of acoustic observations, we determine the best path through a word graph using posterior word probabilities with or without word context. The exact calculation of the poster...
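The abstract is cut off before describing the exact computation. One standard way to obtain posteriors on a word graph is the forward-backward algorithm, sketched below. Assumptions: the graph is acyclic, its nodes are integers that increase along every edge, each edge carries one combined log score (acoustic plus language model, with any scaling already applied), and posteriors are reported per edge rather than merged over edges that share a word and time span.

```python
import math
from collections import defaultdict

def log_add(values):
    """Numerically stable log(sum(exp(v))) over a list of log scores."""
    top = max(values)
    if top == -math.inf:
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))

def edge_posteriors(edges, start, end):
    """Forward-backward posteriors for the edges of an acyclic word graph.

    edges: list of (from_node, to_node, word, log_score); node ids must be
    integers that increase along every edge (i.e. a topological order).
    """
    incoming, outgoing = defaultdict(list), defaultdict(list)
    nodes = {start, end}
    for idx, (src, dst, _, _) in enumerate(edges):
        outgoing[src].append(idx)
        incoming[dst].append(idx)
        nodes.update((src, dst))
    order = sorted(nodes)

    forward = {n: -math.inf for n in order}     # log score of all partial paths start -> n
    forward[start] = 0.0
    for n in order:
        if incoming[n]:
            forward[n] = log_add([forward[edges[i][0]] + edges[i][3] for i in incoming[n]])

    backward = {n: -math.inf for n in order}    # log score of all partial paths n -> end
    backward[end] = 0.0
    for n in reversed(order):
        if outgoing[n]:
            backward[n] = log_add([edges[i][3] + backward[edges[i][1]] for i in outgoing[n]])

    total = forward[end]                        # log of the summed score of all full paths
    return [math.exp(forward[s] + score + backward[d] - total) for s, d, _, score in edges]
```

A best path in the sense of the abstract could then be searched with a Viterbi pass over these posteriors instead of over the joint scores; which combination rule the paper uses is not stated in the excerpt.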
In the last few years, the focus in ASR research has shifted from the recognition of clean read speech (i.e. WSJ) to the more challenging task of transcribing found speech like broadcast news (Hub-4 task) and telephone conversations (Switchboard). Available training corpora tend to become larger and more erroneous than before, as transcribing found...
Estimates of confidence for the output of a speech recognition system can be used in many practical applications of speech recognition technology. They can be employed for detecting possible errors and can help to avoid undesirable verification turns in automatic inquiry systems. In this paper we propose to estimate the confidence in a hypothesized...
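Once a confidence score per hypothesized word is available (for example a word posterior probability; the abstract breaks off before naming its estimator), detecting possible errors reduces to thresholding. The sketch below flags words whose confidence falls below a threshold and measures how often that decision is wrong. The threshold and the helper names are illustrative, and the error measure (false acceptances plus false rejections over all words) is an assumption, not taken from the abstract.

```python
def flag_possible_errors(confidences, threshold):
    """Mark words whose confidence falls below the threshold as possible errors."""
    return [conf < threshold for conf in confidences]

def confidence_error_rate(confidences, is_correct, threshold):
    """Fraction of words whose accept/reject decision disagrees with the truth.

    A correct word that gets flagged (false rejection) or an incorrect word
    that is not flagged (false acceptance) each count as one mistake.
    """
    flagged = flag_possible_errors(confidences, threshold)
    mistakes = sum(flag == correct for flag, correct in zip(flagged, is_correct))
    return mistakes / len(confidences)

# Usage: confidence_error_rate([0.9, 0.2, 0.8, 0.4], [True, False, True, True], 0.3)
```

In practice the threshold would be tuned on held-out data; sweeping it trades false acceptances against false rejections.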
In this paper, the interdependence of language models and discriminative training for large vocabulary speech recognition is investigated. In addition, a constrained recognition approach using word graphs is presented for the efficient determination of alternative word sequences for discriminative training. Experiments have been carried out on the...
In this paper, a formally unifying approach for a class of discriminative training criteria, including the Maximum Mutual Information (MMI) and Minimum Classification Error (MCE) criteria, is presented, together with the optimization methods gradient descent (GD) and the extended Baum-Welch (EB) algorithm. Comparisons are discussed for the MMI and the MCE crite...
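For reference, both criteria can be computed from class scores g_k = log p(x_n | k) + log p(k). The sketch uses the usual textbook forms (MMI as the summed log class posterior, to be maximised; MCE as a sigmoid-smoothed misclassification measure against the competing classes, to be minimised) rather than the paper's unified formulation, which the excerpt does not show. The parameters `eta` and `slope` are illustrative smoothing constants, and the optimisation step itself (GD or EB) is not shown.

```python
import numpy as np

def _logsumexp(a, axis):
    """Stable log(sum(exp(a))) along one axis; -inf entries contribute zero."""
    top = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(top, axis=axis) + np.log(np.sum(np.exp(a - top), axis=axis))

def mmi_criterion(log_joint, labels):
    """MMI: sum_n log p(k_n | x_n), to be maximised.

    log_joint[n, k] = log p(x_n | k) + log p(k) for observation n and class k.
    """
    n = np.arange(len(labels))
    return float(np.sum(log_joint[n, labels] - _logsumexp(log_joint, axis=1)))

def mce_criterion(log_joint, labels, eta=1.0, slope=1.0):
    """Textbook MCE: sum of sigmoid-smoothed misclassification measures, minimised.

    d_n = -g_correct + (1/eta) * log(mean over competitors of exp(eta * g_j)).
    """
    n_obs, n_classes = log_joint.shape
    n = np.arange(n_obs)
    is_correct = np.arange(n_classes)[None, :] == np.asarray(labels)[:, None]
    competitors = np.where(is_correct, -np.inf, eta * log_joint)   # drop the correct class
    d = -log_joint[n, labels] + (_logsumexp(competitors, axis=1) - np.log(n_classes - 1)) / eta
    return float(np.sum(1.0 / (1.0 + np.exp(-slope * d))))
```

With two classes and equal scores, each observation contributes log 0.5 to MMI and 0.5 to MCE; a confidently correct observation drives its MCE term towards zero.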
Citations
... There has been more than two decades of research in developing confidence estimates for ASR outputs [1,2]. In conventional ASR systems, the probabilities of decoding lattices or n-best lists are typically used to directly compute confidence scores [3]. Confidence scores are often computed by estimating word posterior probabilities from lattices [4] or word confusion networks [5]. ...
... This is due to the over-fitting caused by the large number of parameters. In the literature, e.g. [32][33][34], warping cepstral-based features between different speakers is known to be achievable by a linear transformation with an adaptation matrix, and a few diagonal elements (such ...
[Figure: panel (a) "From source ..."]
... However, enhancing only the magnitude spectrum has the limitation of reusing the distorted phase for speech reconstruction. Recently, attention has been focused on incorporating spectral phase components to improve the performance of various speech-related tasks, such as improving speech intelligibility [1] and speech recognition [2,3]. Estimating phase information is challenging because there are no explicit ways to model the statistics of phase distortions caused by environmental factors. ...
... The logarithm allows us to use cepstral mean subtraction, which is a channel normalization technique. Finally, the discrete cosine transform (DCT) is applied to the logarithm of the filterbank outputs, which results in the raw MFCC vector [72]. Because our filterbanks are all overlapping, the filterbank energies are quite correlated with each other. ...
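The pipeline described in this excerpt can be sketched as follows, assuming the filterbank energies are already available as a (frames × filters) array. The sketch takes the logarithm, applies an orthonormal DCT-II, and subtracts the per-utterance mean of each coefficient. It models the channel as a fixed positive gain per filter (an approximation): that gain becomes an additive constant after the logarithm, stays a constant after the linear DCT, and is therefore removed exactly by the mean subtraction. Function names are illustrative.

```python
import numpy as np

def dct_ii_matrix(n_filters, n_ceps):
    """Orthonormal DCT-II basis; row m is the m-th cepstral basis vector."""
    m = np.arange(n_ceps)[:, None]
    j = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi * m * (j + 0.5) / n_filters) * np.sqrt(2.0 / n_filters)
    basis[0] /= np.sqrt(2.0)          # the m = 0 row needs scale sqrt(1/N)
    return basis

def mfcc_from_filterbank(fb_energies, n_ceps=13, floor=1e-10):
    """(frames, filters) filterbank outputs -> (frames, n_ceps) raw MFCCs."""
    log_fb = np.log(np.maximum(fb_energies, floor))
    return log_fb @ dct_ii_matrix(fb_energies.shape[1], n_ceps).T

def cepstral_mean_subtraction(ceps):
    """Remove the per-utterance mean of each coefficient (channel normalisation)."""
    return ceps - ceps.mean(axis=0, keepdims=True)
```

A convolutive channel only approximately acts as a per-filter gain on filterbank energies, so in real recordings the cancellation is approximate rather than exact.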
... Note that for ease of representation we skip the dimension index $d$ in the following formulae. ... with iteration constant $D$. $\Gamma_{ki}(g(x))$ and $\Gamma_k(g(x))$ are discriminative averages of functions $g(x)$ of the training observations, defined by

$$\Gamma_{ki}(g(x)) = \sum_n \delta_{i,i_{k,n}} \left[\delta_{k,k_n} - p(k \mid x_n)\right] g(x_n) \tag{8}$$

$$\Gamma_k(g(x)) = \sum_i \Gamma_{ki}(g(x)) \tag{9}$$

$\delta_{i,j}$ is the Kronecker delta, i.e., given a training observation $x_n$ of class $k_n$, $\delta_{i,i_{k,n}} = 1$ only if $i$ is the 'best-fitting' component density $i_{k,n}$ given class $k$, and $\delta_{k,k_n} = 1$ only if $k = k_n$. For fast but reliable convergence of the MMI criterion, the choice of the iteration constant $D$ is crucial. ...
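The re-estimation formula that introduces the iteration constant D is cut from this excerpt. A minimal sketch, assuming it is the standard extended Baum-Welch Gaussian update (means and variances re-estimated from the discriminative averages plus D times the current statistics) and a single feature dimension, as the excerpt does by dropping the index d. The averages of Eq. (8) for g(x) in {1, x, x^2} are stored in an array called `gamma`; `best_density` holds the indices i_{k,n} of the best-fitting densities, computed beforehand and outside this sketch. All names are hypothetical.

```python
import numpy as np

def discriminative_averages(x, labels, class_post, best_density, n_densities):
    """gamma[k, i] = (Gamma_ki(1), Gamma_ki(x), Gamma_ki(x^2)) following Eq. (8).

    x: (N,) one feature dimension; labels: (N,) spoken classes k_n;
    class_post: (N, K) posteriors p(k|x_n); best_density: (N, K) indices i_{k,n}.
    """
    n_obs, n_classes = class_post.shape
    g = np.stack([np.ones_like(x), x, x ** 2], axis=1)       # g(x) in {1, x, x^2}
    gamma = np.zeros((n_classes, n_densities, 3))
    for n in range(n_obs):
        for k in range(n_classes):
            weight = float(labels[n] == k) - class_post[n, k]  # delta_{k,k_n} - p(k|x_n)
            gamma[k, best_density[n, k]] += weight * g[n]      # only i = i_{k,n} contributes
    return gamma

def extended_baum_welch_update(gamma, means, variances, D):
    """Assumed standard EB re-estimation of means/variances with iteration constant D."""
    occ, first, second = gamma[..., 0], gamma[..., 1], gamma[..., 2]
    new_means = (first + D * means) / (occ + D)
    new_vars = (second + D * (variances + means ** 2)) / (occ + D) - new_means ** 2
    return new_means, new_vars

# Eq. (9): the class-level averages are gamma.sum(axis=1).
```

If D is chosen too small, the re-estimated variances can become negative, which illustrates why the choice of D matters; if every training observation is already classified with posterior 1, all averages vanish and the update leaves the model unchanged.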
... The previous work [8] in this area examined the WER between acoustic model predictions obtained with a weak and a strong [9,10] language model. The premise was that high agreement between the two predictions suggests the acoustic model is well matched to the target domain. ...
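The agreement measure described in this excerpt is a WER computed between two model outputs rather than against a human transcript. A minimal word-level Levenshtein implementation is sketched below. Which of the two outputs serves as the reference is an assumption left to the caller, and the reference must be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))          # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # deletion
                          curr[j - 1] + 1,          # insertion
                          prev[j - 1] + (r != h))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# Usage: word_error_rate(strong_lm_output, weak_lm_output); lower means closer agreement.
```

A low value means the two decodings largely agree, which under the stated premise suggests a well-matched acoustic model.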