Publications (17)

  • David Vilar, Hermann Ney, Alfons Juan, Enrique Vidal, Lehrstuhl für Informatik VI
    ABSTRACT: The number of features to be considered in a text classification system is given by the size of the vocabulary, which is normally in the tens or hundreds of thousands even for small tasks. This leads to parameter estimation problems for statistical methods, and countermeasures have to be found. One of the most widely used methods is to reduce the size of the vocabulary according to a well-defined criterion so that the set of parameters can be estimated reliably. The same problem is encountered in language modeling, where several smoothing techniques have been developed. In this paper we show that using the full vocabulary together with a suitable choice of smoothing technique yields better results on the text classification task than the standard feature selection techniques.
    06/2004;
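    A minimal sketch of the idea (not the authors' code; Laplace smoothing stands in for the language-model smoothing techniques the paper evaluates): a Naive Bayes classifier that keeps the full vocabulary and relies on smoothing instead of feature selection.

      import math
      from collections import Counter

      def train_counts(docs, labels):
          # docs: list of token lists; labels: parallel class labels
          counts = {c: Counter() for c in set(labels)}
          for words, c in zip(docs, labels):
              counts[c].update(words)
          return counts

      def class_log_score(words, c, counts, vocab, alpha=1.0):
          # smoothed log p(words | c) over the FULL vocabulary
          total = sum(counts[c].values()) + alpha * len(vocab)
          return sum(math.log((counts[c][w] + alpha) / total)
                     for w in words if w in vocab)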
  • ABSTRACT: This paper measures the local entropy of differently sized windows of an image X and extracts the N local features with the highest entropy. The extracted features are then scaled to a common size.
    01/2004;
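    A minimal sketch of the extraction step described above (assuming a grayscale uint8 NumPy image; window placement and the final rescaling are simplified):

      import numpy as np

      def window_entropy(patch):
          # Shannon entropy of the gray-level histogram of one window
          hist = np.bincount(patch.ravel(), minlength=256).astype(float)
          p = hist[hist > 0] / hist.sum()
          return float(-(p * np.log2(p)).sum())

      def top_entropy_windows(image, size, n):
          scored = []
          for y in range(0, image.shape[0] - size + 1, size):
              for x in range(0, image.shape[1] - size + 1, size):
                  patch = image[y:y+size, x:x+size]
                  scored.append((window_entropy(patch), (y, x)))
          scored.sort(reverse=True)
          return scored[:n]  # the N highest-entropy windows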
  • Oliver Bender, Hermann Ney, Lehrstuhl für Informatik VI
    ABSTRACT: In this paper, we describe a system that applies maximum entropy (ME) models to the task of named entity recognition (NER). Starting with an annotated corpus and a set of features that are easily obtainable for almost any language, we first build a baseline NE recognizer, which is then used to extract the named entities and their context information from additional non-annotated data. In turn, these lists are incorporated into the final recognizer to further improve the recognition accuracy.
    12/2003;
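    A minimal sketch of the model family involved (the feature names are hypothetical, not the authors' feature set): a maximum entropy model scores each tag through weighted binary features, and list-membership features like "in_ne_list" are the kind derived from the non-annotated data in the second pass.

      import math

      def maxent_tag_prob(features, tag, weights, tags):
          # p(tag | context) = exp(sum of active feature weights) / Z
          def score(t):
              return sum(weights.get((f, t), 0.0) for f in features)
          z = sum(math.exp(score(t)) for t in tags)
          return math.exp(score(tag)) / z

      # e.g. features = {"word=Aachen", "is_capitalized", "in_ne_list"}
      #      tags = ["LOC", "PER", "ORG", "O"]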
  • Sonja Nießen, Hermann Ney, Lehrstuhl für Informatik VI
    ABSTRACT: In statistical machine translation, correspondences between the words in the source and the target language are learned from bilingual corpora on the basis of so-called alignment models.
    07/2003;
  • Florian Hilger, Hermann Ney, Lehrstuhl für Informatik VI, Olivier Siohan, Frank K. Soong
    ABSTRACT: A mismatch between the training data and the test condition of an automatic speech recognition system usually deteriorates the recognition performance. Quantile based histogram equalization can increase the system's robustness by approximating the cumulative density function of the current signal and then reducing any mismatch based on this estimate. In a first step, each output of the Mel scaled filter bank can be transformed independently of the others. This paper describes an improved version of the algorithm that combines neighboring filter channels. On several databases recorded in real car environments, the recognition error rates were significantly reduced with this new approach.
    03/2003;
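    A minimal sketch of the basic transformation (not the authors' exact parametric transform, and without the combination of neighboring filter channels): map the test-time quantiles of each filter-bank output onto the quantiles estimated on the training data.

      import numpy as np

      def quantile_equalize(values, train_quantiles,
                            probs=(0.25, 0.5, 0.75, 1.0)):
          # piecewise-linear mapping from the current signal's quantiles
          # to the corresponding training-data quantiles
          test_quantiles = np.quantile(values, probs)
          return np.interp(values, test_quantiles, train_quantiles)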
  • Stephan Kanthak, Hermann Ney, Lehrstuhl für Informatik VI, Michael Riley, Mehryar Mohri
    ABSTRACT: This paper presents a detailed comparison between two search optimization techniques for large vocabulary speech recognition: one based on word-conditioned tree search (WCTS) and one based on weighted finite-state transducers (WFSTs). Existing North American Business News systems from RWTH and AT&T, representing the two approaches, were modified to remove variations in model data and acoustic likelihood computation. An experimental comparison showed that the WFST-based system explored fewer search states and had less runtime overhead than the WCTS-based system for a given word error rate. This is attributed to differences in the pre-compilation, degree of non-determinism, and path weight distribution in the respective search graphs.
    02/2003;
  • Christoph Tillmann, Hermann Ney, Lehrstuhl für Informatik VI
    ABSTRACT: In this paper, we study the use of so-called word trigger pairs to improve an existing language model, which is typically a trigram model in combination with a cache component. A word trigger pair is defined as a long-distance word pair. We present two methods to select the most significant single word trigger pairs. The selected trigger pairs are used in a combined model where the interpolation parameters and trigger interaction parameters are trained by the EM algorithm.
    08/2002;
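    Schematically, the combined model described above can be written as follows (a standard form of such an interpolation; the exact parametrization in the paper may differ):

      p(w_n \mid w_1^{n-1}) = (1 - \lambda)\, p_{\text{tri}}(w_n \mid w_{n-2}, w_{n-1})
        + \lambda \, \frac{1}{n-1} \sum_{i=1}^{n-1} p_{\text{trig}}(w_n \mid w_i)

    where the interpolation weight \lambda and the trigger probabilities p_{\text{trig}} are the parameters trained with the EM algorithm.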
  • Florian Hilger, Sirko Molau, Hermann Ney, Lehrstuhl für Informatik VI
    ABSTRACT: The noise robustness of automatic speech recognition systems can be increased by transforming the signal so that the cumulative density functions of the signal's values in recognition match the ones that were estimated on the training data. This paper describes a real-time online algorithm that approximates the cumulative density functions, after Mel scaled filtering, using a small number of quantiles. Recognition tests were carried out on the Aurora noisy TI digit strings and SpeechDat-Car databases. The average relative reduction of the word error rates was 32% on the noisy TI digit strings and 29% on SpeechDat-Car.
    08/2002;
  • Achim Sixtus, Hermann Ney, Lehrstuhl für Informatik VI
    ABSTRACT: Today's speech recognition systems use across-word context dependent phoneme models to capture coarticulation across word boundaries. While there are several publications about the organization of across-word model search, there are hardly any descriptions of the training of across-word models.
    08/2002;
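    A minimal sketch of what across-word context dependence means (the phoneme inventory and lexicon are hypothetical): the triphone context of a word-boundary phoneme is taken from the neighboring word instead of a generic boundary symbol.

      def across_word_triphones(words, lexicon):
          phones = [p for w in words for p in lexicon[w]]
          padded = ["#"] + phones + ["#"]  # "#" marks sentence boundaries
          return [(padded[i - 1], padded[i], padded[i + 1])
                  for i in range(1, len(padded) - 1)]

      # lexicon = {"speech": ["s", "p", "iy", "ch"], "test": ["t", "eh", "s", "t"]}
      # across_word_triphones(["speech", "test"], lexicon) yields, among
      # others, ("ch", "t", "eh"): the initial "t" of "test" sees the
      # final "ch" of "speech" as its left context.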
  • Aixplain AG, Hermann Ney, Lehrstuhl für Informatik VI
    ABSTRACT: The performance of machine translation technology after 50 years of development leaves much to be desired. There is a high demand for well-performing and cheap MT systems for many language pairs and domains, which automatically adapt to rapidly changing terminology. We argue that for successful MT systems it will be crucial to apply data-driven methods, especially statistical machine translation. In addition, it will be very important to establish common test environments. This includes the availability of large parallel training corpora, well-defined test corpora, and standardized evaluation criteria. This allows research results to be compared and opens the possibility of more competition in MT research.
    09/2001;
  • Florian Hilger, Hermann Ney, Lehrstuhl für Informatik VI
    ABSTRACT: This paper describes an approach to increase the noise robustness of automatic speech recognition systems by transforming the signal after Mel scaled filtering so that the cumulative density functions of the signal's values in recognition match the ones that were estimated on the training data. The cumulative density functions are approximated using a small number of quantiles. Recognition tests on several databases showed significant reductions of the word error rates. On a real life database recorded in driving cars, with a large mismatch between the training and testing conditions, the relative reductions of the word error rates were over 60%.
    07/2001;
  • Stephan Kanthak, Achim Sixtus, Sirko Molau, Hermann Ney, Lehrstuhl für Informatik VI
    ABSTRACT: In this paper we describe methods for improving the RWTH German speech recognizer used within the VERBMOBIL project. In particular, we present acceleration methods for the search based on both within-word and across-word phoneme models. Since the recognizer in the VERBMOBIL project is used in an online environment, we also discuss incremental methods to reduce the response time of an online speech recognizer. We present experimental offline results for the VERBMOBIL task, a German spontaneous speech corpus, and report on word error rates and real time performance of the search for both within-word and across-word phoneme models.
    12/2000;
  • Florian Hilger, Hermann Ney, Lehrstuhl für Informatik VI
    ABSTRACT: This paper describes an approach to normalize the noise level of a speech signal at the outputs of the Mel scaled filter bank used in MFCC feature extraction. An adaptive normalizing function that distinguishes between speech and silence parts of the signal is used to normalize the noise level without altering the speech parts of the signal. This technique is combined with an adaptation of the reference vectors, depending on the average norm of the incoming feature vectors. On a database with training data recorded in an office environment and testing data recorded in driving cars, the word error rate was reduced from 35.5% to 14.7% for the city traffic testing set and from 78.0% to 24.1% for the highway testing set.
    08/2000;
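    A loose sketch of the idea (the threshold and the noise estimate are illustrative stand-ins, not the paper's adaptive normalizing function): shift low-energy frames down to a common noise floor while leaving the speech frames untouched.

      import numpy as np

      def normalize_noise_level(frames, target_floor, speech_threshold):
          # frames: (time, channels) Mel filter-bank outputs
          noise_floor = np.percentile(frames, 10, axis=0)
          shifted = frames - (noise_floor - target_floor)
          # apply the shift only where the signal looks like silence
          return np.where(frames < speech_threshold, shifted, frames)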
  • ABSTRACT: In the last few years, the focus in ASR research has shifted from the recognition of clean read speech (e.g. WSJ) to the more challenging task of transcribing found speech like broadcast news (the Hub-4 task) and telephone conversations (Switchboard). Available training corpora tend to become larger and more erroneous than before, as transcribing found speech is more difficult. In this paper we present a method to automatically detect faulty training scripts. Based on the Hub-4 task, we report on the efficiency of error detection with the proposed method and investigate the effect of both manually and automatically cleaned training corpora on the word error rate (WER) of the RWTH large vocabulary continuous speech recognition (LVCSR) system. This work is a joint effort of the University of Technology (RWTH) and Philips Research Laboratories Aachen, Germany.
    08/2000;
  • Franz Josef Och, Christoph Tillmann, Hermann Ney, Lehrstuhl für Informatik VI
    ABSTRACT: In this paper, we describe improved alignment models for statistical machine translation. The statistical translation approach uses two types of information: a translation model and a language model. The language model used is a bigram or general m-gram model. The translation model is decomposed into a lexical and an alignment model. We describe two different approaches for statistical translation and present experimental results. The first approach is based on dependencies between single words; the second approach explicitly takes shallow phrase structures into account, using two different alignment levels: a phrase level alignment between phrases and a word level alignment between single words. We present results on the Verbmobil task (German-English, 6000-word vocabulary), a limited-domain spoken-language task. The experimental tests were performed on both the text transcription and the speech recognizer output.
    07/2000;
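    A minimal sketch of the single-word dependency case (an IBM Model 1 style lexical model; one EM re-estimation step over a parallel corpus):

      from collections import defaultdict

      def model1_em_step(corpus, t):
          # corpus: list of (source_words, target_words) sentence pairs
          # t: current lexical translation probabilities t[(f, e)]
          counts = defaultdict(float)
          totals = defaultdict(float)
          for src, tgt in corpus:
              for f in tgt:
                  z = sum(t[(f, e)] for e in src)
                  for e in src:
                      c = t[(f, e)] / z      # expected alignment count
                      counts[(f, e)] += c
                      totals[e] += c
          return {(f, e): counts[(f, e)] / totals[e] for (f, e) in counts}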
  • S. Ortmanns, H. Ney, A. Eiden, N. Coenen, Lehrstuhl für Informatik VI
    ABSTRACT: This paper presents two look-ahead techniques for large vocabulary continuous speech recognition. These two techniques, referred to as language model look-ahead and phoneme look-ahead, are incorporated into the pruning process of the time-synchronous one-pass beam search algorithm. The search algorithm is based on a tree-organized pronunciation lexicon in connection with a bigram language model. Both look-ahead techniques have been tested on the 20 000-word NAB'94 task (ARPA North American Business Corpus). The recognition experiments show that the combination of bigram language model look-ahead and phoneme look-ahead reduces the size of the search space by a factor of about 27 without affecting the word recognition accuracy.
    02/1999;
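    A minimal sketch of the language model look-ahead idea (the tree representation is hypothetical): every node of the pronunciation tree is assigned the best bigram probability of any word still reachable below it, so beam pruning can use LM information before the word identity is known.

      from dataclasses import dataclass, field

      @dataclass
      class Node:
          word: str = None          # set at word-end nodes only
          children: list = field(default_factory=list)
          lookahead: float = 0.0

      def lm_lookahead(node, bigram_prob, predecessor):
          # propagate the best reachable LM probability up the tree
          if node.word is not None:
              node.lookahead = bigram_prob(predecessor, node.word)
          else:
              node.lookahead = max(lm_lookahead(c, bigram_prob, predecessor)
                                   for c in node.children)
          return node.lookahead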
  • Ralf Schlüter, Wolfgang Macherey, Lehrstuhl für Informatik VI
    ABSTRACT: In this paper, a formally unifying approach for a class of discriminative training criteria, including the Maximum Mutual Information (MMI) and Minimum Classification Error (MCE) criteria, is presented, together with the optimization methods of gradient descent (GD) and the extended Baum-Welch (EB) algorithm. Comparisons are discussed for the MMI and the MCE criterion, including the determination of the sets of word sequence hypotheses for discrimination using word graphs. Experiments have been carried out on the SieTill corpus of telephone line recorded German continuous digit strings. Using several approaches for acoustic modeling, the word error rates obtained by MMI training using single densities were always better than those for Maximum Likelihood (ML) using mixture densities. Finally, results obtained for corrective training (CT), i.e. using only the best recognized word sequence in addition to the spoken word sequence, could not be improved by using the word graph based discriminative training.
    04/1998;
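    For reference, the MMI criterion discussed above has the standard form (maximize the posterior probability of the spoken word sequences W_r given the acoustic observations X_r over the R training utterances):

      F_{\text{MMI}}(\theta) = \sum_{r=1}^{R} \log
        \frac{p_\theta(X_r \mid W_r)\, p(W_r)}
             {\sum_{W} p_\theta(X_r \mid W)\, p(W)}

    where the denominator sum runs over the competing word sequences, here represented by word graphs.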