Category-Based Statistical Language Models

Source: CiteSeer

ABSTRACT: […] this document. The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams. Finally, a technique which also permits the inclusion of long-range word-pair relationships is presented in chapter 5.
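The core of a word-category n-gram model is to factor the word probability through categories: P(w_i | history) ≈ P(w_i | c_i) · P(c_i | c_{i-1}), in the style of class-based n-grams. The sketch below is an illustrative assumption, not code from the thesis; the toy tagged corpus and function names are invented for the example, and no smoothing is applied:

```python
from collections import defaultdict

def train_category_bigram(tagged_corpus):
    """Estimate P(w|c) and P(c|c_prev) from (word, category) pairs
    by maximum likelihood (no smoothing, for illustration only)."""
    emit = defaultdict(lambda: defaultdict(int))   # category -> word counts
    trans = defaultdict(lambda: defaultdict(int))  # prev category -> category counts
    prev = "<s>"
    for word, cat in tagged_corpus:
        emit[cat][word] += 1
        trans[prev][cat] += 1
        prev = cat

    def prob(word, cat, prev_cat):
        # P(word | cat) * P(cat | prev_cat)
        p_w_given_c = emit[cat][word] / sum(emit[cat].values())
        p_c_given_c = trans[prev_cat][cat] / sum(trans[prev_cat].values())
        return p_w_given_c * p_c_given_c

    return prob

# Toy tagged corpus (hypothetical data)
corpus = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
          ("the", "DET"), ("dog", "NOUN"), ("ran", "VERB")]
p = train_category_bigram(corpus)
```

Because probabilities are conditioned on a small category set rather than on individual words, the model generalises to word pairs never seen in training, which is what the later chapters then refine with selected word n-grams.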

  • Source
ABSTRACT: In this paper we investigate the use of linguistic information given by language models to deal with word recognition errors in handwritten sentences. We focus especially on errors due to out-of-vocabulary (OOV) words. First, word posterior probabilities are computed and used to detect error hypotheses in output sentences. An SVM classifier allows these errors to be categorized according to defined types. Then, a post-processing step is performed using a language model based on Part-of-Speech (POS) tags, which is combined with the n-gram model used previously. Thus, error hypotheses can be further recognized and POS tags can be assigned to the OOV words. Experiments on on-line handwritten sentences show that the proposed approach yields a significant reduction in the word error rate.
    Proceedings of the International Conference on Document Analysis and Recognition (ICDAR). 01/2009;
  • Source
ABSTRACT: This paper proposes a new way to improve the performance of a dependency parser: subdividing verbs according to their grammatical functions and integrating the information of verb subclasses into a lexicalized parsing model. First, the scheme of verb subdivision is described. Second, a maximum entropy model is presented to distinguish verb subclasses. Finally, a statistical parser is developed to evaluate the verb subdivision. Experimental results indicate that the use of verb subclasses has a positive effect on parsing performance.
    Journal of Electronics (China) 04/2007; 24(3):347-352.
  • Source
ABSTRACT: In this work several sets of categories obtained by a statistical clustering algorithm, as well as a linguistic set, were used to design category-based language models. The proposed language models were evaluated, as usual, in terms of perplexity on the text corpus. They were then integrated into an ASR system and also evaluated in terms of system performance. Category-based language models perform better, also in terms of WER, when categories are obtained through statistical models instead of linguistic techniques. The results also show that better system performance is obtained when the language model interpolates category-based and word-based models.
    Progress in Pattern Recognition, Image Analysis and Applications, 10th Iberoamerican Congress on Pattern Recognition, CIARP 2005, Havana, Cuba, November 15-18, 2005, Proceedings; 01/2005
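The interpolation described in the last abstract is, in its simplest linear form, P(w) = λ·P_word(w) + (1−λ)·P_cat(w), with the combined model then scored by perplexity, 2 raised to the negative mean log2 probability per token. A minimal sketch under those assumptions (the weight λ and the function names are illustrative, not taken from the paper):

```python
import math

def interpolate(p_word, p_cat, lam=0.6):
    """Linear interpolation of a word-based and a category-based model.
    lam is a tuning weight, typically optimised on held-out data."""
    def prob(*args):
        return lam * p_word(*args) + (1 - lam) * p_cat(*args)
    return prob

def perplexity(token_probs):
    """Perplexity from per-token probabilities: 2**(-(1/N) * sum(log2 p))."""
    n = len(token_probs)
    return 2 ** (-sum(math.log2(p) for p in token_probs) / n)
```

For example, a uniform model assigning probability 0.25 to each of four tokens has perplexity 4.0; the interpolated model can never assign zero probability as long as the category model covers the word, which is the practical motivation for combining the two.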


Thomas Niesler