Conference Paper

Sentence-Level Evaluation Using Co-occurences of N-Grams.

DOI: 10.1007/978-3-540-87536-9_77
Conference: Artificial Neural Networks - ICANN 2008, 18th International Conference, Prague, Czech Republic, September 3-6, 2008, Proceedings, Part I
Source: DBLP


This work presents a method for evaluating Greek sentences with respect to word-order errors. The method reorders the words
of a sentence and chooses the version that maximizes the number of trigram hits according to a language model. The
new parameter of the proposed technique is the incorporation of unigram probability: the frequency with which each unigram
occurs in the first and in the last position of the training-set sentences. The comparative
advantage of this method is that it works with a large set of words and avoids the laborious and costly process of collecting
word-order errors to create error patterns.
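
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how trigram hits and the positional unigram probabilities are combined, so the additive score below is an assumption, as are the function names `train`, `score`, and `best_order` and the toy English corpus (the paper works on Greek).

```python
from collections import Counter
from itertools import permutations


def train(sentences):
    """Collect the trigram set and first/last-position unigram frequencies."""
    trigrams = set()
    first, last = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        first[words[0]] += 1
        last[words[-1]] += 1
        trigrams.update(zip(words, words[1:], words[2:]))
    total = sum(first.values())  # number of non-empty training sentences
    p_first = {w: c / total for w, c in first.items()}
    p_last = {w: c / total for w, c in last.items()}
    return trigrams, p_first, p_last


def score(words, model):
    """Trigram hits plus positional unigram probabilities (assumed combination)."""
    if not words:
        return 0.0
    trigrams, p_first, p_last = model
    hits = sum(t in trigrams for t in zip(words, words[1:], words[2:]))
    return hits + p_first.get(words[0], 0.0) + p_last.get(words[-1], 0.0)


def best_order(sentence, model):
    """Return the word permutation with the highest score."""
    words = sentence.split()
    return " ".join(max(permutations(words), key=lambda p: score(p, model)))


# Toy corpus, purely illustrative.
model = train([
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat slept on the mat",
])
print(best_order("cat the sat mat the on", model))  # the cat sat on the mat
```

Enumerating every permutation grows factorially with sentence length, so this brute-force search is only workable for short sentences; a practical system would need to prune the search space.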

  • Source
    ABSTRACT: In this paper, I argue for the use of a probabilistic form of tree-adjoining grammar (TAG) in statistical natural language processing. I first discuss two previous statistical approaches --- one that concentrates on the probabilities of structural operations, ...
  • Source
    ABSTRACT: A novel approach to Machine Translation (MT), called Shake-and-Bake, is presented, which exploits recent advances in Computational Linguistics in terms of the increased spread of lexicalist unification-based grammar theories. It is argued that it overcomes some difficulties encountered by transfer and interlingual methods. It offers a greater modularity of the monolingual components, which can be written with independence of each other, using purely monolingual considerations. These are put into correspondence by means of a bilingual lexicon. The Shake-and-Bake approach for MT consists of parsing the Source Language in any usual way, then looking up the words in the bilingual lexicon, and finally generating from the set of translations of these words, but allowing the Target Language grammar to instantiate the relative word ordering, taking advantage of the fact that the parse produces lexical and phrasal signs which are highly constrained (specifically in the semantics). The main algorithm presented for generation is a variation on the well-known CKY one used for parsing. A toy bidirectional MT system was written to translate between Spanish and English, and some of the entries are shown.
    Proceedings of the 14th conference on Computational linguistics - Volume 2; 01/1992
  • Source
    ABSTRACT: The sampling problem in the training corpus is one of the major sources of errors in corpus-based applications. This paper proposes a corrective training algorithm to best fit the run-time context domain in the application of bag generation. It shows which objects are to be adjusted and how to adjust their probabilities. The resulting techniques are greatly simplified, and the experimental results demonstrate the promising effects of the training algorithm when moving from a generic domain to a specific domain. In general, these techniques can be easily extended to various language models and corpus-based applications.