Conference Paper

Rule Filtering by Pattern for Efficient Hierarchical Translation

DOI: 10.3115/1609067.1609109 Conference: EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, March 30 - April 3, 2009, Athens, Greece
Source: DBLP


We describe refinements to hierarchical translation search procedures intended to reduce both search errors and memory usage through modifications to hypothesis expansion in cube pruning and reductions in the size of the rule sets used in translation. Rules are put into syntactic classes based on the number of non-terminals and the pattern, and various filtering strategies are then applied to assess the impact on translation speed and quality. Results are reported on the 2008 NIST Arabic-to-English evaluation task.

Memory usage can be reduced in cube pruning (Chiang, 2007) through smart memoization, and spreading neighborhood exploration can be used to reduce search errors. However, search errors can still remain even when implementing simple phrase-based translation. We describe a 'shallow' search through hierarchical rules which greatly speeds translation without any effect on quality. We then describe techniques to analyze and reduce the set of hierarchical rules. We do this based on the structural properties of rules and develop strategies to identify and remove redundant or harmful rules. We identify groupings of rules based on non-terminals and their patterns and assess the impact on translation quality and computational requirements for each given rule group. We find that with appropriate filtering strategies rule sets can be greatly reduced in size without impact on translation performance.
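The grouping described above can be sketched as follows: each hierarchical rule's source side is mapped to a pattern by collapsing every maximal run of terminals into a single placeholder and keeping the non-terminals, and rules are then kept or dropped by pattern class. This is a minimal illustrative sketch; the rule encoding, placeholder names, and the allowed-pattern set are assumptions for the example, not the paper's actual configuration.

```python
def source_pattern(src_side):
    """Map a rule source side, e.g. ['a', 'X1', 'b'], to its pattern ('w', 'X', 'w').

    Non-terminals (tokens like 'X1', 'X2') map to 'X'; each maximal run of
    terminals collapses to a single 'w'.
    """
    pattern = []
    for tok in src_side:
        if tok.startswith('X'):              # non-terminal
            pattern.append('X')
        elif not pattern or pattern[-1] != 'w':
            pattern.append('w')              # start of a terminal run
    return tuple(pattern)

def filter_rules(rules, allowed_patterns):
    """Keep only rules whose source pattern is in the allowed set."""
    return [r for r in rules if source_pattern(r) in allowed_patterns]

rules = [
    ['a', 'b'],          # pattern ('w',)         -> ordinary phrase rule
    ['a', 'X1', 'b'],    # pattern ('w', 'X', 'w')
    ['X1', 'X2'],        # pattern ('X', 'X')     -> candidate for removal
]
allowed = {('w',), ('w', 'X', 'w')}
kept = filter_rules(rules, allowed)          # the bare X1 X2 rule is dropped
```

Filtering at the pattern level, rather than rule by rule, is what allows whole structural classes of rules to be assessed and removed together.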


Available from:
  • Source
    • "The word alignments for Chinese→English translation are trained from around 250M words of parallel text distributed for the GALE P3 evaluation. Hierarchical rules are extracted from the aligned text using the constraints described in Chiang (2007) with the count and pattern filters of Iglesias et al. (2009a). First-pass translation decoding with HiFST (Iglesias et al. 2009b) generates word lattices encoding large numbers of alternative hypotheses."
    ABSTRACT: We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework.
    Preview · Article · Jun 2013 · Machine Translation
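The n-gram posteriors described in the abstract above can be approximated over a k-best list: normalize hypothesis scores into posterior weights, then sum the weights of every hypothesis containing a given n-gram. This is only a sketch of the quantity being estimated, with made-up hypotheses and log scores; the cited work computes it exactly over full lattices, which a k-best list only approximates.

```python
import math

def ngram_posteriors(kbest, n=2):
    """kbest: list of (hypothesis_tokens, log_score) pairs.
    Returns a dict mapping each n-gram tuple to its posterior probability."""
    # Normalize log scores into posterior weights (softmax over the k-best list).
    m = max(score for _, score in kbest)
    weights = [math.exp(score - m) for _, score in kbest]
    z = sum(weights)
    post = {}
    for (toks, _), w in zip(kbest, weights):
        # Credit each distinct n-gram once per hypothesis that contains it.
        seen = {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
        for g in seen:
            post[g] = post.get(g, 0.0) + w / z
    return post

kbest = [(['the', 'cat', 'sat'], -1.0),
         (['the', 'cat', 'ran'], -2.0)]
p = ngram_posteriors(kbest)
# ('the', 'cat') occurs in every hypothesis, so its posterior is 1.0
```

An n-gram shared by all high-weight hypotheses gets posterior near 1, which is what makes these values usable as confidence scores.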
  • Source
    • "Secondly, they also employ pattern-based filtering (Iglesias et al., 2009) in order to reduce redundancies in the Hiero grammar by filtering it based on certain rule patterns. However in our limited experiments, we observed the filtered grammar to perform worse than the full grammar, as also noted by (Zollmann et al., 2008)."
    ABSTRACT: Shallow-n grammars (de Gispert et al., 2010) were introduced to reduce over-generation in the Hiero translation model (Chiang, 2005) resulting in much faster decoding and restricting reordering to a desired level for specific language pairs. However, Shallow-n grammars require parameters which cannot be directly optimized using minimum error-rate tuning by the decoder. This paper introduces some novel improvements to the translation model for Shallow-n grammars. We introduce two rules: a BITG-style reordering glue rule and a simpler monotonic concatenation rule. We use separate features for the new rules in our log-linear model allowing the decoder to directly optimize the feature weights. We show this formulation of Shallow-n hierarchical phrase-based translation is comparable in translation quality to full Hiero-style decoding (without shallow rules) while at the same time being considerably faster.
    Preview · Conference Paper · Jun 2012
  • Source
    • "It is therefore basically antipodal to some of the techniques presented in this paper, which allow for even more flexibility during the search process by extending the grammar with specific non-lexicalized reordering rules. Combinations of both techniques are possible, though, and in fact Iglesias et al. (2009) also investigate a maximum phrase jump of 1 (MJ1) reordering model. In the MJ1 experiment, they include a swap rule, but simultaneously withdraw all hierarchical phrases. "
    ABSTRACT: In this paper, we propose novel extensions of hierarchical phrase-based systems with a discriminative lexicalized reordering model. We compare different feature sets for the discriminative reordering model and investigate combinations with three types of non-lexicalized reordering rules which are added to the hierarchical grammar in order to allow for more reordering flexibility during decoding. All extensions are evaluated in standard hierarchical setups as well as in setups where the hierarchical recursion depth is restricted. We achieve improvements of up to +1.2 %BLEU on a large-scale Chinese→English translation task.
    Full-text · Article · May 2012