Efficient Beam Thresholding for Statistical Machine Translation

Human Language Technology Institute for Infocomm Research

ABSTRACT Beam thresholding is a widely used pruning approach in decoding algorithms for statistical machine translation. In this paper, we propose two variations on conventional beam thresholding, both of which speed up decoding without degrading the BLEU score. The first variation is dynamic beam thresholding, in which the beam threshold varies with the length of the source sequence covered by a hypothesis. The second incorporates a language model look-ahead probability into beam thresholding so that the interaction between a hypothesis and the contexts outside the hypothesis can be captured. Both thresholding methods achieve significant speed improvements when used separately. By combining them, we obtain a further speedup, which is comparable to that of the cube pruning approach (Chiang, 2007). Experiments also show that dynamic beam thresholding can further improve cube pruning.
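As a rough illustration of the two ideas in the abstract, the sketch below prunes one cell of hypotheses in log space. It is not the paper's formulation: the function names, the linear schedule in `dynamic_beam`, its parameters, and the direction in which the beam changes with span length are all assumptions made for the example, and the look-ahead term is a stand-in for whatever language model estimate a real decoder would supply.

```python
def dynamic_beam(span_len, base=2.0, decay=0.5, min_beam=0.5):
    """Hypothetical schedule: the beam width shrinks linearly with the
    number of source words covered, but never below min_beam.
    The actual schedule used in the paper is not given in the abstract."""
    return max(min_beam, base - decay * (span_len - 1))


def prune_cell(hyps, span_len, beam_for_len, lookahead=None):
    """Keep hypotheses whose (optionally look-ahead adjusted) log score
    is within the beam of the best one.

    hyps         -- list of (name, log_score) pairs
    span_len     -- number of source words the hypotheses cover
    beam_for_len -- function mapping span_len to a beam width (log units)
    lookahead    -- optional function mapping name to a log-probability
                    estimate for the context outside the hypothesis
    """
    def adjusted(hyp):
        name, score = hyp
        return score + (lookahead(name) if lookahead else 0.0)

    best = max(adjusted(h) for h in hyps)
    beam = beam_for_len(span_len)
    return [h for h in hyps if adjusted(h) >= best - beam]


hyps = [("a", -1.0), ("b", -2.5), ("c", -4.0)]

# Conventional thresholding: the same beam width for every span length.
fixed = prune_cell(hyps, span_len=3, beam_for_len=lambda n: 2.0)

# Dynamic thresholding: the beam width depends on the span length.
dynamic = prune_cell(hyps, span_len=3, beam_for_len=dynamic_beam)

# Look-ahead: made-up estimates that penalise hypothesis "a".
estimates = {"a": -3.0, "b": 0.0, "c": 0.0}
with_lookahead = prune_cell(hyps, 3, dynamic_beam, lookahead=estimates.get)
```

With these made-up scores, the fixed beam keeps two hypotheses, the narrower dynamic beam keeps one, and the look-ahead estimate changes which hypothesis survives. That last case is the kind of interaction with outside context that the abstract describes.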


Available from: Deyi Xiong, Mar 14, 2014
  • ABSTRACT: The statistical approach is one of the most promising and prominent machine translation strategies. It is applicable even to structurally dissimilar language pairs and has proven suitable for translating large texts. Demand for automatic translation between Sinhala and Tamil has been rising for decades, and the statistical approach is the preferred way to address the lack of a machine translation tool for these languages. Because the two languages are similar, the statistical approach can work well without much reliance on linguistic knowledge. In this research, a basic translation system has been modelled and implemented, with parallel corpora prepared from parliament order papers. This paper reports only the preliminary system runs, without the various parameter refinements or the full design and evaluation strategies. The language model, translation model, and decoder are configured in line with recent literature. To improve output quality, MERT is integrated to tune the decoder. To avoid depending solely on BLEU, two other automatic metrics, TER and NIST, are used to evaluate different aspects of the output. Directions for future research and refinement of the system are also identified.