Publications (7), Total impact: 0
Article: Divide and translate: improving long distance reordering in statistical machine translation
ABSTRACT: This paper proposes a novel method for long-distance, clause-level reordering in statistical machine translation (SMT). The proposed method translates the clauses of the source sentence separately and reconstructs the target sentence from the clause translations, which contain non-terminals. The non-terminals are placeholders for embedded clauses, which let us reduce complicated clause-level reordering to simple word-level reordering. Its translation model is trained on a bilingual corpus with clause-level alignment, which can be annotated automatically by our alignment algorithm using a syntactic parser for the source language. We achieved significant improvements of 1.4% in BLEU and 1.3% in TER using Moses, and 2.2% in BLEU and 3.5% in TER using our hierarchical phrase-based SMT, for English-to-Japanese translation of research paper abstracts in the medical domain. 08/2010
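To make the placeholder idea concrete, here is a minimal sketch, assuming each clause translation is a string in which embedded clauses appear as hypothetical placeholder tokens such as X1, X2. The placeholder format, the clause IDs, and the romanized Japanese example are illustrative assumptions, not the paper's data format. Reconstruction recursively replaces each placeholder with the translation of the clause it stands for.

```python
import re

# Hypothetical placeholder format: "X" followed by a number, e.g. "X1".
PLACEHOLDER = re.compile(r"\bX\d+\b")

def reconstruct(clause_translations, root):
    """Rebuild the full target sentence starting from the root clause.

    clause_translations maps a clause ID to its translation; embedded clauses
    appear in that translation as placeholder tokens, which are expanded
    recursively (assumes the clause structure is a tree, i.e. no cycles).
    """
    def expand(clause_id):
        text = clause_translations[clause_id]
        return PLACEHOLDER.sub(lambda m: expand(m.group(0)), text)
    return expand(root)

# "I think that he is right": main clause with one embedded clause X1.
clauses = {
    "S": "watashi wa X1 to omou",
    "X1": "kare wa tadashii",
}
print(reconstruct(clauses, "S"))  # watashi wa kare wa tadashii to omou
```

Because the embedded clause is collapsed to a single token while the main clause is translated, moving the whole clause is no harder for the decoder than moving one word.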
Article: N-best Reranking by Multitask Learning
ABSTRACT: We propose a new framework for N-best reranking on sparse feature sets. The idea is to reformulate the reranking problem as a multitask learning problem, where each N-best list corresponds to a distinct task. This is motivated by the observation that N-best lists often show significant differences in feature distributions, so training a single reranker directly on this heterogeneous data can be difficult. Our proposed meta-algorithm addresses this challenge by using multitask learning (such as ℓ1/ℓ2 regularization) to discover common feature representations across N-best lists. This meta-algorithm is simple to implement, and its modular approach allows one to plug in different learning algorithms from the existing literature. As a proof of concept, we show statistically significant improvements on a machine translation system involving millions of features. 08/2010
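As a rough illustration of ℓ1/ℓ2 regularization in this setting, assume the weights form a matrix W with one row per feature and one column per task (N-best list). The ℓ1/ℓ2 norm sums the ℓ2 norm of each row, and its proximal step shrinks each row as a group, so a feature is either kept for all lists or dropped from all of them. This is a generic group-lasso sketch under those assumptions, not the paper's meta-algorithm; the function names and weights are hypothetical.

```python
import math

def l1_l2_norm(W):
    """||W||_{1,2}: sum over features (rows) of the L2 norm across tasks (columns)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def prox_l1_l2(W, lam):
    """Proximal operator of lam * ||W||_{1,2} (group soft-thresholding).

    Each feature's row is scaled by max(0, 1 - lam / ||row||_2), so a row whose
    norm is at most lam becomes exactly zero for every task at once.
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * w for w in row])
    return out

# Hypothetical weights: 3 features x 2 tasks (N-best lists).
W = [[3.0, 4.0],   # strong in both lists: kept, shrunk by a factor of 0.8
     [0.3, 0.4],   # weak everywhere: zeroed in both lists
     [0.0, 0.0]]
print(l1_l2_norm(W))            # approximately 5.5
print(prox_l1_l2(W, lam=1.0))   # rows 1 and 2 become [0.0, 0.0]
```

The surviving nonzero rows play the role of a feature representation shared across all N-best lists.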
Article: Head Finalization: A Simple Reordering Rule for SOV Languages
ABSTRACT: English is a typical SVO (Subject-Verb-Object) language, while Japanese is a typical SOV language. Conventional statistical machine translation (SMT) systems work well within each of these language families. However, SMT-based translation from an SVO language to an SOV language does not work well because their word orders are completely different. Recently, a few groups have proposed rule-based preprocessing methods to mitigate this problem (Xu et al., 2009; Hong et al., 2009). These methods rewrite SVO sentences into more SOV-like sentences using a set of handcrafted rules. In this paper, we propose an alternative, single reordering rule: Head Finalization. This is a syntax-based preprocessing approach that offers the advantage of simplicity. We do not have to be concerned about part-of-speech tags or rule weights because the powerful Enju parser allows us to implement the rule at a general level. Our experiments show that its output, Head Final English (HFE), follows almost the same word order as Japanese. We also show that this rule improves automatic evaluation scores. 08/2010
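A minimal sketch of the core idea, assuming a toy parse tree in which every phrase marks its head child by index (hand-annotated here, standing in for Enju output): at each node, move the head child to the end, then read off the leaves. The Node class and the example tree are hypothetical, and the sketch omits any special-case handling the actual rule may apply.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Node:
    """Hypothetical phrase node: children are Nodes or words; head indexes the head child."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)
    head: int = 0

def head_finalize(node):
    """Return the leaves in head-final order: at every node, the head child goes last."""
    if isinstance(node, str):
        return [node]
    kids = list(node.children)
    head_child = kids.pop(node.head)  # take the head out...
    kids.append(head_child)           # ...and put it after its dependents
    words = []
    for kid in kids:
        words.extend(head_finalize(kid))
    return words

# "John hit a ball in the park", with hand-marked heads (stand-in for parser output).
tree = Node("S", [
    Node("NP", ["John"], head=0),
    Node("VP", ["hit",
                Node("NP", ["a", "ball"], head=1),
                Node("PP", ["in", Node("NP", ["the", "park"], head=1)], head=0)],
         head=0),
], head=1)

print(" ".join(head_finalize(tree)))  # John a ball the park in hit
```

The verb ends up after its object and the preposition after its noun phrase, mirroring Japanese, where verbs come last and particles follow the nouns they mark.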
Conference Proceeding: Hierarchical Phrase-based Machine Translation with Word-based Reordering Model.
COLING 2010, 23rd International Conference on Computational Linguistics, Proceedings of the Conference, 23-27 August 2010, Beijing, China; 01/2010
Conference Proceeding: Automatic Evaluation of Translation Quality for Distant Language Pairs.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP 2010, 9-11 October 2010, MIT Stata Center, Massachusetts, USA, A meeting of SIGDAT, a Special Interest Group of the ACL; 01/2010
Article: Learning of Linear Ordering Problems and its Application to JE Patent Translation in NTCIR-9 PatentMT
ABSTRACT: This paper describes the patent translation system submitted for the NTCIR-9 PatentMT task. We applied the Linear Ordering Problem (LOP)-based reordering model [16] to Japanese-to-English translation to deal with the substantial difference in word order between the two languages.
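For intuition about the Linear Ordering Problem, here is a minimal sketch, assuming a hypothetical score matrix B where B[i][j] rewards placing source word i before word j; the LOP picks the permutation that maximizes the sum of B over all ordered pairs. How the paper learns these scores is not shown here, and the matrix values are made up for the example. Exhaustive search is only feasible for a handful of words: the LOP is NP-hard in general, so realistic sentence lengths call for approximate search.

```python
from itertools import permutations

def lop_score(order, B):
    """Sum of B[i][j] over every pair where i is placed before j in order."""
    return sum(B[order[a]][order[b]]
               for a in range(len(order))
               for b in range(a + 1, len(order)))

def best_order(B):
    """Exact LOP solution by brute force; only viable for a few tokens (n! orders)."""
    return max(permutations(range(len(B))), key=lambda p: lop_score(p, B))

# Hypothetical pairwise scores for Japanese tokens 0 "kare wa", 1 "hon wo", 2 "yonda".
B = [[0, 1, 2],
     [0, 0, 0],
     [0, 3, 0]]
print(best_order(B))  # (0, 2, 1)
```

With these toy scores, the subject-object-verb source order is rearranged to subject-verb-object ("he read book"), the order an English decoder expects.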
Article: NTT-UT Statistical Machine Translation in NTCIR-9 PatentMT
ABSTRACT: This paper describes the details of the NTT-UT system in the NTCIR-9 PatentMT task. One of its key technologies is system combination: the final translation hypotheses are chosen from the N-best lists of different SMT systems in a Minimum Bayes Risk (MBR) manner. Each SMT system incorporates different technology: syntactic pre-ordering, forest-to-string translation, and the use of external resources for domain adaptation and target language modeling.
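The MBR selection step can be sketched as follows, assuming the pooled N-best hypotheses from all systems carry probabilities that are comparable across systems, and using unigram F1 as a stand-in gain function (BLEU-based gains are more typical in MT; the system's actual gain function and score normalization are not described here). The function names and example hypotheses are hypothetical. The output is the hypothesis with the highest expected gain against all pooled hypotheses, not necessarily the most probable one.

```python
from collections import Counter

def unigram_f1(hyp, ref):
    """Stand-in gain function: F1 of clipped unigram matches between two strings."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def mbr_select(pooled):
    """pooled: list of (hypothesis, probability) from all systems' N-best lists."""
    total = sum(prob for _, prob in pooled)

    def expected_gain(hyp):
        # Expected gain of hyp, treating each pooled hypothesis as a pseudo-reference.
        return sum(prob / total * unigram_f1(hyp, other) for other, prob in pooled)

    return max((hyp for hyp, _ in pooled), key=expected_gain)

# Hypothetical pooled N-best entries from several systems.
pooled = [
    ("the device comprises a motor", 0.25),
    ("the device includes a motor", 0.25),
    ("the device comprises a motor unit", 0.20),
    ("a motor device", 0.30),
]
print(mbr_select(pooled))  # the device comprises a motor
```

Here "a motor device" is the single most probable entry, but it agrees poorly with the rest of the pool, so MBR prefers the consensus-like "the device comprises a motor".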
Institutions
2010
NTT Communication Science Laboratories
Kyoto, Kyoto-fu, Japan