Article

Learning Translation Rules for a Bidirectional English-Filipino Machine Translator

Abstract

Filipino is a changing language that poses several challenges. Our goal is to develop a bidirectional English-Filipino Machine Translation (MT) system using a hybrid approach to learn rules from examples. The first phase was an English-to-Filipino MT system that required several language resources. Its main problem is its dependency on an annotated grammar, which is currently unavailable for Filipino and which makes reverse translation impossible. Phase 2 addresses this limitation by using information taken from English and Filipino POS taggers. The seed rules are generated by aligning the POS tags from English and Filipino examples, including their constraints. To perform compositionality, the system deduces the constituent labels by using the longest sequence of adjacent POS tags found in both the English and Filipino rules. The system groups similar rules together and generalizes them to cover a wider range of unseen examples.
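
The rule-learning steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tag labels, the word alignment, and the assumption that both taggers share one tagset are all hypothetical and chosen only for illustration. The sketch builds a seed rule from one POS-tagged example pair, finds the longest run of adjacent POS tags present on both sides as a candidate constituent, and groups identical rules.

```python
from collections import Counter


def seed_rule(src_tags, tgt_tags, alignment):
    """Build a seed transfer rule from one POS-tagged example pair.

    src_tags / tgt_tags: POS tag sequences for the English and Filipino sides.
    alignment: (src_index, tgt_index) pairs linking aligned words (assumed given).
    """
    return (tuple(src_tags), tuple(tgt_tags), tuple(sorted(alignment)))


def longest_shared_run(src_tags, tgt_tags):
    """Return the longest run of adjacent POS tags found on both sides.

    Assumes both taggers use the same tagset (an illustrative simplification).
    """
    best = ()
    for i in range(len(src_tags)):
        for j in range(len(tgt_tags)):
            k = 0
            while (i + k < len(src_tags) and j + k < len(tgt_tags)
                   and src_tags[i + k] == tgt_tags[j + k]):
                k += 1
            if k > len(best):
                best = tuple(src_tags[i:i + k])
    return best


# Hypothetical example: "the book is red" / "pula ang libro"
english = ["DT", "NN", "VB", "JJ"]
filipino = ["JJ", "DT", "NN"]
links = [(0, 1), (1, 2), (3, 0)]

rule = seed_rule(english, filipino, links)
constituent = longest_shared_run(english, filipino)

# Grouping: identical seed rules from different examples collapse into one entry.
rule_counts = Counter([rule, seed_rule(english, filipino, links)])
```

Here `constituent` is the tag run `("DT", "NN")`, which a system of this kind might label as a noun phrase and reuse inside larger rules. How the actual system handles constraints and differing tagsets is not described in the abstract.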

... In building a translator for Filipino-English and vice versa, different approaches can be applied. Approaches include LFG-based [1], template-based [5], rule-based [9], [10], statistical machine translation (SMT) [3], [8], and many more. The highest-scoring performance of each approach, based on BLEU [6] scores, is displayed in Table I. ...
... Additionally, translators differ not only in approach but also in the resources used. These include the choice of data, such as movie subtitles and articles [4], and the languages covered, such as Indonesian, Vietnamese [7], European [3], English and Filipino [1], [5], [10], and more. 7th IEEE International Conference Humanoid, Nanotechnology, Information Technology Communication and Control, Environment and Management (HNICEM), The Institute of Electrical and Electronics Engineers Inc. (IEEE) – Philippine Section, 12-16 November 2013, Hotel Centro, Puerto Princesa, Palawan, Philippines. Several experiments were done to find the Moses settings that give the corpus a high BLEU score while retaining the quality of the translations. ...
Conference Paper
Communication between different nations is essential, but a foreign language is difficult to understand. To resolve this problem, the options are limited to learning the language, using a dictionary as a guide, or making use of a translator. This paper discusses the development of ASEANMT-Phil, a phrase-based statistical machine translator intended as a tool for assisting ASEAN countries. The data used for training and testing came from Wikipedia articles, comprising 124,979 and 1,000 sentence pairs, respectively. ASEANMT-Phil was tested under different settings, producing BLEU scores of 32.71 for Filipino-English and 31.15 for English-Filipino. Future directions for the translator include the following: improving the data by changing or adding to the domain or size; implementing an additional approach; and using a larger dictionary.
... All of these present different output formalisms to represent the grammar. However, Filipino is currently observed to be resource-limited: it does not have the computational resources, i.e., bracketed corpora and treebanks, necessary for the algorithms presented [16]. There are existing corpora available for the language, but these have not yet been bracketed. ...
Article
This paper discusses the Greedy Merge Model used for an unsupervised grammar induction system for the Filipino language. The approach attempts to address the current state of Philippine linguistic resources, specifically the formal grammars, which are insubstantial for robust analysis. The Greedy Merge Model results show an F1 measure of 69%. Generated grammar rules are presented, and current limitations of the results are discussed.
Article
Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that cannot be reused.
Article
We describe METEOR, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations. Unigrams can be matched based on their surface forms, stemmed forms, and meanings; furthermore, METEOR can be easily extended to include more advanced matching strategies. Once all generalized unigram matches between the two strings have been found, METEOR computes a score for this matching using a combination of unigram-precision, unigram-recall, and a measure of fragmentation that is designed to directly capture how well-ordered the matched words in the machine translation are in relation to the reference. We evaluate METEOR by measuring the correlation between the metric scores and human judgments of translation quality. We compute the Pearson R correlation value between its scores and human quality assessments of the LDC TIDES 2003 Arabic-to-English and Chinese-to-English datasets. We perform segment-by-segment correlation, and show that METEOR gets an R correlation value of 0.347 on the Arabic data and 0.331 on the Chinese data. This is shown to be an improvement on using simply unigram-precision, unigram-recall and their harmonic F1 combination. We also perform experiments to show the relative contributions of the various mapping modules.
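
The combination of unigram precision, unigram recall, and fragmentation described above can be sketched as follows. This is a simplified illustration, not the METEOR tool itself: it matches surface forms only (no stemming or synonym modules), it aligns words greedily from left to right rather than searching for the best alignment, and the constants 9, 0.5, and 3 are assumed from the commonly cited original formulation, since the abstract does not state them. The function name `meteor_sketch` is hypothetical.

```python
def meteor_sketch(candidate, reference):
    """Simplified METEOR-style score using exact unigram matches only."""
    cand = candidate.lower().split()
    ref = reference.lower().split()

    # Greedy one-to-one alignment: each candidate word takes the first
    # unused reference word with the same surface form.
    used = [False] * len(ref)
    matches = []  # (candidate_index, reference_index) pairs
    for i, word in enumerate(cand):
        for j, ref_word in enumerate(ref):
            if not used[j] and word == ref_word:
                used[j] = True
                matches.append((i, j))
                break

    m = len(matches)
    if m == 0:
        return 0.0

    precision = m / len(cand)
    recall = m / len(ref)
    # Recall-weighted harmonic mean (assumed weighting of 9:1).
    f_mean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a maximal run of matches adjacent in both strings;
    # fewer chunks means the matched words are better ordered.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(matches, matches[1:]):
        if not (i1 == i0 + 1 and j1 == j0 + 1):
            chunks += 1

    penalty = 0.5 * (chunks / m) ** 3  # fragmentation penalty (assumed constants)
    return f_mean * (1 - penalty)


in_order = meteor_sketch("the cat sat", "the cat sat")
scrambled = meteor_sketch("sat cat the", "the cat sat")
```

With these inputs, both candidates have perfect unigram precision and recall, but the scrambled one splits into three chunks and is penalized more heavily, so `scrambled` is lower than `in_order`.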
Article
N-gram measures of translation quality, such as BLEU and the related NIST metric, are becoming increasingly important in machine translation, yet their behaviors are not fully understood. In this paper we examine the performance of these metrics on professional human translations into German of two literary genres, the Bible and Tom Sawyer.
Article
Data-driven approaches to machine translation often rely heavily on large training corpora. We are developing a translation system targeted specifically at minority languages for which such large corpora are not usually available.