Article

Automatic paraphrasing of Japanese functional expressions using a hierarchically organized dictionary

Abstract

Automatic paraphrasing is the transformation of expressions into semantically equivalent expressions within one language. To generate a wider variety of phrasal paraphrases in Japanese, it is necessary to paraphrase functional expressions as well as content expressions. We propose a method for paraphrasing Japanese functional expressions using a dictionary with two hierarchies: a morphological hierarchy and a semantic hierarchy. In an open test, our system generates appropriate alternative expressions for 79% of Japanese source phrases. It also accepts style and readability specifications.
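
The idea of looking up alternatives in a semantic class and filtering them by style and readability can be sketched as follows. This is a minimal illustration, not the authors' system: the `Entry` fields, the class label "obligation", the style tags, and the difficulty levels are all assumptions made up for the example, and the morphological hierarchy is not modeled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    surface: str     # surface form of a functional expression
    sem_class: str   # hypothetical semantic-class label
    style: str       # hypothetical style tag
    difficulty: int  # hypothetical readability level (lower = easier)


# Toy dictionary; every label, style, and level here is invented for illustration.
DICTIONARY = [
    Entry("なければならない", "obligation", "normal", 2),
    Entry("なくてはならない", "obligation", "normal", 2),
    Entry("ねばならない", "obligation", "stiff", 3),
    Entry("なきゃならない", "obligation", "colloquial", 1),
]


def paraphrase(source, style=None, max_difficulty=None):
    """Return other surface forms in the same semantic class as `source`,
    optionally restricted by style and by a maximum difficulty level."""
    by_surface = {e.surface: e for e in DICTIONARY}
    src = by_surface.get(source)
    if src is None:
        return []  # unknown expression: no paraphrase offered
    results = []
    for e in DICTIONARY:
        if e.surface == source or e.sem_class != src.sem_class:
            continue
        if style is not None and e.style != style:
            continue
        if max_difficulty is not None and e.difficulty > max_difficulty:
            continue
        results.append(e.surface)
    return results
```

For example, `paraphrase("なければならない", style="stiff")` returns only the candidates tagged with that style in the toy dictionary.
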

... represents a label of the bottom 199 classes. In [4], the bottom 199 semantic equivalence classes of Japanese functional expressions are designed so that functional expressions within a class are paraphrasable in most contexts of Japanese texts. ...
... [14] proposed the "Sandglass" machine translation architecture, in which variant expressions in the source language are first paraphrased into representative expressions, and then a small number of translation rules are applied to the representative expressions. In this paper, we apply the "Sandglass" architecture to the task of translating Japanese functional expressions into English, where we introduce a recently compiled large-scale hierarchical lexicon of Japanese functional expressions [5, 4]. [13] and [8] studied syntactic analysis of functional expressions in sentences. ...
Conference Paper
This paper applies the "Sandglass" machine translation architecture to the task of translating Japanese functional expressions into English. We employ the semantic equivalence classes of a recently compiled large-scale hierarchical lexicon of Japanese functional expressions. We examine whether each class is monosemous by empirically studying whether the functional expressions within a class can be translated into a single canonical English expression. Furthermore, in order to precisely identify the class of functional expressions to which our translation rules are directly applicable, we introduce two types of ambiguities of functional expressions and identify monosemous functional expressions. Finally, we show that the proposed framework outperforms commercial machine translation software products.
... represents a label of the bottom 199 classes. In Matsuyoshi and Sato (2008), the bottom 199 semantic equivalence classes of Japanese functional expressions are designed so that functional expressions within a class are paraphrasable in most contexts of Japanese texts. ...
... applied to the representative expressions. In this paper, we apply the "Sandglass" architecture to the task of translating Japanese functional expressions into English, where we introduce a recently compiled large-scale hierarchical lexicon of Japanese functional expressions (Matsuyoshi and Sato, 2008). Ambiguities between functional and content usages have been well studied in (Tsuchiya et al., 2005), (Tsuchiya et al., 2006), and (Shudo et al., 2004). ...
Conference Paper
In the "Sandglass" MT architecture, we identify the class of monosemous Japanese functional expressions and utilize it in the task of translating Japanese functional expressions into English. We employ the semantic equivalence classes of a recently compiled large-scale hierarchical lexicon of Japanese functional expressions. We then study whether functional expressions within a class can be translated into a single canonical English expression. Based on the results of identifying monosemous semantic equivalence classes, this paper studies how to extract rules for translating functional expressions in Japanese patent documents into English. In this study, we use about 1.8M Japanese-English parallel sentences automatically extracted from Japanese-English patent families, which are distributed through the Patent Translation Task at the NTCIR-7 Workshop. We then apply Moses, a toolkit for phrase-based SMT (Statistical Machine Translation), and obtain Japanese-English translation pairs in the form of a phrase translation table. Finally, we extract translation pairs of Japanese functional expressions from the phrase translation table. Through this study, we found that most of the semantic equivalence classes judged as monosemous based on manual translation into English have only one translation rule even in the patent domain.
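
The last step, pulling functional-expression pairs out of a phrase translation table, might look roughly like the sketch below. It assumes the plain-text Moses phrase-table layout in which fields are separated by `|||` and the first three fields are source phrase, target phrase, and scores. The function names, the sample lines, and the score values are invented for illustration and are not taken from the paper.

```python
def parse_phrase_table_line(line):
    """Split one '|||'-separated phrase-table line into (source, target, scores)."""
    fields = [f.strip() for f in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(s) for s in fields[2].split()]
    return source, target, scores


def extract_functional_pairs(lines, functional_expressions):
    """Keep only entries whose source side is a known functional expression.

    `functional_expressions` is a set of tokenized source phrases
    (hypothetical; in practice it would come from a lexicon).
    """
    pairs = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        source, target, scores = parse_phrase_table_line(line)
        if source in functional_expressions:
            pairs.setdefault(source, []).append((target, scores))
    return pairs


# Invented sample lines; the scores are placeholders, not real model output.
SAMPLE = [
    "に 対し て ||| with respect to ||| 0.5 0.4 0.3 0.2",
    "特許 ||| patent ||| 0.9 0.8 0.7 0.6",
]
```

Calling `extract_functional_pairs(SAMPLE, {"に 対し て"})` keeps the first line and drops the content-word entry.
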
... Figure 1 shows examples of the bottom 199 classes, where each of "k11", "D21", "t32", and "t22" represents a label of the bottom 199 classes. In Matsuyoshi and Sato (2008), the bottom 199 semantic equivalence classes of Japanese functional expressions are designed so that functional expressions within a class are paraphrasable in most contexts of Japanese texts. ...
Article
In the "Sandglass" machine translation architecture, we identify the class of monosemous Japanese functional expressions and utilize it in the task of translating Japanese functional expressions into English. We employ the semantic equivalence classes of a recently compiled large-scale hierarchical lexicon of Japanese functional expressions. We then study whether functional expressions within a class can be translated into a single canonical English expression. Next, we introduce two types of ambiguities of functional expressions and identify monosemous functional expressions. In the evaluation of our translation rules for Japanese functional expressions, we directly apply those rules to monosemous functional expressions, and show that the proposed framework outperforms commercial machine translation software products. We further study how to extract rules for translating functional expressions in Japanese patent documents into English. As a result of this study, we show that translation rules manually developed from a corpus for learners of Japanese grammar are also reliable in the patent domain.
... represents a label of the bottom 199 classes. In Matsuyoshi and Sato (2008), the bottom 199 semantic equivalence classes of Japanese functional expressions are designed so that functional expressions within a class are paraphrasable in most contexts of Japanese texts. ...
Article
This paper studies issues in machine translation of Japanese functional expressions into English. Unlike our previous work, in order to address the issue of resolving various ambiguities of a compound expression, this paper takes the approach of example-based machine translation. In this approach, a patent translation example database is developed from the phrase translation tables trained with parallel patent sentences as well as from the training parallel patent sentences themselves. When identifying the most similar translation examples, we integrate semantic equivalence classes of Japanese functional expressions with a more fine-grained similarity measure of translation examples. In the evaluation, we compare the translation accuracy of the proposed framework with that of Moses, and show that the proposed framework somewhat outperforms Moses.
... However, it is not easy for them to learn Japanese, because there are so many differences between the grammars of Chinese and Japanese, and some Chinese characters have different meanings from those in Japanese. In addition, Japanese functional expressions play very important roles in Japanese communication (Suguru and Satoshi, 2008), but there are not so many functional expressions in Chinese. Our objective is to automatically evaluate Japanese sentences written by Chinese students. ...
Article
When Chinese students study Japanese, they sometimes find it difficult to understand the grammar of Japanese correctly because Chinese and Japanese use the same Chinese characters in a different way. Our research focuses on the students' exercises of translating Chinese sentences into Japanese, and enables students to acquire correct knowledge based on translated sentences. For this objective, our Chinese-Japanese translation process is divided into three steps: translation of words, ordering of words, and addition of functional expressions to complete the sentence. The system compares translated sentences written by students with the correct answer preset in the system and specifies the step in which students might be mistaken. Then, the system gives a result of evaluating the translated sentence with the explanation of incorrect translation. Since the usage of Japanese functional expressions is the most difficult for the Chinese students translating a sentence, our system supports students learning Japanese functional expressions by considering the factors of the mistaken functional expressions. According to the mistake information, the system selects another question from the question database, which has the same factor as the mistaken functional expression. Experimental results showed that our system was effective in detecting the mistakes, especially mistakes of words and Japanese functional expressions.
... While we described the motivation and current status of our study on compiling phrasal thesaurus particularly focusing on predicate phrases, we are also concerned with paraphrasing of functional expressions. Based on TSUTSUJI: a dictionary of Japanese functional expressions (Matsuyoshi and Sato, 2008), we are exploring the multi-word functional expressions, and the interaction between predicate phrases and functional expressions. ...
Article
The COLING 2010 Workshop MWE 2010 took place on August 28, 2010 in Beijing, China.
Article
The growing need for text mining systems, such as opinion mining, requires a deep semantic understanding of the target language. In order to accomplish this, extracting the semantic information of functional expressions plays a crucial role, because functional expressions such as would like to and can’t are key expressions to detecting customers’ needs and wants. However, in Japanese, functional expressions appear in the form of suffixes, and two different types of functional expressions are merged into one predicate: one influences the factual meaning of the predicate while the other is merely used for discourse purposes. This triggers an increase in surface forms, which hinders information extraction systems. In this article, we present a novel normalization technique that paraphrases complex functional expressions into simplified forms that retain only the crucial meaning of the predicate. We construct paraphrasing rules based on linguistic theories in syntax and semantics. The results of experiments indicate that our system achieves a high accuracy of 79.7%, while it reduces the differences in functional expressions by up to 66.7%. The results also show an improvement in the performance of predicate extraction, providing encouraging evidence of the usability of paraphrasing as a means of normalizing different language expressions.
Conference Paper
When Chinese students study Japanese, they sometimes find it difficult to understand the grammar of Japanese correctly because Chinese and Japanese use the same Chinese characters in a different way. Our research focuses on the students’ exercises of translating Chinese sentences into Japanese, and enables students to acquire correct knowledge based on translated sentences. For this objective, our Chinese-Japanese translation process is divided into three steps: translation of words, ordering of words, and addition of functional expressions to complete the sentence. The system compares translated sentences written by students with the correct answer preset in the system and specifies the step in which students might be mistaken. Then, the system gives a result of evaluating the translated sentence with the explanation of incorrect translation. Experimental results showed that our system was effective in detecting the mistakes, especially mistakes of words and Japanese functional expressions.
Conference Paper
The Japanese language has many functional expressions, which consist of more than one word and behave like a single functional word. A remarkable characteristic of Japanese functional expressions is that each functional expression has many different surface forms. This paper proposes a methodology for compiling a dictionary of Japanese functional expressions with hierarchical organization. We use a hierarchy with nine abstraction levels: the root node is a dummy node that governs all entries; a node in the first level is a headword in the dictionary; a leaf node corresponds to a surface form of a functional expression. Two or more lists of functional expressions can be integrated into this hierarchy. This hierarchy also provides a way to systematically generate all different surface forms. We have compiled a dictionary with 292 headwords and 13,958 surface forms, which covers almost all major functional expressions.
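
Generating all surface forms from such a hierarchy amounts to enumerating the leaves under a node. The sketch below is a simplified illustration under stated assumptions: the `Node` class is hypothetical, the toy tree collapses the intermediate levels into a single layer with made-up labels ("normal", "polite"), and the four surface forms are chosen only as examples.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def surface_forms(node):
    """Collect the labels of all leaf nodes under `node`, left to right."""
    if not node.children:
        return [node.label]  # a leaf corresponds to one surface form
    forms = []
    for child in node.children:
        forms.extend(surface_forms(child))
    return forms


# Toy tree: dummy root -> headword -> one collapsed intermediate layer -> leaves.
ROOT = Node("<root>", [
    Node("にたいして", [
        Node("normal", [Node("に対して"), Node("にたいして")]),
        Node("polite", [Node("に対しまして"), Node("にたいしまして")]),
    ]),
])
```

Calling `surface_forms` on a headword node instead of the root restricts the enumeration to that headword's variants.
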
Article
The Japanese language has various types of compound functional expressions, which are very important for recognizing the syntactic structures of Japanese sentences and for understanding their semantic contents. In this paper, we formalize the task of identifying Japanese compound functional expressions in a text as a chunking problem. We apply a machine learning technique, Support Vector Machines (SVMs), to this task. We show that the proposed method significantly outperforms existing Japanese text processing tools.
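
Casting identification as chunking means assigning each token a tag that marks whether it begins, continues, or lies outside a compound functional expression. The sketch below shows only that label encoding, using the common B/I/O scheme as an assumption (the abstract does not say which tag set was used), and omits the SVM classifier. The function name, tokenization, and example span are illustrative.

```python
def spans_to_iob(tokens, spans):
    """Convert half-open token spans (start, end) into B/I/O chunk tags."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B"            # first token of the expression
        for i in range(start + 1, end):
            tags[i] = "I"            # remaining tokens of the expression
    return tags


# Hypothetical tokenization; tokens 1..3 form the compound expression に対して.
TOKENS = ["彼", "に", "対し", "て", "話す"]
TAGS = spans_to_iob(TOKENS, [(1, 4)])
```

A classifier trained on such tags would then predict one tag per token from features of the surrounding tokens.
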
Article
We report that a proper employment of MWEs concerned enables us to put forth a tractable framework, which is based on a multiple nesting of semantic operations, for the processing of non-inferential, Non-propositional Contents (NPCs) of natural Japanese sentences. Our framework is characterized by its broad syntactic and semantic coverage, enabling us to deal with multiply composite modalities and their semantic/pragmatic similarity. Also, the relationship between indirect (Searle, 1975) and direct speech, and equations peculiar to modal logic and its family (Mally, 1926; Prior, 1967) are treated in the similarity paradigm.
Conference Paper
This paper proposes an automatic method of detecting grammar elements that decrease readability in a Japanese sentence. The method consists of two components: (1) the check list of the grammar elements that should be detected; and (2) the detector, which is a search program of the grammar elements from a sentence. By defining a readability level for every grammar element, we can find which part of the sentence is difficult to read.
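
The two components can be pictured as a lookup table plus a string search. The following is a rough sketch only: the check-list entries, their level values, and the `detect` function are assumptions for illustration, and plain substring search stands in for whatever matching the actual detector performs.

```python
# Hypothetical check list: grammar element -> readability level (values invented).
CHECK_LIST = {"にもかかわらず": 2, "ざるを得ない": 1}


def detect(sentence, check_list=CHECK_LIST):
    """Return (start_index, element, level) for every check-list hit, by position."""
    hits = []
    for element, level in check_list.items():
        start = sentence.find(element)
        while start != -1:
            hits.append((start, element, level))
            start = sentence.find(element, start + 1)
    return sorted(hits)
```

The returned positions indicate which part of the sentence contains each flagged element.
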
Conference Paper
This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and describes how to tag a large spontaneous speech corpus accurately by using the two methods. The first method is used to detect any type of word segments. The second method is used when there are several definitions for word segments and their POS categories, and when one type of word segments includes another type of word segments. In this paper, we show that by using semi-automatic analysis we achieve a precision of better than 99% for detecting and tagging short words and 97% for long words; the two types of words that comprise the corpus. We also show that better accuracy is achieved by using both methods than by using only the first.
Conference Paper
It is important for future NLP systems to formulate the semantic equivalence (and more generally, the semantic similarity) of natural language expressions. In particular, paraphrasing, full text information retrieval, example-based MT and document compression technology require the effective equivalence criterion for linguistic expressions. In this paper, first, we discuss the meaning of Japanese sentence-final modality expressions (ME) and second, present equivalence rules for paraphrasing a string of MEs while preserving its meaning.
The Japan Foundation and Association of International Education, Japan, editors. 2002. Japanese Language Proficiency Test: Test Content Specifications (Revised Edition). Bonjinsha. (in Japanese).
Yoshiyuki Morita and Masae Matsuki. 1989. Nihongo Hyougen Bunkei, volume 5 of NAFL Sensho (Expression Patterns in Japanese). ALC Press Inc. (in Japanese).
Orie Endoh, Kenji Kobayashi, Akiko Mitsui, Shinjiro Muraki, and Yasushi Yoshizawa, editors. 2003. A Dictionary of Synonyms in Japanese (New Edition).
Ryu Iida, Yasuhiro Tokunaga, Kentaro Inui, and Junji Etoh. 2001. Exploration of clause-structural and function-expressional paraphrasing using KURA. In Proceedings of the 63rd National Convention of Information Processing Society of Japan, volume 2, pages 5–6. (in Japanese).
Masatoshi Tsuchiya, Satoshi Sato, and Takehito Utsuro. 2004. Automatic generation of paraphrasing rules from a collection of pairs of equivalent sentences including functional expressions. In Proceedings of the 10th Annual Meeting of the Association for Natural Language Processing, pages 492–495. (in Japanese).
Etsuko Tomomatsu, Jun Miyamoto, and Masako Wakuri. 1996. 500 Essential Japanese Expressions: A Guide to Correct Usage of Key Sentence Patterns. ALC Press Inc. (in Japanese).