Article

Preferential presentation of Japanese near-synonyms using definition statements


Abstract

This paper proposes a new method for ranking near-synonyms according to how well their nuances suit a particular context. The method distinguishes near-synonyms by semantic features extracted from their definition statements in an ordinary dictionary, and ranks them according to the types of those features and the given context. It is an initial step toward a semantic paraphrasing system for authoring support.
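The abstract does not give the scoring function, so the following is only a toy sketch of the general idea. It rests on several assumptions. Each near-synonym's definition has already been reduced to a set of hypothetical (feature type, value) pairs. Features shared by every member of the synonym set are discarded as non-distinguishing. Each remaining feature that also appears in the context adds a per-type weight. The feature types, the weights, and the English glosses that stand in for Japanese dictionary entries are all invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical feature: a (feature type, value) pair, e.g. ("degree", "serious").
Feature = Tuple[str, str]


def distinguishing_features(definitions: Dict[str, Set[Feature]]) -> Dict[str, Set[Feature]]:
    """Drop features shared by every near-synonym; only the remainder marks a nuance."""
    shared = set.intersection(*definitions.values())
    return {word: feats - shared for word, feats in definitions.items()}


def rank_near_synonyms(
    definitions: Dict[str, Set[Feature]],
    context: Set[Feature],
    type_weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Score each near-synonym by the type-weighted overlap of its nuances with the context."""
    nuances = distinguishing_features(definitions)
    scores = {
        word: sum(type_weights.get(ftype, 1.0) for ftype, value in feats if (ftype, value) in context)
        for word, feats in nuances.items()
    }
    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy, hypothetical feature sets for one synonym set.
definitions = {
    "error": {("category", "mistake"), ("degree", "neutral")},
    "blunder": {("category", "mistake"), ("degree", "serious"), ("manner", "careless")},
    "slip": {("category", "mistake"), ("degree", "minor"), ("manner", "inattentive")},
}
context = {("degree", "serious"), ("manner", "careless")}
print(rank_near_synonyms(definitions, context, {"degree": 2.0, "manner": 1.0}))
# [('blunder', 3.0), ('error', 0), ('slip', 0)]
```

Weighting by feature type is one simple way to let some kinds of nuance (here, degree) count more than others. A real system would need to learn or hand-tune such weights.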


... (16) Paraphrasing of common nouns to their synonyms (Yamamoto, 2002b; Okamoto et al., 2003)
s. kyuryo-ni kinenkan-ga kansei-shita. ('A memorial hall was completed on the hill.') ...
... Edmonds (1999) proposed an ontology for representing the shared meaning and idiosyncratic differences of near-synonyms, thereby improving lexical choice in machine translation. In terms of knowledge acquisition, recent work such as Okamoto et al. (2003) and Inkpen (2003) has shown that exploiting existing synonym dictionaries, such as WordNet (Miller et al., 1990) and the Kadokawa Synonym New Dictionary (Ôno and Hamanishi, 1981), is a feasible way to acquire semantic differences between near-synonyms, although several issues remain. ...
Article
http://library.naist.jp/mylimedio/dllimedio/show.cgi?bookid=83958 Doctoral thesis, Doctor of Engineering; Degree No. 459 (course-based doctorate, Kō No. 459)
... Keywords: paraphrasing, text revision, language model, error detection, verb valency, machine learning ... [16] [1] (MT: machine translation) [20] (QA: question answering) [14] [19] (IR: information retrieval) [12] [3] [10] QA IR [17] ...
Article
This paper addresses the problem of transfer errors in paraphrasing. Our previous investigation into transfer errors revealed that verb valency errors occur frequently, irrespective of the type of transfer. Motivated by this finding, we propose an empirical method for detecting incorrect verb valencies that arise when paraphrasing Japanese sentences. Our error detection model is an ensemble of two models trained separately: one on a large collection of unlabeled positive examples and one on a small collection of labeled negative examples. In an experiment, the ensemble method achieved 79.4% 11-point average precision, a 13.3-point improvement over the model trained only on positive examples. We also propose a selective sampling scheme to reduce the cost of labeling examples.
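The 11-point average precision reported above is conventionally computed in two steps. First, precision is interpolated at the recall levels 0.0, 0.1, ..., 1.0, where the interpolated precision at level r is the best precision reached at any recall of at least r. Second, the eleven values are averaged. The sketch below assumes each example has a detector score and a gold label of 1 for an incorrect verb valency (the class being detected). How the two ensemble members are combined into a single score is not stated in the abstract, so the scores here are simply given as input.

```python
from typing import Sequence


def eleven_point_average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """11-point interpolated average precision of a ranking by descending score.

    labels[i] is 1 if example i is a true error (the class being detected), else 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)

    # Precision and recall after each cut-off in the ranking.
    precisions, recalls = [], []
    true_pos = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        precisions.append(true_pos / rank)
        recalls.append(true_pos / total_pos)

    points = []
    for k in range(11):
        level = k / 10
        # Interpolated precision: best precision at any recall >= this level.
        points.append(max((p for p, r in zip(precisions, recalls) if r >= level), default=0.0))
    return sum(points) / 11


print(eleven_point_average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # ≈ 0.848 (= 28/33)
```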
... These works use a dictionary for word segmentation, word selection, and/or adding part-of-speech information. Other resources, such as sets of definition statements extracted from dictionaries, are also used for synonym extraction (Fujita & Inui, 2001; Blondel & Senellart, 2002; Okamoto et al., 2003). The synonyms extracted by these methods depend on how the words are defined. ...
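To make the dependence on definitions concrete, here is a deliberately simple baseline. It is not the method of any of the cited works; Blondel & Senellart, for instance, use a graph-based similarity. The baseline proposes headwords whose definitions share a large fraction of content words as synonym candidates, scored by Jaccard overlap. The toy dictionary, stopword list and threshold are assumptions. Whitespace tokenization stands in for the dictionary-based segmentation that Japanese would need.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def content_words(definition: str, stopwords: Set[str]) -> Set[str]:
    """Whitespace tokenization: an English stand-in for Japanese word segmentation."""
    return {w for w in definition.lower().split() if w not in stopwords}


def synonym_candidates(
    dictionary: Dict[str, str], stopwords: Set[str], threshold: float
) -> List[Tuple[str, str, float]]:
    """Propose headword pairs whose definitions overlap (Jaccard) by at least `threshold`."""
    bags = {head: content_words(text, stopwords) for head, text in dictionary.items()}
    candidates = []
    for a, b in combinations(sorted(bags), 2):
        union = bags[a] | bags[b]
        if not union:
            continue
        score = len(bags[a] & bags[b]) / len(union)
        if score >= threshold:
            candidates.append((a, b, score))
    return candidates


toy_dictionary = {  # hypothetical entries
    "error": "a mistake in action or judgment",
    "blunder": "a stupid or careless mistake",
    "hill": "a raised area of land",
}
print(synonym_candidates(toy_dictionary, {"a", "in", "or", "of"}, 0.2))
# [('blunder', 'error', 0.2)]
```

Rewording a single definition, for example defining "blunder" as "a stupid or careless act", removes the shared word "mistake". The pair then disappears from the output, which is exactly the dependence the passage describes.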
Article
Although pairs of related words are useful lexical semantic resources, they can be expensive to create and maintain. We propose a method that extracts pairs of related Japanese words from a text corpus without using linguistic knowledge, such as a dictionary, at any step. This is difficult for Japanese text because words are not separated by spaces. The extracted pairs consist of words with similar usages and can be useful for understanding texts that contain unknown words. Pairs are extracted by judging whether two words are used in a similar way. We report the precision of pair lists extracted from various kinds of corpora and analyze the tendencies of each list.
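The abstract does not say how similar usage is judged. One common dictionary-free option is shown here purely as an assumed illustration. Each candidate string is represented by the characters immediately to its left and right in the raw corpus, and those context vectors are compared by cosine similarity. Because the approach works on raw character strings, it needs no word segmentation. The corpus ("I like dogs. I like cats. He reads a book."), the window width and the candidate strings are hypothetical.

```python
import math
from collections import Counter


def context_vector(corpus: str, word: str, width: int = 1) -> Counter:
    """Count the `width` characters to the left and right of every occurrence of `word`."""
    vec: Counter = Counter()
    start = corpus.find(word)
    while start != -1:
        end = start + len(word)
        vec["L:" + corpus[max(0, start - width):start]] += 1
        vec["R:" + corpus[end:end + width]] += 1
        start = corpus.find(word, start + 1)
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


corpus = "私は犬が好きです。私は猫が好きです。彼は本を読む。"
dog, cat, book = (context_vector(corpus, w) for w in ("犬", "猫", "本"))
print(round(cosine(dog, cat), 3), round(cosine(dog, book), 3))  # 1.0 0.5
```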
Conference Paper
Electronic technical documents available on the Internet are a powerful source for automatically extracting term translations and synonyms. This paper presents an association-based approach that extracts possible translations and synonyms by iteratively generating candidates with a search engine. Plausible candidate pairs are then selected by computing their co-occurrence statistics. In our experiment on extracting Thai-English medical term pairs, four alternative association measures (confidence, support, lift and conviction) are investigated, and their performance is compared using ten-fold cross-validation. Lift performs best. It achieves a 73.1% F-measure (67% precision, 84.2% recall) on translation pair extraction. On Thai synonym term extraction it reaches a 68.7% F-measure (71.5% precision, 67.7% recall). On English synonym term extraction it reaches a 72.8% F-measure (72.0% precision, 75.1% recall). The precision of our approach for Thai-English translation, Thai synonym and English synonym extraction is 4, 3.5 and 5.5 times higher than the baseline precision, respectively.
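The four measures named above have standard definitions from association-rule mining. The sketch below computes them for a candidate pair (a, b) from co-occurrence counts. The abstract does not specify the co-occurrence unit, so the assumption here is that it is documents or search-engine hits containing the terms.

```python
import math
from dataclasses import dataclass


@dataclass
class Association:
    support: float
    confidence: float
    lift: float
    conviction: float


def association_measures(n_total: int, n_a: int, n_b: int, n_ab: int) -> Association:
    """Standard association measures for the rule a -> b.

    n_total: number of units observed (assumed: documents or search hits);
    n_a, n_b: units containing term a / term b; n_ab: units containing both.
    """
    if not (0 < n_a <= n_total and 0 < n_b <= n_total and 0 <= n_ab <= min(n_a, n_b)):
        raise ValueError("inconsistent counts")
    p_b = n_b / n_total
    support = n_ab / n_total                  # P(a and b)
    confidence = n_ab / n_a                   # P(b | a)
    lift = confidence / p_b                   # P(a and b) / (P(a) P(b))
    # Conviction (1 - P(b)) / (1 - P(b | a)) is unbounded when b always follows a.
    conviction = math.inf if confidence == 1 else (1 - p_b) / (1 - confidence)
    return Association(support, confidence, lift, conviction)


m = association_measures(n_total=100, n_a=20, n_b=25, n_ab=10)
print(m.lift)  # 2.0
```

Support and lift are symmetric in a and b, while confidence and conviction are directional. A pair-ranking system therefore has to decide which term plays the role of a, or score both directions.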
Article
In this paper, we investigate several schemes for selecting features that are useful for automatically classifying questions by question type. We represent each question as a set of features and compare the performance of the C5.0 machine learning algorithm across the different representations. Experimental results show that a scheme based on NLP techniques categorizes question types more accurately than a scheme based on IR techniques. The ultimate goal of this research is to use question type classification to help determine whether two questions are paraphrases of each other. We hypothesize that features that help identify question type will also be useful for generating question paraphrases.
Article
concrete: The {error | blunder} cost him dearly.
Article
Existing paraphrase systems either create successive drafts from an underlying representation or aim to correct texts. A different kind of paraphrase, also reflecting a real-world task, imposes external constraints such as length or readability on an original text. This sort of paraphrase requires a new framework that produces a new text satisfying the constraints with minimal change from the original. For such a system to be feasible at scale, the search through the space of candidate solution texts must be made tractable. This paper examines a method for pruning the search space via branch-and-bound, evaluates three variants to find the most efficient model, and discusses its relation to standard heuristic methods such as genetic algorithms.
1 Introduction
The paraphrasing framework of this paper is one where a text is modified 'reluctantly' to conform to external surface constraints, such as length or readability requi...
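The abstract names branch-and-bound but does not describe its three variants. The following is therefore a generic sketch built on assumed simplifications, not any of those variants and not a genetic algorithm. The text is split into segments, and each segment has a few pre-generated candidate rewordings with a hypothetical change cost (0 for the original wording). The only constraint is a maximum total character length, and the goal is minimal total change. Pruning uses two optimistic bounds over the not-yet-chosen segments: their cheapest possible cost and their shortest possible length.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical candidate rendering of one segment: (text, change cost); 0 = unchanged original.
Candidate = Tuple[str, int]


def constrained_paraphrase(
    segments: List[List[Candidate]], max_length: int
) -> Tuple[Optional[List[str]], float]:
    """Choose one candidate per segment so that the total character length (separators
    ignored) is <= max_length and the total change cost is minimal, via branch-and-bound."""
    n = len(segments)
    # Optimistic bounds for the unchosen suffix of segments i..n-1.
    min_cost = [0] * (n + 1)
    min_len = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_cost[i] = min_cost[i + 1] + min(cost for _, cost in segments[i])
        min_len[i] = min_len[i + 1] + min(len(text) for text, _ in segments[i])

    best_cost: float = math.inf
    best_choice: Optional[List[str]] = None

    def search(i: int, chosen: List[str], cost: int, length: int) -> None:
        nonlocal best_cost, best_choice
        if cost + min_cost[i] >= best_cost:  # bound: cannot beat the incumbent
            return
        if length + min_len[i] > max_length:  # infeasible: length limit unreachable
            return
        if i == n:
            best_cost, best_choice = cost, list(chosen)
            return
        # Branch on cheaper candidates first so a good incumbent appears early.
        for text, c in sorted(segments[i], key=lambda cand: cand[1]):
            chosen.append(text)
            search(i + 1, chosen, cost + c, length + len(text))
            chosen.pop()

    search(0, [], 0, 0)
    return best_choice, best_cost


segments = [  # hypothetical candidates
    [("The meeting was postponed", 0), ("Meeting postponed", 2)],
    [("because of the heavy rain", 0), ("due to rain", 1), ("owing to rain", 1)],
]
print(constrained_paraphrase(segments, 40))
# (['The meeting was postponed', 'due to rain'], 1)
```

If no combination meets the limit, the function returns (None, inf). Trying cheap candidates first is a heuristic ordering only. It does not affect correctness, but it tends to tighten the cost bound sooner.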