About
154
Publications
32,642
Reads
565
Citations
Introduction
1991-1996 Toyohashi Univ. of Tech. (as a student)
1996 Dr.Eng
1996-2002 ATR research labs.
2002-current Nagaoka Univ. of Tech.
2016-current board member, Association for NLP, Japan.
Research interest: paraphrasing, simplification, and lexical resource construction.
Additional affiliations
October 2002 - present
Publications (154)
Interest in extracting and analyzing evaluations and opinions of services or products from large bodies of text has been increasing in recent years. It is important to classify predicates according to sense because whether or not a statement includes the speaker's opinion depends strongly on its predicate. It is generally assumed that Japanese p...
This paper examines the introduction of "Easy Japanese" by extracting important segments for translation. The need for Japanese language has increased dramatically due to the recent influx of non-Japanese-speaking foreigners. Therefore, in order for non-native speakers of Japanese to successfully adapt to society, the so-called Easy Japanese is bei...
The automatic insertion of diacritics in electronic texts is necessary for a number of languages, including French, Romanian, Croatian, Sindhi, Vietnamese, etc. When diacritics are removed from a word and the resulting string of characters is not a word, it is easy to recover the diacritics. However, sometimes the resulting string is also a word, p...
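The easy case described in this abstract (a stripped string that maps back to exactly one known word) can be sketched as a lexicon lookup. This is only an illustrative sketch, not the method of the paper: the function names and the toy French lexicon are hypothetical, and Unicode NFD decomposition is assumed as the way to strip diacritics, which does not cover letters without a canonical decomposition (for example "đ").

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(lexicon):
    """Map each diacritic-free string to the set of lexicon words that produce it."""
    index = defaultdict(set)
    for word in lexicon:
        index[strip_diacritics(word)].add(word)
    return index


def restore(token: str, index):
    """Return the unique restoration of token, or None if unknown or ambiguous."""
    candidates = index.get(token, set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None  # ambiguous or out of lexicon: context would be needed


# Hypothetical toy lexicon, for illustration only.
index = build_index(["été", "cote", "côte", "côté"])
print(restore("ete", index))   # unique candidate
print(restore("cote", index))  # ambiguous: several candidates share this string
```

The ambiguous case returns None on purpose: resolving it requires context, which is beyond what a plain lookup can do.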
In machine translation (MT), modality errors are often critical. We propose a phrase-based statistical MT method that preserves the modality of input sentences. The method introduces a feature function that counts the number of phrases in a sentence that are characteristic words for modalities. This simple method increases the number of translation...
We have built a large-scale general Japanese ontology, restructured from Wikipedia, that represents an is-a relation hierarchy. A Wikipedia article page belongs to one or more categories that are organized hierarchically by linking to others. However, there are the following two issues to be solved in order to use the categories and the articles as...
We have proposed the Syntactic Piece, a unit for shallow language processing. A piece consists of a pair of modifier and modificand, derived from syntactic structure, and two expressions that differ slightly but have the same meaning are represented as the same piece in this framework. In this paper we report (a) reconsideration of creation pr...
Finding pages on the Web that are similar to a query page is an important component of modern search engines. In particular, a method for recognizing the content of Web pages plays an important role in search engines. However, the fact that a Web page includes the query words does not necessarily mean that the page describes the query. The main challenge here is identification fact...
Automatic bilingual term extraction is essential for providing a consistent bilingual term list for human translators engaged in translating a set of documents. We compare three statistical measures for extracting bilingual terms from a phrase-table built from a parallel corpus. We show that these measures extract different bilingual term cand...
In this study, we extracted articles describing problems, articles describing their solutions, and articles describing their causes from a Japanese Q&A-style Web forum using supervised machine learning, with F values of 0.70, 0.86, and 0.56, respectively. We confirmed that these values are significantly better than their baselines. This extraction wil...
We propose a method to detect Japanese nasty comments from posts on bulletin board systems (BBS). Nasty comments can cause many social problems, because they express potentially harmful words and phrases. There are methods to recognize harmful words, but they are insufficient. Therefore, we present a method for detecting such comments on a BBS with...
This paper presents a new computation of lexical distributional similarity, which is a corpus-based method for computing similarity of any two words. Although the conventional method focuses on emphasizing features with which a given word is associated, we propose that even unassociated features of two input words can further improve the performan...
We present a novel method for building a large-scale Japanese ontology from Wikipedia using one of the largest Japanese thesauri, Nihongo Goi-Taikei (referred to hereafter as "Goi-Taikei") as an upper ontology. First, the leaf categories in the Goi-Taikei hierarchy are semi-automatically aligned with semantically equivalent Wikipedia categories....
This paper presents a method for constructing a large-scale Person Ontology with category hierarchy from Wikipedia. We first extract Wikipedia category labels which represent person (hereafter, Wikipedia Person Category, WPC) by using a machine learning classifier. We then construct a WPC hierarchy by detecting is-a relations in the Wikipedia ca...
In this study, our purpose was to make a short summary for sentences. For example, we aimed to make a short summary "terror" for sentences "A bomb went off. Some people were killed. This was triggered by rebel campaign." In this study, we proposed a new method that generates summaries that can appropriately and adequately express the contents of t...
To address the shortage of Japanese-English parallel corpora, we developed a parallel corpus by collecting open source software manuals from the Web. The constructed corpus contains approximately 500 thousand sentence pairs that were aligned automatically by an existing method. We also conducted statistical machine translation (SMT) experiment...
It is expensive for companies to browse daily reports. Our aim is to create a system that extracts information about problems from reports. This system operates in two steps. First, it records expressions involving troubles in a dictionary from training data. Second, it expands the dictionary to include information not included in the training data...
This paper presents a method for generating reviews of stories. In this work, we focus on generating sentences that include subjective expressions. First, we constructed lexicons using emotion-emerged expressions that are thought to be the origin of the emotional content. The lexicon consists of syntactic pieces that are proposed as units of syntac...
Automatic summarization is an important task as a form of human support technology. In this paper, we propose a new summarization method that is based on an example-based approach. Using an example-based approach for the summarization task has the following three advantages: high modularity, absence of the necessity to score importance for each word, and...
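We propose a measure that computes how much interest a given input document holds for many people. The interests of individuals are highly diverse, and research that captures them for purposes such as information filtering is well known; in this study, however, we examine how much interest the general public, that is, an unspecified large number of people, has as a whole. Such technology has very important applications, such as selecting which documents to present and changing the display order on Web sites that are expected to be viewed by an unspecified large number of people. We used ranked documents as an information source that reflects the interest of the public. Using these as training data, we built a method that assigns the strength of interest as a value to the words and phrases in a document and to the document itself. By treating interest as a value, the strength of interest is not the binary of interested or not interested, but...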
This paper addresses a task of opinion extraction from given documents and its positive/negative classification. We propose a sentence classification method using a notion of syntactic piece. Syntactic piece is a minimum unit of structure, and is used as an alternative processing unit of n-gram and whole tree structure. We compute its semantic orie...
This paper describes a system which identifies discourse relations between two successive sentences in Japanese. On top of the lexical information previously proposed, we used phrasal pattern information. Adding phrasal information improves the system's accuracy by 12 percentage points, from 53% to 65%.
In this paper, we present a novel global reordering model that can be incorporated into standard phrase-based statistical machine translation. Unlike previous local reordering models that emphasize the reordering of adjacent phrase pairs (Tillmann and Zhang, 2005), our model explicitly models the reordering of long distances by directly...
One of the key issues in spoken-language translation is how to deal with unrestricted expressions in spontaneous utterances. We have developed a paraphraser for use as part of a translation system, and in this paper we describe the implementation of a Chinese paraphraser for a Chinese-Japanese spoken-language translation system. When an input sente...
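山本 和英, 池田 諭史, 大橋 一輝. Sentence-End Reshaping for "Shinkansen Summarization". Journal of Natural Language Processing, Vol.12, No.6, pp.85-111, The Association for Natural Language Processing (2005.11)
白 京姫, 大竹 清敬, BOND FRANCIS, 山本 和英. Quantitative Analysis of Translation Corpora with Different Source Languages. Journal of Natural Language Processing, Special Issue on "Corpus Linguistics, Language Education, and Language Processing", Vol.12, No.4, pp.117-136, The Association for Natural Language Processing (2005.8)
山本 和英, 大橋 一輝. Paraphrasing "Sahen Verb + Noun" into Compound Nouns. Journal of Natural Language Processing, Vol.12, No.3, pp.19-42, The Association for Natural Language Processing (2005.7)
峠 泰成, 大橋 一輝, 山本 和英. Extracting Opinion Sentences from Web Bulletin Boards by Automatic Acquisition of Domain-Characteristic Words. 11th Annual Meeting of the Association for Natural Language Processing, pp.672-675 (2005.3)
土田 雅之, 大橋 一輝, 山本 和英. Verbalization of Sahen Nouns Considering Tense and Voice. 11th Annual Meeting of the Association for Natural Language Processing, pp.209-212 (2005.3)
大橋 一輝, 山本 和英, 齋藤 邦子, 永田 昌明. Analysis of Phrase Reordering Patterns in Phrase-Based Statistical Translation. 11th Annual Meeting of the Association for Natural Language Processing, pp.863-866 (2005.3)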
News on electrical bulletin boards consists of high-density expressions. Many sentences end with unique expressions that consist of nouns and case particles. This paper focuses on expressions used at the end of sentences and attempts to summarize them by forming noun or case particle endings. We summarize the news sentence through pattern match...
In this paper, we present a novel distortion model for phrase-based statistical machine translation. Unlike the previous phrase distortion models whose role is to simply penalize nonmonotonic alignments [1, 2], the new model assigns the probability of relative position between two source language phrases aligned to the two adjacent target languag...
This paper describes a system which solves language tests for second grade students (7 years old). In Japan, there are materials for students to measure their understanding of what they have studied, just like the SAT for high school students in the US. We use textbooks for the students as the target material of this study. Questions in the materials are classified...
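山本 和英, 安達 康昭. Spoken-Language Summarization of Diet Proceedings. Journal of Natural Language Processing, Vol.12, No.1, pp.51-78, The Association for Natural Language Processing (2005.1)
峠 泰成, 大橋 一輝, 山本 和英. Topic-Adaptive Opinion Sentence Extraction Using Iterative Learning. IPSJ SIG Technical Report, FI77-5 (2004.11)
池田 諭史, 大橋 一輝, 山本 和英. Sentence-End Reshaping for "Shinkansen Summarization". IPSJ SIG Technical Report, NL163-22 / FI76-22 (2004.9)
沢井 康孝, 峠 泰成, 山本 和英. Mining Influence Factors from Ranked Documents. IPSJ SIG Technical Report, NL163-23 / FI76-23 (2004.9)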
This paper proposes a case transition network model to provide a framework for representing case order information in addition to a Japanese case frame. The model is regarded as an extension of bi-gram model employing a case element as a unit. A preliminary investigation of the model leads us to the conclusions that the transition network has suffi...
In order to investigate the effect of source language on translations, we investigate two variants of a Korean translation corpus. The first variant consists of Korean translations of 162,308 Japanese sentences from the ATR BTEC (Basic Expression Text Corpus). The second variant was made by translating the English translations of the Japanese sente...
This paper investigates honorific phenomena in two variants of a Korean translation corpus, based on translations from Japanese and English. One surprising result is how different the corpora were, even after normalizing orthographic differences. Translations are dependent not just on meaning, but also on the structure of the source text.
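白京姫, 大竹清敬, 山本 和英. Analysis of Synonymous Expressions Arising from Translation from Different Source Languages: The Case of Korean. 10th Annual Meeting of the Association for Natural Language Processing, pp.169-172 (2004.3)
大橋 一輝, 山本 和英. Paraphrasing "Sahen Verb + Noun" into Compound Nouns. 10th Annual Meeting of the Association for Natural Language Processing, pp.693-696 (2004.3)
峠 泰成, 山本 和英. Extracting Evaluative Sentences from Web Bulletin Boards by Automatic Acquisition of Clue Words. 10th Annual Meeting of the Association for Natural Language Processing, pp.107-110 (2004.3)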
One of the problems in spoken language translation is the enormous variety of expressions not found in text translation. This volume can lead to a sparse translation coverage. In order to tackle this problem, we propose a machine translation model where an input is translated through both source-language and target-language paraphrasing processes...
We propose a detection method for orthographic variants caused by transliteration in a large corpus. The method employs two similarities. One is string similarity based on edit distance. The other is contextual similarity based on a vector space model. Experimental results show that the method achieved an F-measure of 0.889 in an open test.
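The two similarities named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the paper: the normalization of edit distance by the longer string length, the use of cosine similarity over context-word counts, the rule for combining the two scores, and both threshold values are hypothetical choices made here for illustration.

```python
from collections import Counter
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; the normalization is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cosine_similarity(ctx_a: Counter, ctx_b: Counter) -> float:
    """Cosine between two context-word count vectors (a simple vector space model)."""
    dot = sum(ctx_a[w] * ctx_b[w] for w in ctx_a.keys() & ctx_b.keys())
    norm = (sqrt(sum(v * v for v in ctx_a.values()))
            * sqrt(sum(v * v for v in ctx_b.values())))
    return dot / norm if norm else 0.0


def is_variant_pair(a, b, ctx_a, ctx_b,
                    string_threshold=0.7, context_threshold=0.5):
    """Accept a pair only if both similarities pass (hypothetical thresholds)."""
    return (string_similarity(a, b) >= string_threshold
            and cosine_similarity(ctx_a, ctx_b) >= context_threshold)
```

Requiring both scores to pass reflects the intuition that transliteration variants look alike and also occur in similar contexts; how the two scores are actually combined is not stated in the abstract.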
Two kinds of paraphrases extracted from a bilingual parallel corpus were analyzed. One is from an adjectival predicate sentence to a non-adjectival one. The other is from a passive form to a non-passive form. The ability to extract paraphrases is strongly desired for paraphrasing studies. Although extracting paraphrases from multi-lingual parallel...
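関口 洋一, 山本 和英. A Proposal for a Web Corpus. IPSJ SIG Technical Report, NL157-17 / FI72-17, pp.123-130 (2003.9)
安達 康昭, 山本 和英. Summarization of Diet Proceedings Focusing on Characteristic Redundant Expressions. IPSJ SIG Technical Report, NL157-15 / FI72-15, pp.107-114 (2003.9)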
This paper presents a speech summarizer that summarizes input speech via several prosodic features, unlike models that use a speech recognizer and conventional summarizing techniques proposed in natural language processing. Our approach analyzes the borders of summary units by employing prosodic features of pitch, power, and pause to summarize the...
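吉田 辰巳, 大竹 清敬, 山本 和英. Experiments on Chinese Analysis Using Support Vector Machines. Journal of Natural Language Processing, Vol.10, No.1, pp.109-131, The Association for Natural Language Processing (2003.1)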
We will report performances of currently and publicly available Chinese analyzers and resources. We use YamCha, a tool based on Support Vector Machines, and the Penn Chinese Treebank as a language resource. Combining these two, we measure the performances of Chinese analysis, i.e., word segmentation, part-of-speech tagging, and base phrase chunkin...
In this paper we propose a corpus-based approach to anaphora resolution combining a machine learning method and statistical information. First, a decision tree trained on an annotated corpus determines the coreference relation of a given anaphor and antecedent candidates and is utilized as a filter in order to reduce the number of potential candi...
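酒井 浩之, 篠原 直嗣, 増山 繁, 山本 和英. Acquisition of Knowledge on the Omissibility of Adverbial Modifier Expressions. Journal of Natural Language Processing, Vol.9, No.3, pp.41-62, The Association for Natural Language Processing (2002.7)
吉田 辰巳, 大竹 清敬, 山本 和英. Comparative Experiments on SVM and the Minimum-Cost Method for Chinese Morphological Analysis. IPSJ SIG Technical Report, NL150-23, pp.157-162 (2002.7)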
This paper introduces an attempt at collecting a corpus of various usages of Japanese predicates and synonymous expressions in English. We have learned that an effective consideration to exhaustively collect such various usages is to continue to create new sentences until no more sentences can be conceived within one language. We have found that...
A method for resolving the ellipses that appear in Japanese dialogues is proposed. This method resolves not only the subject ellipsis, but also those in object and other grammatical cases. In this approach, a machine-learning algorithm is used to select the attributes necessary for a resolution.
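宮木衛, 増山繁, 山本 和英. A Study on the Paraphrasability of Adnominal Modifiers Using Two Nouns. 8th Annual Meeting of the Association for Natural Language Processing, pp.136-139 (2002.3)
山本 和英. A Machine Translation Model Based on Cooperation between Paraphrasing and Language Transfer. 8th Annual Meeting of the Association for Natural Language Processing, pp.307-310 (2002.3)
張玉潔, 山本 和英, 坂本仁. Chinese Paraphrasing Using a Paraphrase Corpus. 8th Annual Meeting of the Association for Natural Language Processing, pp.132-135 (2002.3)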
Since the expansion of MT knowledge is currently being performed by humans, it is taking too long and is too expensive. This paper proposes a new procedure that expands MT knowledge efficiently by supporting human judgements with information automatically collected from any number of corpora. The new procedure uses the source knowledge present in an...
One of the key issues in spoken language translation is how to deal with unrestricted expressions in spontaneous utterances. This research is centered on the development of a Chinese paraphraser that automatically paraphrases utterances prior to transfer in Chinese-Japanese spoken language translation. In this paper, a pattern-based approach to par...
Automatic acquisition of paraphrase knowledge for content words is proposed. Using only a non-parallel text corpus, we compute the paraphrasability metrics between two words from their similarity in context. We then filter words such as proper nouns from external knowledge. Finally, we use a heuristic in further filtering to improve the accuracy o...
We propose a thesaurus of predicates that can help to resolve pre-editing and/or post-editing problems in machine translation environments. It differs from earlier approaches such as conventional dictionaries in that we are aiming to link a wide range of near-synonyms and paraphrases. We are compiling such similar examples through both introspecti...
This paper reports on a paraphrasing method for Japanese honorifics. Japanese honorific expressions, as seen in real world dialogs, have many forms of identical meanings. This paper discusses a paraphrasing method that simplifies each utterance by removing honorifics. To simplify an utterance, we take a practical approach: investigate a corpus, a...
Chengqing Zong, Yujie Zhang, Kazuhide Yamamoto, Masashi Sakamoto and Satoshi Shirai. Paraphrasing Chinese Utterances in Spoken Language Translation System. Proceedings of International Conference on Chinese Computing (ICCC2001), pp.395-401 (2001.11)
This paper presents an approach to spoken Chinese language paraphrasing based on feature extraction and techniques of language generation. In this approach, an input utterance is first analyzed in terms of phrase structure, dependency of chunks, etc., by using multiple methods. Then, the main features of the input utterance are extracted, and the extra...
In translation between languages that have different linguistic characteristics like Japanese and English, there are many cases in which contents are not correctly transmitted in the substitution from word to word. A method known to be effective as a measure for this is to determine the translations of verbs and nouns by using valency pattern pairs...
In developing a machine translation system, one of the difficult tasks is how to build a transfer dictionary. It has been built by human labor from scratch in most cases. This approach, however, is very ineffective from the viewpoint of cost and time. To avoid this problem, we generate a Korean to Japanese dictionary as a sample, taking advantage o...
Any machine translation system requires a transfer dictionary between the source and target languages. Typically, since the construction of such a dictionary is done by hand, a lot of time is taken and the cost is enormous. Considering this, we attempted the construction of a bilingual dictionary through the re-generation of already-existing langua...
This paper proposes a new machine translation design that is the core architecture in an on-going. Although GIST is conceptually natural, an English paraphraser was constructed to generate natural language interpretations
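古瀬 蔵, 山田 節夫, 山本 和英. Splitting Ill-Formed Input for Robust Multilingual Speech Translation. IPSJ Journal, Vol.42, No.5, pp.1223-1231, Information Processing Society of Japan (2001.5)
大竹 清敬, 児玉充, 増山繁, 山本 和英. Automatic Collection of Paraphrase Instances from Multiply Modified Noun Phrases. Workshop held in conjunction with the 7th Annual Meeting of the Association for Natural Language Processing, pp.51-54 (2001.3)
白井諭, 山本 和英. Collecting Paraphrase Instances: From the Viewpoint of Ensuring Diversity in Machine Translation. Workshop held in conjunction with the 7th Annual Meeting of the Association for Natural Language Processing, pp.3-8 (2001.3)
白京姫, 白井諭, 山本 和英, 坂本仁. A Study of Japanese-Korean Speech Translation Exploiting Linguistic Similarity. 7th Annual Meeting of the Association for Natural Language Processing, pp.225-228 (2001.3)
大竹清敬, 増山繁, 山本 和英. Handling Ambiguity in Acquiring Case Element Sequences from Corpora. 7th Annual Meeting of the Association for Natural Language Processing, pp.502-505 (2001.3)
白井諭, 山本 和英, 白京姫. Collating English-Translation Dictionaries for Building Bilingual Dictionaries. IEICE Technical Report, TL2000-36 (2001.3)
山本 和英, 白井諭, 坂本仁, 張玉潔. Sandglass: Speech Translation Centered on a Paraphrasing Mechanism for Both Languages. 7th Annual Meeting of the Association for Natural Language Processing, pp.221-224 (2001.3)
白井諭, 山本 和英. Collecting Paraphrase Instances: Targeting Basic Japanese and English Sentence Patterns. 7th Annual Meeting of the Association for Natural Language Processing, pp.401-404 (2001.3)
張玉潔, 山本 和英, 坂本仁. Analysis of Chinese Paraphrasing for Chinese-Japanese Speech Translation. 7th Annual Meeting of the Association for Natural Language Processing, pp.476-479 (2001.3)
佐渡詩郎, 大竹清敬, 増山繁, 山本 和英, 中川聖一. Using Prosodic Information for Speech Summarization of News Sentences. IPSJ SIG Technical Report, NL140-4, pp.23-30 (2000.11)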
We propose a method of paraphrasing a Japanese noun modifier into a noun phrase in the form of "A no B." The semantic structures of "A no B" are sometimes recognized by supplementing some abbreviated predicate. We define these abbreviated verbs as "deletable verbs" in two ways: 1. We choose verbs matched with the semantic relations of "A no B" by usi...
A compound noun and its translation do not always correspond to each other on a part-by-part basis. Therefore, there are cases where utilizing the translations of the constituent words for extracting the translation of the compound noun is ineffective. We propose a method which copes with this defect. First, it detects the parts of th...