Kazuhide Yamamoto

Nagaoka University of Technology · Department of Electrical, Electronics and Information Engineering

Dr.Eng.

About

154
Publications
32,642
Reads
565
Citations
Introduction
1991-1996: Toyohashi Univ. of Tech. (as a student); 1996: Dr.Eng.; 1996-2002: ATR research labs; 2002-current: Nagaoka Univ. of Tech.; 2016-current: board member, Association for NLP, Japan. Research interests: paraphrasing, simplification, and lexical resource construction.
Additional affiliations
October 2002 - present
Nagaoka University of Technology
Position
  • Professor (Associate)

Publications

Publications (154)
Conference Paper
Full-text available
Interest in extracting and analyzing evaluations and opinions of services or products from large bodies of text has been increasing in recent years. It is important to classify predicates according to sense because whether or not a statement includes the speaker's opinion depends strongly on its predicate. It is generally assumed that Japanese p...
Conference Paper
Full-text available
This paper examines the introduction of "Easy Japanese" by extracting important segments for translation. The need for Japanese language has increased dramatically due to the recent influx of non-Japanese-speaking foreigners. Therefore, in order for non-native speakers of Japanese to successfully adapt to society, the so-called Easy Japanese is bei...
Conference Paper
Full-text available
The automatic insertion of diacritics in electronic texts is necessary for a number of languages, including French, Romanian, Croatian, Sindhi, Vietnamese, etc. When diacritics are removed from a word and the resulting string of characters is not a word, it is easy to recover the diacritics. However, sometimes the resulting string is also a word, p...
Conference Paper
Full-text available
In machine translation (MT), modality errors are often critical. We propose a phrase-based statistical MT method that preserves the modality of input sentences. The method introduces a feature function that counts the number of phrases in a sentence that are characteristic words for modalities. This simple method increases the number of translation...
Article
Full-text available
We have built a large-scale general Japanese ontology, restructured from Wikipedia, that represents an is-a relation hierarchy. A Wikipedia article page belongs to one or more categories that are organized hierarchically by linking to others. However, the following two issues must be solved in order to use the categories and the articles as...
Conference Paper
Full-text available
We have proposed the Syntactic Piece, a unit for shallow language processing. A piece consists of a modifier-modificand pair derived from syntactic structure, and two expressions that differ slightly but have the same meaning are represented as the same piece in this framework. In this paper we report (a) reconsideration of creation pr...
Conference Paper
Full-text available
Finding pages on the Web that are similar to a query page is an important component of modern search engines. In particular, methods for recognizing the content of Web pages play an important role in search engines. However, the fact that a Web page includes the query words does not necessarily mean that the page describes the query. The main challenge here is identification fact...
Conference Paper
Full-text available
Automatic bilingual term extraction is essential for providing a consistent bilingual term list for human translators engaged in translating a set of documents. We compare three statistical measures for extracting bilingual terms from a phrase-table built from a parallel corpus. We show that these measures extract different bilingual term cand...
Article
Full-text available
In this study, we extracted articles describing problems, articles describing their solutions, and articles describing their causes from a Japanese Q&A-style Web forum using supervised machine learning, with F values of 0.70, 0.86, and 0.56, respectively. We confirmed that these values are significantly better than their baselines. This extraction wil...
Data
Full-text available
Automatic bilingual term extraction is essential for providing a consistent bilingual term list for human translators engaged in translating a set of documents. We compare three statistical measures for extracting bilingual terms from a phrase-table built from a parallel corpus. We show that these measures extract different bilingual term cand...
Conference Paper
Full-text available
We propose a method to detect Japanese nasty comments from posts on bulletin board systems (BBS). Nasty comments can cause many social problems, because they express potentially harmful words and phrases. There are methods to recognize harmful words, but they are insufficient. Therefore, we present a method for detecting such comments on a BBS with...
Conference Paper
Full-text available
This paper presents a new computation of lexical distributional similarity, which is a corpus-based method for computing similarity of any two words. Although the conventional method focuses on emphasizing features with which a given word is associated, we propose that even unassociated features of two input words can further improve the performan...
Conference Paper
Full-text available
We present a novel method for building a large-scale Japanese ontology from Wikipedia using one of the largest Japanese thesauri, Nihongo Goi-Taikei (referred to hereafter as "Goi-Taikei") as an upper ontology. First, the leaf categories in the Goi-Taikei hierarchy are semi-automatically aligned with semantically equivalent Wikipedia categories....
Conference Paper
Full-text available
This paper presents a method for constructing a large-scale Person Ontology with category hierarchy from Wikipedia. We first extract Wikipedia category labels which represent persons (hereafter, Wikipedia Person Category, WPC) by using a machine learning classifier. We then construct a WPC hierarchy by detecting is-a relations in the Wikipedia ca...
Conference Paper
Full-text available
In this study, our purpose was to make a short summary for sentences. For example, we aimed to make the short summary "terror" for the sentences "A bomb went off. Some people were killed. This was triggered by rebel campaign." In this study, we proposed a new method that generates summaries that can appropriately and adequately express the contents of t...
Conference Paper
Full-text available
To address the shortage of Japanese-English parallel corpora, we developed a parallel corpus by collecting open source software manuals from the Web. The constructed corpus contains approximately 500 thousand sentence pairs that were aligned automatically by an existing method. We also conducted statistical machine translation (SMT) experiment...
Conference Paper
Full-text available
It is expensive for companies to browse daily reports. Our aim is to create a system that extracts information about problems from reports. This system operates in two steps. First, it records expressions involving troubles in a dictionary from training data. Second, it expands the dictionary to include information not included in the training data...
Conference Paper
Full-text available
This paper presents a method for generating reviews of stories. In this work, we focus on generating sentences that include subjective expressions. First, we constructed lexicons using emotion-emerged expressions that are thought to be the origin of the emotional content. The lexicon consists of syntactic pieces that are proposed as units of syntac...
Article
Full-text available
Automatic summarization is an important task as a form of human support technology. We propose in this paper a new summarization method that is based on an example-based approach. Using an example-based approach for the summarization task has the following three advantages: high modularity, absence of the necessity to score importance for each word, and...
Article
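We propose a measure that computes how much interest a given input document holds for many people. Individual interests are highly diverse, and research that captures them for purposes such as information filtering is well known; in this study, however, we examine how much interest an unspecified large audience, that is, the general public, has as a whole. Such a technique has very important applications, such as selecting which documents to present and changing the display order on Web sites that are intended to be viewed by an unspecified large audience. We used ranked documents as an information source that reflects the interests of the public. Using these as training data, we built a method that assigns the strength of interest as a value to the words and phrases in a document and to the document itself. By treating interest as a value, the strength of interest is not a binary of interested or not interested but...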
Article
Full-text available
This paper addresses a task of opinion extraction from given documents and its positive/negative classification. We propose a sentence classification method using a notion of syntactic piece. Syntactic piece is a minimum unit of structure, and is used as an alternative processing unit of n-gram and whole tree structure. We compute its semantic orie...
Conference Paper
Full-text available
This paper describes a system which identifies discourse relations between two successive sentences in Japanese. On top of the lexical information previously proposed, we used phrasal pattern information. Adding phrasal information improves the system's accuracy by 12 percentage points, from 53% to 65%.
Conference Paper
Full-text available
In this paper, we present a novel global reordering model that can be incorporated into standard phrase-based statistical machine translation. Unlike previous local reordering models that emphasize the reordering of adjacent phrase pairs (Tillmann and Zhang, 2005), our model explicitly models the reordering of long distances by directly...
Article
One of the key issues in spoken-language translation is how to deal with unrestricted expressions in spontaneous utterances. We have developed a paraphraser for use as part of a translation system, and in this paper we describe the implementation of a Chinese paraphraser for a Chinese-Japanese spoken-language translation system. When an input sente...
Article
Full-text available
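山本 和英, 池田 諭史, 大橋 一輝. Shaping sentence endings for "Shinkansen summarization". Journal of Natural Language Processing, Vol.12, No.6, pp.85-111, The Association for Natural Language Processing (2005.11)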
Article
Full-text available
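白 京姫, 大竹 清敬, BOND FRANCIS, 山本 和英. Quantitative analysis of translation corpora with different source languages. Journal of Natural Language Processing, special issue "Corpus Linguistics, Language Education, and Language Processing", Vol.12, No.4, pp.117-136, The Association for Natural Language Processing (2005.8)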
Article
Full-text available
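山本 和英, 大橋 一輝. Paraphrasing "sahen verb + noun" into compound nouns. Journal of Natural Language Processing, Vol.12, No.3, pp.19-42, The Association for Natural Language Processing (2005.7)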
Conference Paper
Full-text available
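峠 泰成, 大橋 一輝, 山本 和英. Extracting opinion sentences from Web bulletin boards through automatic acquisition of domain-characteristic words. 11th Annual Meeting of the Association for Natural Language Processing, pp.672-675 (2005.3)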
Conference Paper
Full-text available
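土田 雅之, 大橋 一輝, 山本 和英. Verbalization of sahen nouns taking tense and voice into account. 11th Annual Meeting of the Association for Natural Language Processing, pp.209-212 (2005.3)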
Conference Paper
Full-text available
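大橋 一輝, 山本 和英, 齋藤 邦子, 永田 昌明. Analysis of phrase reordering patterns in phrase-based statistical translation. 11th Annual Meeting of the Association for Natural Language Processing, pp.863-866 (2005.3)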
Article
Full-text available
News on electrical bulletin boards consists of high density expressions. Many sentences end with unique expressions that consist of nouns and case particles. This paper focuses on expressions used at the end of sentences and attempts to summarize them by forming noun or case particle endings. We summarize the news sentence through pattern match...
Article
Full-text available
In this paper, we present a novel distortion model for phrase-based statistical machine translation. Unlike the previous phrase distortion models whose role is to simply penalize nonmonotonic alignments [1, 2], the new model assigns the probability of relative position between two source language phrases aligned to the two adjacent target languag...
Article
Full-text available
This paper describes a system which solves language tests for second grade students (7 years old). In Japan, there are materials for students to measure understanding of what they studied, just like the SAT for high school students in the US. We use textbooks for the students as the target material of this study. Questions in the materials are classified...
Article
Full-text available
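山本 和英, 安達 康昭. Spoken-language summarization of National Diet proceedings. Journal of Natural Language Processing, Vol.12, No.1, pp.51-78, The Association for Natural Language Processing (2005.1)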
Conference Paper
Full-text available
峠 泰成, 大橋 一輝, 山本 和英. Topic-adaptive opinion sentence extraction using iterative learning. IPSJ SIG Technical Report, FI77-5 (2004.11)
Conference Paper
Full-text available
池田 諭史, 大橋 一輝, 山本 和英. Shaping sentence endings for "Shinkansen summarization". IPSJ SIG Technical Report, NL163-22 / FI76-22 (2004.9)
Conference Paper
Full-text available
沢井 康孝, 峠 泰成, 山本 和英. Mining influence factors from ranked documents. IPSJ SIG Technical Report, NL163-23 / FI76-23 (2004.9)
Article
Full-text available
This paper proposes a case transition network model to provide a framework for representing case order information in addition to a Japanese case frame. The model is regarded as an extension of bi-gram model employing a case element as a unit. A preliminary investigation of the model leads us to the conclusions that the transition network has suffi...
Article
In order to investigate the effect of source language on translations, we investigate two variants of a Korean translation corpus. The first variant consists of Korean translations of 162,308 Japanese sentences from the ATR BTEC (Basic Expression Text Corpus). The second variant was made by translating the English translations of the Japanese sente...
Conference Paper
Full-text available
In order to investigate the effect of source language on translations, we investigate two variants of a Korean translation corpus. The first variant consists of Korean translations of 162,308 Japanese sentences from the ATR BTEC (Basic Expression Text Corpus). The second variant was made by translating the English translations of the Japanese sente...
Article
Full-text available
This paper investigates honorific phenomena on two variants of Korean translation corpus, based on translations from Japanese and English. One surprising result is how different the corpora were, even after normalizing orthographic differences. Translations are dependent not just on meaning, but also on the structure of the source text.
Conference Paper
Full-text available
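白京姫, 大竹清敬, 山本 和英. Analysis of synonymous expressions arising from translation from different source languages: the case of Korean. 10th Annual Meeting of the Association for Natural Language Processing, pp.169-172 (2004.3)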
Conference Paper
Full-text available
大橋 一輝, 山本 和英. Paraphrasing "sahen verb + noun" into compound nouns. 10th Annual Meeting of the Association for Natural Language Processing, pp.693-696 (2004.3)
Thesis
Full-text available
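安達 康昭. An informative summarization method for National Diet proceedings based on sentence shortening. Nagaoka University of Technology research project report (2004.3)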
Conference Paper
Full-text available
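峠 泰成, 山本 和英. Extracting evaluative sentences from Web bulletin boards through automatic acquisition of clue words. 10th Annual Meeting of the Association for Natural Language Processing, pp.107-110 (2004.3)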
Conference Paper
Full-text available
This paper investigates honorific phenomena on two variants of Korean translation corpus, based on translations from Japanese and English. One surprising result is how different the corpora were, even after normalizing orthographic differences. Translations are dependent not just on meaning, but also on the structure of the source text.
Article
Full-text available
One of the problems in spoken language translation is the enormous variety of expressions not found in text translation. This volume can lead to a sparse translation coverage. In order to tackle this problem, we propose a machine translation model where an input is translated through both source-language and target-language paraphrasing processes...
Article
Full-text available
We propose a detection method for orthographic variants caused by transliteration in a large corpus. The method employs two similarities. One is string similarity based on edit distance. The other is contextual similarity by a vector space model. Experimental results show that the method achieved an F-measure of 0.889 in an open test.
Article
Full-text available
Two kinds of paraphrases extracted from a bilingual parallel corpus were analyzed. One is from an adjectival predicate sentence to a non-adjectival one. The other is from a passive form to a non-passive form. The ability to extract paraphrases is strongly desired for paraphrasing studies. Although extracting paraphrases from multi-lingual parallel...
Conference Paper
Full-text available
関口 洋一, 山本 和英. A proposal for a Web corpus. IPSJ SIG Technical Report, NL157-17 / FI72-17, pp.123-130 (2003.9)
Conference Paper
Full-text available
安達 康昭, 山本 和英. Summarization of National Diet proceedings focusing on characteristic redundant expressions. IPSJ SIG Technical Report, NL157-15 / FI72-15, pp.107-114 (2003.9)
Article
Full-text available
This paper presents a speech summarizer that summarizes input speech via several prosodic features, unlike models that use a speech recognizer and conventional summarizing techniques proposed in natural language processing. Our approach analyzes the borders of summary units by employing prosodic features of pitch, power, and pause to summarize the...
Article
Full-text available
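吉田 辰巳, 大竹 清敬, 山本 和英. Chinese analysis experiments using support vector machines. Journal of Natural Language Processing, Vol.10, No.1, pp.109-131, The Association for Natural Language Processing (2003.1)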
Article
We report the performance of currently and publicly available Chinese analyzers and resources. We use YamCha, a tool based on Support Vector Machines, and the Penn Chinese Treebank as a language resource. Combining these two, we measure the performance of Chinese analysis, i.e., word segmentation, part-of-speech tagging, and base phrase chunkin...
Article
Full-text available
In this paper we propose a corpus-based approach to anaphora resolution combining a machine learning method and statistical information. First, a decision tree trained on an annotated corpus determines the coreference relation of a given anaphor and antecedent candidates and is utilized as a filter in order to reduce the number of potential candi...
Article
Full-text available
Few rock groups of the '80s broke down as many musical barriers and were as original as the Red Hot Chili Peppers. Creating an intoxicating new musical style by combining funk and punk rock together (with an explosive stage show, to boot), the Chili Peppers spawned a slew of imitators in their wake, but still managed to be the leaders of the p...
Article
Full-text available
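酒井 浩之, 篠原 直嗣, 増山 繁, 山本 和英. Acquisition of knowledge on the omissibility of adverbial modifier expressions. Journal of Natural Language Processing, Vol.9, No.3, pp.41-62, The Association for Natural Language Processing (2002.7)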
Conference Paper
Full-text available
吉田 辰巳, 大竹 清敬, 山本 和英. A comparative experiment between SVM and the minimum-cost method for Chinese morphological analysis. IPSJ SIG Technical Report, NL150-23, pp.157-162 (2002.7)
Article
Full-text available
This paper introduces an attempt at collecting a corpus of various usages of Japanese predicates and synonymous expressions in English. We have learned that an effective consideration to exhaustively collect such various usages is to continue to create new sentences until no more sentences can be conceived within one language. We have found that...
Article
A method for resolving the ellipses that appear in Japanese dialogues is proposed. This method resolves not only the subject ellipsis, but also those in object and other grammatical cases. In this approach, a machine-learning algorithm is used to select the attributes necessary for a resolution.
Conference Paper
Full-text available
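大竹清敬, 山本 和英. Analysis of paraphrase examples of adjectival predicate sentences. 8th Annual Meeting of the Association for Natural Language Processing, pp.319-322 (2002.3)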
Conference Paper
Full-text available
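宮木衛, 増山繁, 山本 和英. A study on the paraphrasability of adnominal modifiers using two nouns. 8th Annual Meeting of the Association for Natural Language Processing, pp.136-139 (2002.3)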
Conference Paper
Full-text available
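山本 和英. A machine translation model based on cooperation between paraphrasing and language transfer. 8th Annual Meeting of the Association for Natural Language Processing, pp.307-310 (2002.3)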
Conference Paper
Full-text available
張玉潔, 山本 和英, 坂本仁. Chinese paraphrasing using a paraphrase corpus. 8th Annual Meeting of the Association for Natural Language Processing, pp.132-135 (2002.3)
Conference Paper
Full-text available
山本 和英. Acquisition of lexical paraphrase knowledge from text. 8th Annual Meeting of the Association for Natural Language Processing, pp.639-642 (2002.3)
Article
Full-text available
Since the expansion of MT knowledge is currently being performed by humans, it is taking too long and is too expensive. This paper proposes a new procedure that expands MT knowledge efficiently by supporting human judgements with information automatically collected from any number of corpora. The new procedure uses the source knowledge present in an...
Conference Paper
Full-text available
One of the key issues in spoken language translation is how to deal with unrestricted expressions in spontaneous utterances. This research is centered on the development of a Chinese paraphraser that automatically paraphrases utterances prior to transfer in Chinese-Japanese spoken language translation. In this paper, a pattern-based approach to par...
Article
Full-text available
Automatic acquisition of paraphrase knowledge for content words is proposed. Using only a non-parallel text corpus, we compute the paraphrasability metrics between two words from their similarity in context. We then filter words such as proper nouns from external knowledge. Finally, we use a heuristic in further filtering to improve the accuracy o...
Article
Full-text available
We propose a thesaurus of predicates that can help to resolve pre-editing and/or post-editing problems in machine translation environments. It differs from earlier approaches such as conventional dictionaries in that we are aiming to link a wide range of near-synonyms and paraphrases. We are compiling such similar examples through both introspecti...
Conference Paper
Full-text available
This paper reports on a paraphrasing method for Japanese honorifics. Japanese honorific expressions, as seen in real world dialogs, have many forms of identical meanings. This paper discusses a paraphrasing method that simplifies each utterance by removing honorifics. To simplify an utterance, we take a practical approach: investigate a corpus, a...
Conference Paper
Full-text available
Chengqing Zong, Yujie Zhang, Kazuhide Yamamoto, Masashi Sakamoto and Satoshi Shirai. Paraphrasing Chinese Utterances in Spoken Language Translation System. Proceedings of International Conference on Chinese Computing (ICCC2001), pp.395-401 (2001.11)
Article
Full-text available
This paper presents an approach to spoken Chinese language paraphrasing based on feature extraction and techniques of language generation. In this approach, an input utterance is first analyzed in terms of phrase structure, dependency of chunks, etc., by using multiple methods. Then, the main features of the input utterance are extracted, and the extra...
Article
Full-text available
In translation between languages that have different linguistic characteristics like Japanese and English, there are many cases in which contents are not correctly transmitted in the substitution from word to word. A method known to be effective as a measure for this is to determine the translations of verbs and nouns by using valency pattern pairs...
Article
Full-text available
In developing a machine translation system, one of the difficult tasks is how to build a transfer dictionary. It has been built by human labor from scratch in most cases. This approach, however, is very ineffective from the viewpoint of cost and time. To avoid this problem, we generate a Korean to Japanese dictionary as a sample, taking advantage o...
Article
Full-text available
Any machine translation system requires a transfer dictionary between the source and target languages. Typically, since the construction of such a dictionary is done by hand, a lot of time is taken and the cost is enormous. Considering this, we attempted the construction of a bilingual dictionary through the re-generation of already-existing langua...
Article
Full-text available
This paper proposes a new machine translation design that is the core architecture in an on-going. Although GIST is conceptually natural, an English paraphraser was constructed to generate natural language interpretations
Article
Full-text available
古瀬 蔵, 山田 節夫, 山本 和英. Splitting ill-formed input for robust multilingual speech translation. IPSJ Journal, Vol.42, No.5, pp.1223-1231, Information Processing Society of Japan (2001.5)
Conference Paper
Full-text available
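山本 和英. Current state and open issues of paraphrasing. Workshop held in conjunction with the 7th Annual Meeting of the Association for Natural Language Processing, pp.93-96 (2001.3)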
Conference Paper
Full-text available
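大竹 清敬, 児玉充, 増山繁, 山本 和英. Automatic collection of paraphrase examples from multiply modified noun phrases. Workshop held in conjunction with the 7th Annual Meeting of the Association for Natural Language Processing, pp.51-54 (2001.3)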
Conference Paper
Full-text available
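白井諭, 山本 和英. Collecting paraphrase examples: from the viewpoint of ensuring diversity in machine translation. Workshop held in conjunction with the 7th Annual Meeting of the Association for Natural Language Processing, pp.3-8 (2001.3)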
Conference Paper
Full-text available
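白京姫, 白井諭, 山本 和英, 坂本仁. A study of Japanese-Korean speech translation exploiting linguistic similarity. 7th Annual Meeting of the Association for Natural Language Processing, pp.225-228 (2001.3)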
Conference Paper
Full-text available
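大竹清敬, 増山繁, 山本 和英. Handling ambiguity in the acquisition of case element sequences from corpora. 7th Annual Meeting of the Association for Natural Language Processing, pp.502-505 (2001.3)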
Conference Paper
Full-text available
白井諭, 山本 和英, 白京姫. Collating English-translation dictionaries for bilingual dictionary construction. IEICE Technical Report, TL2000-36 (2001.3)
Conference Paper
Full-text available
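山本 和英, 白井諭, 坂本仁, 張玉潔. Sandglass: speech translation centered on paraphrasing mechanisms in both languages. 7th Annual Meeting of the Association for Natural Language Processing, pp.221-224 (2001.3)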
Conference Paper
Full-text available
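白井諭, 山本 和英. Collecting paraphrase examples: targeting basic Japanese-English sentence constructions. 7th Annual Meeting of the Association for Natural Language Processing, pp.401-404 (2001.3)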
Conference Paper
Full-text available
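張玉潔, 山本 和英, 坂本仁. Analysis of Chinese paraphrasing for Chinese-Japanese speech translation. 7th Annual Meeting of the Association for Natural Language Processing, pp.476-479 (2001.3)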
Conference Paper
Full-text available
佐渡詩郎, 大竹清敬, 増山繁, 山本 和英, 中川聖一. Use of prosodic information for speech summarization of news sentences. IPSJ SIG Technical Report, NL140-4, pp.23-30 (2000.11)
Article
Full-text available
We propose a method of paraphrasing a Japanese noun modifier into a noun phrase in the form of "A no B." The semantic structures of "A no B" are sometimes recognized by supplementing some abbreviated predicate. We define these abbreviated verbs as "deletable verbs" in two ways: 1. We choose verbs matched with the semantic relations of "A no B" by usi...
Article
Full-text available
A compound noun and its translation do not always have a correspondence with each other in part-by-part basis. Therefore, there are cases where utilizing the translations of the constituent words for extracting the translation of the compound noun is ineffective. We propose a method which copes with this defect. At first, it detects the parts of th...
