Kosho Shudo’s research while affiliated with Fukuoka University and other places




Publications (11)


A lexicon of multiword expressions for linguistically precise, wide-coverage natural language processing
  • Article

January 2013 · 62 Reads · 7 Citations

Computer Speech & Language

Toshifumi Tanabe · Masahito Takahashi · Kosho Shudo

Since Sag et al. (2002) highlighted a key problem that had been underappreciated in the past in natural language processing (NLP), namely idiosyncratic multiword expressions (MWEs) such as idioms, quasi-idioms, clichés, quasi-clichés, institutionalized phrases, proverbs and old sayings, and how to deal with them, many attempts have been made to extract these expressions from corpora and construct a lexicon of them. However, no extensive, reliable solution has yet been realized. This paper presents an overview of a comprehensive lexicon of Japanese multiword expressions (Japanese MWE Lexicon: JMWEL), which has been compiled in order to realize linguistically precise and wide-coverage natural Japanese processing systems. The JMWEL is characterized by significant notational, syntactic, and semantic diversity as well as a detailed description of the syntactic functions, structures, and flexibilities of MWEs. The lexicon contains about 111,000 header entries written in kana (phonetic characters) and their almost 820,000 variants written in kana and kanji (ideographic characters). The paper demonstrates the JMWEL's validity, supported mainly by comparing the lexicon with a large-scale Japanese N-gram frequency dataset, namely the LDC2009T08 generated by Google Inc. (Kudo and Kazawa, 2009). The present work is an attempt to provide a tentative answer for Japanese, from outside statistical empiricism, to the question posed by Church (2011): “How many multiword expressions do people know?”


A Comprehensive Dictionary of Multiword Expressions.

January 2011 · 87 Reads · 16 Citations

It has been widely recognized that one of the most difficult and intriguing problems in natural language processing (NLP) is how to cope with idiosyncratic multiword expressions. This paper presents an overview of the comprehensive dictionary (JDMWE) of Japanese multiword expressions. The JDMWE is characterized by a large notational, syntactic, and semantic diversity of contained expressions as well as a detailed description of their syntactic functions, structures, and flexibilities. The dictionary contains about 104,000 expressions, potentially 750,000 expressions. This paper shows that the JDMWE's validity can be supported by comparing the dictionary with a large-scale Japanese N-gram frequency dataset, namely the LDC2009T08, generated by Google Inc. (Kudo et al. 2009).


JDMWE: A Japanese Dictionary of Multi-Word Expressions
  • Article
  • Full-text available

January 2010 · 24 Reads · 4 Citations

Journal of Natural Language Processing

Since the publication of Sag et al. (2002), the NLP community has been aware that one of the most crucial problems in NLP is how to cope with idiosyncratic multiword expressions, which occur in authentic sentences with unexpectedly high frequency. The idiosyncrasy of an expression is twofold in principle: one aspect is idiomaticity, i.e. non-compositionality of meaning, and the other is the strong probabilistic boundness of the word combination. Many attempts have therefore been made in the NLP field to extract such expressions from corpora, mostly by statistical methods. However, presumably because they are difficult to extract correctly without human insight, no reliable, extensive resource has yet become available. The authors recognized the crucial importance of such irregular expressions around 1970 and started to develop a machine dictionary containing Japanese idioms, idiom-like expressions, and other multiword expressions that consist of frequently co-occurring words. In this paper, we give an overview of the first version of the dictionary, namely JDMWE (Japanese Dictionary of Multi-Word Expressions). It has about 104,000 head entries and is characterized by: 1. the wide notational, syntactic, and semantic variety of the expressions it contains; 2. the syntactic function and structure given for each entry expression; and 3. the possibility of internal modification indicated for each component word of the entry expression.


MWEs as non-propositional content indicators

January 2004 · 20 Reads · 27 Citations

We report that proper use of the relevant MWEs enables a tractable framework, based on multiple nesting of semantic operations, for processing the non-inferential, non-propositional contents (NPCs) of natural Japanese sentences. Our framework is characterized by its broad syntactic and semantic coverage, enabling us to deal with multiply composite modalities and their semantic/pragmatic similarity. The relationship between indirect (Searle, 1975) and direct speech, and equations peculiar to modal logic and its family (Mally, 1926; Prior, 1967), are also treated in the similarity paradigm.


Processing Homonyms in the Kana-to-Kanji Conversion

January 2003 · 19 Reads

This paper proposes two new methods to identify the correct meaning of Japanese homonyms in text based on the noun-verb co-occurrence in a sentence, which can be obtained easily from corpora. The first method uses the near co-occurrence data sets, which are constructed from the above co-occurrence relation, to select the most feasible word among homonyms in the scope of a sentence. The second uses the far co-occurrence data sets, which are constructed dynamically from the near co-occurrence data sets in the course of processing input sentences, to select the most feasible word among homonyms in the scope of a sequence of sentences. An experiment of kana-to-kanji (phonogram-to-ideograph) conversion has shown that the conversion is carried out at the accuracy rate of 79.6% per word by the first method. This accuracy rate of our method is 7.4% higher than that of the ordinary method based on the word occurrence frequency.
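
As a rough illustration of the sentence-scope idea in this abstract, the sketch below picks the homonym candidate whose noun-verb co-occurrence counts with the other words in the sentence are highest. It is only a minimal sketch under stated assumptions, not the authors' method: the function name `pick_homonym`, the toy count table, the candidate list, and the tie-breaking rule (first listed candidate wins) are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: (noun, verb) co-occurrence counts, standing in for
# counts that would be gathered from a corpus.
NEAR_COOCCURRENCE = Counter({
    ("橋", "渡る"): 12,  # bridge / to cross
    ("箸", "使う"): 9,   # chopsticks / to use
    ("箸", "持つ"): 5,   # chopsticks / to hold
    ("端", "歩く"): 3,   # edge / to walk
})

# Hypothetical homonym table: kana reading -> candidate kanji words.
HOMONYMS = {"はし": ["橋", "箸", "端"]}


def pick_homonym(reading, context_words,
                 cooc=NEAR_COOCCURRENCE, homonyms=HOMONYMS):
    """Return the candidate with the highest summed co-occurrence count
    against the words in the same sentence, or None if the reading is unknown.

    Assumption: ties (including the all-zero case) go to the first
    candidate in the list.
    """
    candidates = homonyms.get(reading, [])
    if not candidates:
        return None

    def score(candidate):
        # Counter returns 0 for pairs that were never observed.
        return sum(cooc[(candidate, word)] for word in context_words)

    return max(candidates, key=score)


if __name__ == "__main__":
    print(pick_homonym("はし", ["渡る"]))
    print(pick_homonym("はし", ["使う", "持つ"]))
```

A fuller system would need a better fallback for the zero-evidence case, such as the word occurrence frequency baseline the abstract compares against.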


Morphological Aspect Of Japanese Language Processing

January 2003 · 35 Reads · 6 Citations

In this paper, the above-mentioned grammatical model for the first phase of parsing, which may be called the "pseudo-morphological" phase, is shown, and the experimental system developed to verify its validity is outlined. After showing some operational examples and the result of the experiment, we conclude that our model is quite effective for Japanese language processing from the standpoints mentioned above.



Modality Expressions in Japanese and Their Automatic Paraphrasing.

January 2001 · 30 Reads · 5 Citations

It is important for future NLP systems to formulate the semantic equivalence (and, more generally, the semantic similarity) of natural language expressions. In particular, paraphrasing, full-text information retrieval, example-based MT, and document compression technology require an effective equivalence criterion for linguistic expressions. In this paper, we first discuss the meaning of Japanese sentence-final modality expressions (MEs) and then present equivalence rules for paraphrasing a string of MEs while preserving its meaning.


Large scale collocation data and their application to Japanese word processor technology

January 1998 · 6 Reads · 5 Citations

Word processors and computers used in Japan employ a Japanese input method that combines keyboard strokes with Kana (phonetic) character to Kanji (ideographic, Chinese) character conversion technology. The key issue in Kana-to-Kanji conversion is how to raise conversion accuracy through homophone processing, since Japanese has so many homophonic Kanji. In this paper, we report the results of Kana-to-Kanji conversion experiments that incorporate homophone processing based on large-scale collocation data. We show that approximately 135,000 collocations yield a 9.1% increase in conversion accuracy compared with a prototype system that has no collocation data.


Processing Homonyms in the Kana-to-Kanji Conversion.

January 1996 · 21 Reads · 2 Citations

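This paper proposes two new methods to identify the correct meaning of Japanese homonyms in text based on the noun-verb co-occurrence in a sentence, which can be obtained easily from corpora. The first method uses the near co-occurrence data sets, which are constructed from the above co-occurrence relation, to select the most feasible word among homonyms in the scope of a sentence. The second uses the far co-occurrence data sets, which are constructed dynamically from the near co-occurrence data sets in the course of processing input sentences, to select the most feasible word among homonyms in the scope of a sequence of sentences. An experiment of kana-to-kanji (phonogram-to-ideograph) conversion has shown that the conversion is carried out at the accuracy rate of 79.6% per word by the first method. This accuracy rate of our method is 7.4% higher than that of the ordinary method based on the word occurrence frequency.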



Citations (8)


... They emphasized that expressions convey strong sentiments, even though gathering and annotating them takes time. This study [36] presents a glossary of Japanese vocabulary used in idiomatic expressions. It comprises quasi-idioms as well as well-known Japanese idioms and clichés. ...

Reference:

Applying English Idiomatic Expressions to Classify Deep Sentiments in COVID-19 Tweets
JDMWE: A Japanese Dictionary of Multi-Word Expressions

Journal of Natural Language Processing

... Grammar cannot catch up with the relevant meaning, so language users need linguistic devices to absorb huge meanings at once. Sentence-long MWEs can be an essential linguistic device (Church, 2013; Shudo, Kurahone, & Tanabe, 2011; Tanabe, Takahashi, & Shudo, 2014), while phrase-long MWEs are not enough to fully convey meanings which language users express. ...

A lexicon of multiword expressions for linguistically precise, wide-coverage natural language processing
  • Citing Article
  • January 2013

Computer Speech & Language

... In corpus linguistics, collocation is used to describe a sequence of words that often occur together and can be extracted from a corpus. When applied to big data, collocation extraction involves finding interesting word combinations in large corpora [8,9]. The concept of collocation has also been applied to management of resources such as power, computing resources in a data center, placement of satellites and measuring instruments [10]. ...

Large scale collocation data and their application to Japanese word processor technology
  • Citing Article
  • January 1998

... With the former, phrases that can be interpreted as idioms are found in text corpora, typically for lexicographers to compile idiom dictionaries. Previous studies have mostly focused on the idiom type identification (Lin, 1999; Baldwin et al., 2003; Shudo et al., 2004). However, there has been a growing interest in idiom token identification recently (Katz and Giesbrecht, 2006; Hashimoto et al., 2006; Cook et al., 2007). ...

MWEs as non-propositional content indicators
  • Citing Article
  • January 2004

... Multi-word adverbial expressions have received a great deal of interest in the field of linguistic studies and have been the object of previous studies in various languages (Gross, 1996a; Di Gioia, 2001; Català, 2003; Laporte et al., 2008; Palma, 2009; Shudo et al., 2011; Català et al., 2020; Müller et al., 2022, 2023). ...

A Comprehensive Dictionary of Multiword Expressions.
  • Citing Conference Paper
  • January 2011

... For example, the functional expressions "would like to" as in the predicate "would like to buy" and "can't" as in "can't install" are key expressions in detecting the customer's needs and complaints, providing valuable information to marketing research applications, consumer opinion analysis etc. Although these functional expressions are important, there have been very few studies that extensively deal with these functional expressions for use in natural language processing (NLP) systems (e.g., Tanabe et al., 2001; Sato, 2006, 2008). This is due to the fact that functional expressions are syntactically complicated and semantically abstract and so are poorly handled by NLP systems. ...

Modality Expressions in Japanese and Their Automatic Paraphrasing.
  • Citing Conference Paper
  • January 2001