About
431 Publications
121,032 Reads
2,449 Citations
Introduction
Additional affiliations
October 1998 - present
Publications (431)
In this study, we collected minutes from national and local assemblies published on the web, constructing a large corpus. In addition, we developed pre-trained language models adapted to the Japanese political domain using the constructed corpus of meeting records, incorporating several derivatives. Our models demonstrated superior and comparable p...
Propaganda in the digital era is often associated with online news. In this study, we focused on large language models and their ability to detect propaganda techniques in the electronic press, to investigate whether they are a viable replacement for human annotators. We prepared prompts for generative pre-trained transformer models to find s...
For both humans and machines to acquire vocabulary, it is effective to learn words from context while using dictionaries as an auxiliary tool. It has been shown in previous linguistic studies that for humans, glossing either target words to be learned or words comprising context is an effective approach. For machines, however, previous NLP studies...
Moral AI has been studied in the fields of philosophy and artificial intelligence. Although most existing studies are only theoretical, recent developments in AI have made it increasingly necessary to implement AI with morality. Humans, however, face moral uncertainty: they do not always know what is morally right. In this paper, we impleme...
The popularity of social media services has led to an increase in personality-relevant data in online spaces. While the majority of people who use these services tend to express their personality through measures offered by the Myers–Briggs Type Indicator (MBTI), another personality model known as the Big Five has been a dominant paradigm in academ...
Many Japanese onomatopoeic words have multiple meanings, which are determined by the surrounding context. Previous studies have proposed automated word sense classification using vector representations of onomatopoeia obtained from a pre-trained BERT model. Although this method achieves relatively high performance, the annotation cost for creating tr...
There are many types of approaches for Paraphrase Identification (PI), an NLP task of determining whether a sentence pair has equivalent semantics. Traditional approaches mainly consist of unsupervised learning and feature engineering, which are computationally inexpensive. However, their performance is only moderate by current standards. To seek a method that...
Automatic MT metrics using word embeddings are highly effective. Word embeddings provide semantic similarities between words. However, similarities computed only from static word embeddings are insufficient because they lack contextual information. Automatic metrics using fine-tuned models can adapt to a specific domain using contextual representations obta...
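As a rough illustration of how embedding similarities can feed an MT metric, the sketch below computes a greedy-matching F1 over token vectors, in the style of BERTScore. This is an assumed, generic formulation, not the metric proposed in the paper; the function names are hypothetical, and the toy vectors stand in for static or contextual embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match_f1(hyp, ref):
    """Greedy-matching F1 between hypothesis and reference token vectors.

    Precision: each hypothesis token is matched to its most similar reference token.
    Recall: each reference token is matched to its most similar hypothesis token.
    Assumes similarities are non-negative, as is typical for this kind of score.
    """
    precision = sum(max(cosine(h, r) for r in ref) for h in hyp) / len(hyp)
    recall = sum(max(cosine(r, h) for h in hyp) for r in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage with toy 2-D "embeddings" for a hypothesis and a reference sentence.
hyp_vectors = [[1.0, 0.0], [0.7, 0.7]]
ref_vectors = [[0.9, 0.1], [0.0, 1.0]]
print(greedy_match_f1(hyp_vectors, ref_vectors))
```

With static embeddings each word always gets the same vector; swapping in contextual vectors from a fine-tuned model changes only the inputs, not the matching step.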
Warning: This paper contains examples of offensive language, including insulting or objectifying expressions.
Various existing studies have analyzed what social biases are inherited by NLP models. Because these biases may directly or indirectly harm people, previous studies have focused only on human attributes. However, until recently no researc...
Various existing studies have analyzed what social biases are inherited by NLP models. Because these biases may directly or indirectly harm people, previous studies have focused only on human attributes. If the social biases in NLP models can be indirectly harmful to the humans involved, then the models can also indirectly harm nonhuman animals. Howe...
One of the pressing problems in Japan recently has been cyberbullying, or slandering and bullying people online. The problem has been especially noticeable on unofficial websites of Japanese schools. Volunteers consisting of school personnel and PTA (Parent-Teacher Association) members have started Online Patrol to spot malicious content within Web fo...
Recent years have brought an unprecedented and rapid development in the field of Natural Language Processing. To a large degree this is due to the emergence of modern language models like GPT-3 (Generative Pre-trained Transformer 3), XLNet, and BERT (Bidirectional Encoder Representations from Transformers), which are pre-trained on a large amount o...
Speaker identification presents a challenging task in the field of natural language processing and is believed to be an important step in building believable human-like systems. Although most of the existing work focuses on utilizing acoustic features for the task, in this paper we propose a text-dependent transformer-based machine learning approac...
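In this paper, we address the task of reranking candidate abstracts of the research papers cited in medical news articles, given the article headline. We propose an improved Poly-encoder method (Poly-Encoder(First-all)) and a method that exploits the characteristics of paper abstracts (Paper-TA-Encoder). As a result, the MRR score improved by 0.08 points over the existing poly-encoder method, and we found that information loss occurs when the embeddings are compressed. In addition, for Paper-TA-Encoder, a model that focuses on the fact that a paper's title reflects the paper's claims, with respect to the MRR score of the existing cross-encoder method, 0.02 poi...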
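The MRR (mean reciprocal rank) score used in this study averages 1/rank of the correct abstract over all headlines. A minimal sketch, assuming exactly one correct abstract per headline; the function name and candidate identifiers are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, gold):
    """Mean Reciprocal Rank over a set of queries.

    ranked_lists: for each query (headline), the candidate IDs in ranked order.
    gold: for each query, the single correct candidate ID.
    A query whose correct candidate is missing from its ranking contributes 0.
    """
    total = 0.0
    for ranking, answer in zip(ranked_lists, gold):
        if answer in ranking:
            total += 1.0 / (ranking.index(answer) + 1)
    return total / len(gold)


# Two headlines: the correct abstract is ranked 2nd for the first and 1st for the second.
print(mean_reciprocal_rank([["a", "b", "c"], ["x", "y"]], ["b", "x"]))  # 0.75
```

Because each query contributes at most 1.0, a 0.08-point gain in MRR corresponds to correct abstracts moving noticeably closer to the top of the reranked list on average.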
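In recent years, people increasingly gather medical information from online medical articles. In doing so, however, it is difficult to judge the reliability of the articles they read. One way to evaluate reliability objectively is to search for the research papers that support an article's claims. In this study, we therefore address the task of extracting candidate related papers for a given article headline. For extraction, we used BM25T [1], an improvement of the token-based method BM25 that has a term-specific parameter for each word. As a result, when 500 candidate paper abstracts were extracted, recall improved compared with the existing BM25 method.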
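For reference, a compact Okapi BM25 scorer is sketched below. The `term_k1` dictionary is an assumption added only to show where a term-specific parameter would plug in; it does not implement BM25T's procedure for estimating those parameters, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75, term_k1=None):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    term_k1: optional {term: k1} overrides, loosely mimicking term-specific
    parameters (how BM25T estimates them is not reproduced here).
    """
    term_k1 = term_k1 or {}
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style idf: the +1 inside the log keeps weights non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            k = term_k1.get(term, k1)
            length_norm = k * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [["flu", "vaccine", "trial"], ["diet", "study"], ["vaccine", "safety", "vaccine"]]
print(bm25_scores(["vaccine"], docs))
print(bm25_scores(["vaccine"], docs, term_k1={"vaccine": 2.0}))
```

Raising k1 for a term lets repeated occurrences of that term keep increasing the score for longer before saturating, which is the kind of per-word behavior a single global k1 cannot express.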
In this study, we focus on ethical education as a means to improve artificial companions' conceptualization of the moral decision-making process in human users. In particular, we focus on automatically determining whether changes in ethical education influenced core moral values in humans throughout the century. We analyze ethics as taught in Japan bef...
Emoticons are popularly used to express users' feelings in social media, blogs, and instant messaging. However, the emoticon dictionaries from which users select emoticons contain a large number of entries, so it is difficult for users to find the desired emoticon that matches the content of their messages. In this paper, we propose a method that s...
When people debate, they want to familiarize themselves with a whole range of arguments about a given topic in order to deepen their knowledge and inspire new claims. However, the number of differently phrased arguments is enormous, which makes processing them time-consuming. In spite of many works on using arguments (e.g. counter-argume...
It is known that word embeddings exhibit biases inherited from the corpus, and those biases reflect social stereotypes. Recently, many studies have been conducted to analyze and mitigate biases in word embeddings. Unsupervised Bias Enumeration (UBE) (Swinger et al., 2019) is one approach to analyzing biases in English, and Hard Debias (Bolukbasi...
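The core "neutralize" step of Hard Debias (Bolukbasi et al.) removes a word vector's projection onto a bias direction. A minimal sketch follows; as a simplifying assumption, the bias direction here is the difference of a single word pair, whereas the original method derives it by PCA over several definitional pairs, and the "equalize" step is omitted. Function names are hypothetical.

```python
import math


def normalize(v):
    """Scale v to unit length (returned unchanged if it is the zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def bias_direction(vec_a, vec_b):
    """Simplified bias direction: normalized difference of one word pair (e.g. he - she)."""
    return normalize([a - b for a, b in zip(vec_a, vec_b)])


def neutralize(word_vec, bias_dir):
    """Remove the component of word_vec along bias_dir, then re-normalize."""
    g = normalize(bias_dir)
    proj = sum(w * gi for w, gi in zip(word_vec, g))
    return normalize([w - proj * gi for w, gi in zip(word_vec, g)])


# Toy 3-D vectors: after neutralizing, the word has no component along the bias axis.
g = bias_direction([1.0, 0.2, 0.0], [-1.0, 0.2, 0.0])
print(neutralize([0.6, 0.8, 0.0], g))
```

After this step the dot product between the neutralized vector and the bias direction is zero, so the word no longer leans toward either side of the pair that defined the direction.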
In this paper we introduce the HEMOS (Humor-EMOji-Slang-based) system for fine-grained sentiment classification for the Chinese language using a deep learning approach. We investigate the importance of recognizing the influence of humor, pictograms and slang on the task of affective processing of social media. In the first step, we collected 576 freq...
Many discussions are held during political meetings, and their transcripts contain a large number of utterances on various topics. We need to read all of them if we want to follow speakers' intentions or opinions about a given topic. To avoid such a costly and time-consuming process of grasping often lengthy discussions, NLP researchers...
In this work we propose an approach towards identifying expressions used figuratively in Japanese literary texts by means of a classification algorithm. Our considerations are inspired mostly by the epoch-making Conceptual Metaphor Theory, which, once presented by Lakoff and Johnson [13], instantly became a central problem to address not only fr...
The Topics2Themes tool, which enables text analysis on the output of topic modelling, was originally developed for the English language. In this study, we explored and evaluated adaptations required for applying the tool to Japanese texts. That is, we adapted Topics2Themes to a language that is very different from the one for which the tool was ori...
In this paper we introduce a computational model for recognizing figurative expressions in Japanese. As part of the training data we use a set of almost 26,000 Japanese sentences comprising both similes and metaphors. These were collected manually from literary texts and hence constitute a trustworthy and probably the largest existing reso...
Lack of background knowledge about the everyday world is an obstacle to simulating ordinary situations and their changes. In this paper we present a simple idea for extending common sense knowledge bases for Japanese by using a language model. We investigate several semantic categories for which specific knowledge is collected with m...
In this paper, we introduce our attempts to extend our previous work on the lifelong learning cognitive architecture Bacterium Lingualis, which collects world knowledge from textual resources using linguistic clues. We utilize the masked-token prediction functionality of the BERT language model to augment simple concepts with additional knowledge like means, goals o...
We strive to develop a dialog system that has a sense of humor in order to improve user satisfaction. We have constructed a large Japanese pun database. According to our analysis of the puns, onomatopoeic words play an important role. In this paper, we report our analysis of onomatopoeic words in Japanese puns.
There is little research into designing artificial motivational agents. The end goal of our studies is therefore to create a dialogue system that would motivate users to do their everyday tasks using natural language. In this paper, we present a method of distinguishing texts containing motivational advice from regular texts to sort out noise in tr...
This paper is an attempt at analyzing how much religious vocabulary (in this case Buddhist vocabulary taken from a large-scale dictionary of Buddhist terms available online) is present in everyday Japanese social space (in this case in a repository of blog entries from the Ameba blog service) and thus in the consciousness of people. We also investi...
This paper demonstrates a reverse dictionary that can return relevant idiomatic expressions based on queries (input descriptions) given by users. The system can be implemented with a Vector Space Model (VSM). However, when it comes to VSM, although its performance on queries is high under the condition that pairs of common keywor...
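A basic VSM reverse dictionary ranks idioms by the cosine similarity between the TF-IDF vector of the user's description and the TF-IDF vector of each idiom's definition. The sketch below assumes pre-tokenized text and uses hypothetical English idioms and glosses purely for illustration; it is not the system described in the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors ({term: weight}) and the idf table for tokenized docs."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def reverse_lookup(query, entries):
    """Rank idioms (keys of entries) by similarity between query and their descriptions."""
    names = list(entries)
    vecs, idf = tfidf_vectors([entries[name] for name in names])
    # Query terms unseen in any description carry no weight in this model.
    q = {t: c * idf[t] for t, c in Counter(query).items() if t in idf}
    scored = [(cosine(q, v), name) for name, v in zip(names, vecs)]
    return [name for _, name in sorted(scored, key=lambda p: p[0], reverse=True)]


entries = {
    "break the ice": ["start", "conversation", "relieve", "tension"],
    "spill the beans": ["reveal", "secret"],
    "hit the sack": ["go", "to", "bed", "sleep"],
}
print(reverse_lookup(["reveal", "a", "secret"], entries))
```

A query such as `["tell", "hidden", "information"]` shares no keywords with any description, so every score is zero; this illustrates why plain VSM performs well only when the query and the definition share common keywords.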
A tool that enables the use of active learning, as well as the incorporation of word embeddings, was evaluated for its ability to decrease the training data set size required for a named entity recognition model. Uncertainty-based active learning and the use of word embeddings led to very large performance improvements on small data sets for the e...
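Uncertainty-based active learning repeatedly asks a human to label the examples the current model is least sure about. A minimal sketch of the selection step for NER, assuming the model outputs a label distribution for each token and scoring each sentence by its least confident token (one common heuristic, not necessarily the one used in the evaluated tool); the function names are hypothetical.

```python
def sentence_confidence(token_probs):
    """Confidence of a sentence: the lowest top-label probability among its tokens."""
    return min(max(p) for p in token_probs)


def select_for_annotation(pool, k):
    """Return indices of the k unlabeled sentences the model is least confident about.

    pool: one entry per sentence, each a list of per-token label distributions.
    """
    ranked = sorted(range(len(pool)), key=lambda i: sentence_confidence(pool[i]))
    return ranked[:k]


# Three unlabeled sentences with toy two-label distributions per token.
pool = [
    [[0.9, 0.1], [0.8, 0.2]],     # confident throughout
    [[0.55, 0.45], [0.99, 0.01]], # one very uncertain token
    [[0.6, 0.4]],                 # moderately uncertain
]
print(select_for_annotation(pool, 2))  # [1, 2]
```

In a full loop, the selected sentences would be annotated, added to the training set, and the model retrained before the next selection round.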
We explored adaptations required for applying the topic modelling tool Topics2Themes to a language that is very different from the one for which the tool was originally developed. Topics2Themes, which enables text analysis on the output of topic modelling, was developed for English, and here we applied it to Japanese texts. As white space is not used...
Debates play an important educational role, and proper argumentation has the power to change people's stance on a given topic. Existing NLP research on persuasiveness of argumentation calculates it for single arguments or ranks arguments by their level of conviction. Our work extends this research by considering counterarguments. They can weaken or s...
In this paper, we propose a method for automatic implicit knowledge discovery for events. Recent approaches calculate the relevance between two events appearing in the same news article using distributed representations for machine learning. Since such methods handle only explicitly written information, it is very diff...
In this paper, we present research into advisory texts which will eventually be used to create a dialogue system providing motivational support to the user. We studied advisory comments from the online platform Reddit, including those containing motivational advice. Utilizing advice features identified in previous studies, we were able to correctly...
Nowadays, social media has become an essential part of our lives. Pictograms (emoticons/emojis) have been widely used in social media as a medium for visually expressing emotions. In this paper, we propose an emoji-aware attention-based GRU network model for sentiment analysis of Weibo, the most popular Chinese social media platform. First...
In this work we propose an approach towards identifying expressions used figuratively in Japanese literary texts by means of a classification algorithm. Our considerations are inspired mostly by the epoch-making Conceptual Metaphor Theory, which, once presented by Lakoff & Johnson (1980), instantly became a central problem to address not only fr...
The STARS team participated in the Classification subtask of the Question Answering Lab for Political Information (QA Lab-PoliInfo) task at NTCIR-14. This report describes our methods for solving the task and discusses the results. We identify whether the policy and remarks are relevant or not, whether they contain a verifiable fact or not, and pre...
Nowadays, social media have become an essential part of our lives. Internet slang is an informal language used in everyday online communication that is quickly adopted or discarded by new generations. Similarly, pictograms (emoticons/emojis) have been widely used in social media as a means of graphical expression of emotions. People can conv...
We describe a performance evaluation of a method for recognizing utterances in local assembly minutes. The experimental datasets were collected from local assembly minutes of four municipalities for 4 years from April 2011 to March 2013. The four municipalities are Tokyo, Aomori, Osaka and Fukuoka. We manually annotated each sentence as to whether the sen...
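In language understanding, knowledge that is not written in the text is supplemented with background (common-sense) knowledge. For example, the action "taking a bath" includes sub-events such as "taking off one's clothes" and "washing one's body." Because understanding such events is difficult without prior knowledge, in this paper we construct knowledge about events (event knowledge). Regarding event knowledge, Schank et al. [1] proposed the conceptual structure of scripts. A script is a knowledge representation that structures the relations between events and participants and the causal relations between events. It can be used to infer how events unfold in specific scenarios. In addition, because it models the relations among participants, it can be applied to tasks such as coreference resolution and discourse analysis. However, Sch...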
In this paper we discuss the problem of tacit knowledge, which is probably one of the biggest obstacles to human-level language understanding. While the latest massive transformer-based NLP algorithms show the potential to generate natural text, translate and answer questions, these achievements are still insufficient to directly help mach...
We carried out an experiment to examine the emotional effect of images displayed in a small field of view (FOV), telescope-like virtual screen environment with VR-HMDs. We manipulated the FOV of a virtual camera in VR to change the view of the contents. Decreasing the FOV narrows the viewing angle, so the view in VR looks like a telescope, in w...
Internet slang is an informal language used in everyday online communication that is quickly adopted or discarded by new generations. Similarly, pictograms (emoticons/emojis) have been widely used in social media as a means of graphical expression of emotions. People can convey delicate nuances through textual information when supported with...
Cyberbullying, or humiliating people using the Internet, has existed almost since the beginning of Internet communication. The relatively recent introduction of smartphones and tablet computers has caused cyberbullying to evolve into a serious social problem. In Japan, members of a parent-teacher association (PTA) attempted to address the problem b...
Recently, we proposed a highly efficient method for recognizing Japanese sentences containing metaphors by utilizing figurative language examples from a dictionary. Having demonstrated the high efficiency of the proposed method when trained on distinctly metaphorical and non-metaphorical data, we proceeded to test it against text data containing a mix of f...
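The acquisition of common-sense knowledge is addressed as one of the important problems in the field of artificial intelligence. ConceptNet is an ontology project that collects such knowledge and supports multiple languages, but the amount of knowledge differs greatly between English and non-English languages. This study therefore aims to expand the knowledge by adapting the English version to Japanese. In previous work, we automatically evaluated the generality of Japanese common-sense knowledge using a blog corpus and cue words that express the relations of the target knowledge. In this paper, we propose a method for automatically acquiring Japanese common-sense knowledge using an English-Japanese dictionary and our automatic evaluation method for common-sense knowledge. We built an experimental system based on this method, and evaluation experiments and analysis of the acquired knowledge demonstrate the effectiveness of the proposed method.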
In this paper we present a method for extracting IsA assertions (hyponymy relations), AtLocation assertions (informing of the location of an object or place), LocatedNear assertions (informing of neighboring locations), CreatedBy assertions (informing of the creator of an object) and MemberOf assertions (informing of group membership) automatically...
In this paper, we introduce the results of a classification experiment designed to recognize sentences containing metaphors as the first step toward recognizing figurative expressions in Japanese text. For the experiments we used an existing set of figurative expressions and constructed another set intended to consist mostly of literal sentences. The former...
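In text-based communication, emoticons are used to express feelings. The mainstream way of inserting an emoticon has become finding the desired one in an emoticon dictionary. However, emoticon dictionaries contain a very large number of emoticons, about 60,000, which makes selecting one difficult. Against this background, emoticon recommendation systems that support users in selecting emoticons have been studied.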
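In earlier work, we proposed a method whose recommendation criteria included emoticons expressing the same type of emotion as that expressed by the words in the sentence. However, the emoticons inserted into a sentence do not necessarily express the same type of emotion as the words in that sentence. In this paper, as a method for recommending emoticons effectively even in such cases, for the input sentence, previously select...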
Emoticons have been widely used in social media as a language for graphical expression. People can express more delicate nuances through textual information by using emoticons, and the effectiveness of computer-mediated communication has been improved. In this paper, we propose an emoticon polarity-aware recurrent neural network method for sentim...
In this paper, we present studies on human-like motivational strategies which will eventually allow us to implement motivational support in our general dialogue system. We conducted a study on user comments from the discussion platform Reddit and identified text features that make a comment motivating. We achieved an accuracy of around 0.88 in classifying...
The problem of humiliating and slandering people through the Internet, generally defined as cyberbullying (hereafter CB), has recently been recognized as a serious social problem disturbing the mental health of Internet users. In Japan, to deal with the problem, members of the Parent-Teacher Association (PTA) perform Internet Patrol, voluntary work consisting of reading thr...
One of the essential parts of a second language curriculum is teaching vocabulary. Many existing techniques have tried to facilitate word acquisition, but one method that has received less attention is code-switching. In this paper, we present an experimental system for computer assisted vocabulary learning in context using a code-switching...
In this paper, we introduce our preliminary analysis of emoticons used on Weibo, a Chinese microblogging service. By performing a polarity annotation with a new "humorous type" added, we confirmed that 23 emoticons are better considered humorous than positive or negative. We also discussed some possible related problems which might occur during any soci...
In this paper we present our initial attempts at extending the knowledge used for moral decision-making by artificial agents. We briefly present our approach to machine ethics and discuss the possibility of acquiring universal ethical rules to be used by AGI. To test the idea we started extending our system, which worked only with Japanese, to other language...
A sampling survey of typology and component ratio analysis in Japanese puns revealed that the most common type of Japanese pun contains, within the sentence that includes the pun, two sound sequences whose consonants are phonetically close to each other. Based on this finding, we constructed rules to detect pairs of...
This paper summarizes several lexical methods for more comprehensive affect recognition in text using an example of typed utterances. We introduce a set of algorithms that are capable of recognizing emotions of user's statements in order to achieve more effective and smoother human-machine conversation. Aspects often neglected by existing systems w...
In this paper we underline the importance of knowledge in artificial moral agents and describe our experience-focused approach, which could help existing algorithms go beyond the proof-of-concept level and be tested for generality and real-world usability. We point out the difficulties with implementation of current methods and their lack of contextual...
This paper presents a novel method of analyzing morphosemantic patterns in language to detect cyberbullying, or frequently appearing harmful messages and entries that aim to humiliate other users. The morphosemantic patterns represent a novel concept, with the assumption that analyzed elements can be perceived as a combination of morphological...
This paper presents preliminary research aimed at gaining understanding about user preferences concerning electronic assistants. Our end goal is a system that not only plans the work, but also provides individual motivation for the user, which is a new approach. At this stage, we created a simple dialogue system Asystent to gather some basic data r...
This paper presents utterance generation methods for artificial foreign language tutors and discusses some problems of more autonomous educational tools. To tackle the problem of keeping learners interested, we propose a hybrid, half automatic (for semantics), half rule-based (for syntax) approach that utilizes topic expansion by retrieving the conver...
In this paper we introduce our idea of how unrestricted text-based knowledge bases can be used for simulating human evaluators. To illustrate this idea we introduce an example in which independent Internet resources help to automatically evaluate a given act by recognizing the polarity of its possible outcomes described in various natural language corpo...
In this paper we introduce our virtual reality game prototype meant for acquiring world knowledge. We present examples of Games With A Purpose (GWAPs) representing different types of media and propose our own by describing its development and initial tests. After describing problems and possible solutions for the project, we discuss how sophisticat...
Named Entity Translation Equivalents extraction plays a critical role in machine translation (MT) and cross language information retrieval (CLIR). Traditional methods are often based on large-scale parallel or comparable corpora. However, the applicability of these studies is constrained, mainly because of the scarcity of parallel corpora of the re...
Arguably, none of the methods proposed so far for achieving artificial ethical reasoning is realistic, i.e., works outside very limited environments and scenarios. Whichever method one chooses, it will not work in various real world situations because it would be very cost-inefficient to provide ethical knowledge for every possible situation. W...
We present ML-Ask – the first Open Source Affect Analysis system for textual input in Japanese. ML-Ask analyses the contents of an input (e.g., a sentence) and annotates it with information regarding the contained general emotive expressions, specific emotional words, valence-activation dimensions of overall expressed affect, and particular emotion...
In this paper, we present our progress so far on a project aimed at creating a complex, modular humor-equipped conversational system. By complex, we mean that it should be able to: (1) detect users’ emotions, (2) detect users’ humorous behaviors and react to them properly, (3) generate humor according to users’ emotive states and (4) lea...
This paper presents our research in automatic detection of emotionally loaded, or emotive, sentences. We define the problem from a linguistic point of view, assuming that emotive sentences stand out both lexically and grammatically. To verify this assumption we prepared a text classification experiment. In the experiment we apply language combinato...
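A combinatorial treatment of sentence elements can be illustrated by enumerating every order-preserving combination of a sentence's elements as a candidate pattern, marking skipped positions with a wildcard. The sketch below is a hypothetical simplification: the `*` gap marker, the `ordered_patterns` name, and the `max_len` cap are assumptions, and the truncated abstract does not specify the exact procedure.

```python
from itertools import combinations


def ordered_patterns(elements, max_len=None):
    """Enumerate order-preserving combinations of sentence elements.

    Adjacent elements are joined with a space; non-adjacent ones with '*'
    to mark a skipped gap, so ["a", "b", "c"] yields e.g. "a b", "a*c", "a b c".
    """
    n = len(elements)
    max_len = max_len or n
    patterns = []
    for size in range(1, max_len + 1):
        for idx in combinations(range(n), size):
            parts = [elements[idx[0]]]
            for prev, cur in zip(idx, idx[1:]):
                parts.append(" " if cur == prev + 1 else "*")
                parts.append(elements[cur])
            patterns.append("".join(parts))
    return patterns


# A short romanized Japanese sentence split into elements.
print(ordered_patterns(["kyou", "wa", "tanoshii"]))
```

A sentence with n elements yields 2^n - 1 patterns, so in practice the pattern length would be capped (here via `max_len`) and patterns would be weighted by how often they occur in emotive versus non-emotive sentences.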
In recent years, there have been growing needs for computers that comprehend what is meant in humorous texts. However, there are few examples of research that has tried to detect puns from large corpora of spoken language. A sampling survey of typology and component ratio analysis in Japanese puns revealed that the type of Japanese pun that had t...
Background
Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to the limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, essential also for text mining from corpora written in languages other than English an...
We develop a supporting solution for “cyberbullying” prevention based on recent discoveries in Artificial Intelligence and Natural Language Processing. Cyberbullying, defined as using the Internet to humiliate and slander other people, has become a burning problem. In Japan members of Parent-Teacher Association perform manual Web monitoring to stop cybe...
In this paper we propose a method of automatic distinction between two types of formally identical expressions in Japanese: similes and “metonymical comparisons”, i.e. literal comparisons that include metonymic relations between elements. An expression like “kujira no you na chiisai me” can be translated into English as “eyes small as whale’s”, while...
In this paper we introduce a simple text mining method which could be helpful for automatic image understanding process, especially in object recognition. By using colors as an example of vision-related feature category, we describe how word frequencies, dependency parsing and quasi-semantic filtering help to acquire more accurate knowledge which u...
In this paper we introduce our novel method for utilizing web mining and semantic categories for determining automatically if a given act is worth praising or not. We report how existing lexicons used in affective analysis and ethical judgement can be combined for generating useful queries for knowledge retrieval from a 5.5 billion word blog corpus...
In this paper we present a method for extracting IsA assertions (hyponymy relations), AtLocation assertions (informing of the location of an object or place), LocatedNear assertions (informing of neighboring locations), CreatedBy assertions (informing of the creator of an object) and MemberOf assertions (informing of group membership) autom...
In this paper we introduce an algorithm that is capable of recognizing emotions of users’ statements in order to achieve more effective and smoother human-machine conversation. Many studies of emotion recognition have been actively conducted in order to quantify affect, but it is rather difficult to recognize it from more complicated sentence...
This paper presents the Cockney rhyming slang recognition and conversion modules of a cyberbullying detection system. Firstly, we introduce the concept of rhyming slang, analyze its phrasal constructions and discuss the usefulness of features of the rhyming slang, such as resemblance to code-mixing. Secondly, we describe the corpus and phrasal rhymin...
Vocabulary plays an important part in second language learning and there are many existing techniques to facilitate word acquisition. One of these methods is code-switching, or mixing the vocabulary of two languages in one sentence. In this paper the authors propose an experimental system for computer-assisted English vocabulary learning in context...
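The basic transformation behind code-switching-based vocabulary practice is replacing selected words of a native-language sentence with their target-language equivalents. A minimal sketch, assuming pre-tokenized input and a hypothetical word-level glossary; a real system would also need to decide which words to switch and handle inflection and word order, all of which are ignored here.

```python
def code_switch(tokens, glossary, targets):
    """Replace target L1 words in a tokenized sentence with their L2 equivalents.

    tokens: the sentence split into L1 words.
    glossary: hypothetical L1-to-L2 word dictionary.
    targets: the L1 words the learner is currently studying.
    Words outside targets, or missing from the glossary, are left unchanged.
    """
    return [glossary[t] if t in targets and t in glossary else t for t in tokens]


sentence = ["私", "は", "りんご", "を", "食べた"]
glossary = {"りんご": "apple", "食べた": "ate"}
print(code_switch(sentence, glossary, {"りんご"}))  # ['私', 'は', 'apple', 'を', '食べた']
```

Keeping the rest of the sentence in the learner's native language is what lets the switched English word be learned from a context the learner already understands.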