About
317
Publications
101,509
Reads
1,748
Citations
Introduction
I am working on automatic text understanding algorithms that allow machines to analyze average human behavior. My main research topics are Common Sense Knowledge, Affect Processing, and Machine Ethics, but I am also involved in Artificial Humor, Metaphor Understanding and Generation, Cyber-bullying Detection, and many more.
Publications (317)
In this paper we introduce our approach to the ethical issue of machine intelligence, which we developed during our experiments with automatic common sense retrieval and affective computing for open-domain talking systems. As we are preparing to apply our ideas to real-world applications such as housework robots, we have to assure safety of...
In this paper we propose a method for generating simple but semantically correct replies to user inputs which are not related to a given task of a task-oriented information kiosk or any other natural language interface placed in a public place. We describe our method for retrieving meaningful associations from the Web and adding modality based on c...
An intelligent system of the future should make its user feel comfortable, which is impossible without understanding the context they coexist in. However, our past research did not treat language information as a part of the context a robot works in, and data about the reasons why the user had made his decisions was not obtained. Therefore, we decided to u...
Natural Language Processing (NLP) research on AI Safety and social bias in AI has focused on safety for humans and social bias against human minorities. However, some AI ethicists have argued that the moral significance of nonhuman animals has been ignored in AI research. Therefore, the purpose of this study is to investigate whether there is speci...
The rapid advancement of artificial intelligence (AI) and natural language processing (NLP) has profoundly impacted our understanding of emotions, decision-making, and opinions, particularly within the context of the Internet and social media [...]
Wide adoption of social media has caused an explosion of information stored online, with the majority of that information containing subjective, opinionated, and emotional content produced daily by users. The field of emotion analysis has helped effectively process such human emotional expressions expressed in daily social media posts. Unfortunatel...
Propaganda in the digital era is often associated with online news. In this study, we focused on the use of large language models and their detection of propaganda techniques in the electronic press to investigate whether they are a noteworthy replacement for human annotators. We prepared prompts for generative pre-trained transformer models to find s...
For both humans and machines to acquire vocabulary, it is effective to learn words from context while using dictionaries as an auxiliary tool. It has been shown in previous linguistic studies that for humans, glossing either target words to be learned or words comprising context is an effective approach. For machines, however, previous NLP studies...
Moral AI has been studied in the fields of philosophy and artificial intelligence. Although most existing studies are only theoretical, recent developments in AI have made it increasingly necessary to implement AI with morality. On the other hand, humans are under the moral uncertainty of not knowing what is morally right. In this paper, we impleme...
BACKGROUND
Neurodegenerative and mental disorders significantly affect the manner of speaking, syntax, semantics and specific habits of word choice. Linguistic analysis can detect these disorders.
OBJECTIVE
The aim of this study was to examine whether speech analysis can be useful as a screening test in neurology and psychiatry, due to the limited...
The popularity of social media services has led to an increase of personality-relevant data in online spaces. While the majority of people who use these services tend to express their personality through measures offered by the Myers–Briggs Type Indicator (MBTI), another personality model known as the Big Five has been a dominant paradigm in academ...
The goal of this article is to present and compare recent approaches which use speech and voice analysis as biomarkers for screening tests and monitoring of some diseases. The article takes into account metabolic, respiratory, cardiovascular, endocrine, and nervous system disorders. A selection of articles was performed to identify studies that ass...
There are many types of approaches for Paraphrase Identification (PI), an NLP task of determining whether a sentence pair has equivalent semantics. Traditional approaches mainly consist of unsupervised learning and feature engineering, which are computationally inexpensive. However, their task performance is moderate nowadays. To seek a method that...
In this paper we describe an extension of a question answering dataset to be used for testing a security export control expert system. Unlike the long questions and answers with rich context, which mostly describe exceptional interpretations of regulatory texts, our addition contains short questions and answers which were manually created by an e...
Warning: This paper contains examples of offensive language, including insulting or objectifying expressions.
Various existing studies have analyzed what social biases are inherited by NLP models. These biases may directly or indirectly harm people, therefore previous studies have focused only on human attributes. However, until recently no researc...
Various existing studies have analyzed what social biases are inherited by NLP models. These biases may directly or indirectly harm people, therefore previous studies have focused only on human attributes. If the social biases in NLP models can be indirectly harmful to humans involved, then the models can also indirectly harm nonhuman animals. Howe...
One of the burning problems lately in Japan has been cyber-bullying, or slandering and bullying people online. The problem has been especially noticed on unofficial Web sites of Japanese schools. Volunteers consisting of school personnel and PTA (Parent-Teacher Association) members have started Online Patrol to spot malicious contents within Web fo...
Recent years have brought an unprecedented and rapid development in the field of Natural Language Processing. To a large degree this is due to the emergence of modern language models like GPT-3 (Generative Pre-trained Transformer 3), XLNet, and BERT (Bidirectional Encoder Representations from Transformers), which are pre-trained on a large amount o...
Speaker identification presents a challenging task in the field of natural language processing and is believed to be an important step in building believable human-like systems. Although most of the existing work focuses on utilizing acoustic features for the task, in this paper we propose a text-dependent transformer-based machine learning approac...
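In this paper, we tackled the task of reranking candidate abstracts of cited papers given the headlines of medical news articles. We proposed an improved variant of the Poly-encoder, Poly-Encoder(First-all), and a method that exploits the characteristics of paper abstracts, Paper-TA-Encoder. As a result, the MRR score improved by 0.08 points over the existing poly-encoder method, which showed that information loss occurs when embeddings are compressed. As for Paper-TA-Encoder, a model built on the observation that a paper's title reflects the paper's claims, relative to the MRR score of the existing cross-encoder method, 0.02 poi...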
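In recent years, people increasingly gather medical information from online medical articles. In doing so, it is difficult to judge the reliability of the article being read. One way to evaluate reliability objectively is to check whether there is a paper that supports the article's claims. In this study, we therefore tackled the task of extracting candidate related papers for an article headline. For extraction we used BM25T [1], an improved version of the token-based method BM25 that has a separate parameter for each word. As a result, when 500 candidate paper abstracts were extracted, recall improved compared with the existing BM25 method.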
White paper describing various visions of applications of NLP and voice technologies in entertainment in 3 years from now.
In this study, we focus on ethical education as a means to improve an artificial companion's conceptualization of the moral decision-making process in human users. In particular, we focus on automatically determining whether changes in ethical education influenced core moral values in humans throughout the century. We analyze ethics as taught in Japan bef...
Emoticons are popularly used to express user’s feelings in social media, blogs, and instant messaging. However, the number of emoticons existing in emoticon dictionaries which users select from is large, thus, it is difficult for users to find the desired emoticon that matches the content of their messages. In this paper, we propose a method that s...
When people debate, they want to familiarize themselves with a whole range of arguments about a given topic in order to deepen their knowledge and inspire new claims. However, the number of differently phrased arguments is enormous, which makes processing them time-consuming. In spite of many works on using arguments (e.g. counter-argume...
In this paper, we report our initial findings from creating a question answering module for a dialog-based expert system which aims at advising users on export control regulations. We describe problems of data scarcity and knowledge transfer showing results of preliminary trials with utilizing contextual embeddings to extend keyword-based matching...
It is known that word embeddings exhibit biases inherited from the corpus, and those biases reflect social stereotypes. Recently, many studies have been conducted to analyze and mitigate biases in word embeddings. Unsupervised Bias Enumeration (UBE) (Swinger et al., 2019) is one approach to analyzing biases for English, and Hard Debias (Bolukbasi...
In this paper we introduce the HEMOS (Humor-EMOji-Slang-based) system for fine-grained sentiment classification for the Chinese language using a deep learning approach. We investigate the importance of recognizing the influence of humor, pictograms and slang on the task of affective processing of social media. In the first step, we collected 576 freq...
There are many discussions held during political meetings, and a large number of utterances for various topics is included in their transcripts. We need to read all of them if we want to follow speakers' intentions or opinions about a given topic. To avoid such a costly and time-consuming process to grasp often longish discussions, NLP researchers...
In this work we propose an approach towards identifying expressions used figuratively in Japanese literary texts by means of a classification algorithm. Our considerations are inspired mostly by the epoch-making Conceptual Metaphor Theory which, once presented by Lakoff and Johnson [13], instantly became a central problem to address not only fr...
The Topics2Themes tool, which enables text analysis on the output of topic modelling, was originally developed for the English language. In this study, we explored and evaluated adaptations required for applying the tool to Japanese texts. That is, we adapted Topics2Themes to a language that is very different from the one for which the tool was ori...
In this paper we introduce a computational model for recognizing figurative expressions in the Japanese language. As a part of the training data we use a set of almost 26,000 Japanese sentences comprising both similes and metaphors. These were collected manually from literary texts and hence constitute trustworthy and probably the largest existing reso...
Lack of background knowledge about the everyday world is an obstacle to simulating usual situations and their changes. In this paper we present a simple idea for extending common sense knowledge bases for the Japanese language by using a language model. We investigate several semantic categories for which specific knowledge is collected with m...
In this paper, we introduce our trials to extend our previous work on lifelong learning cognitive architecture Bacterium Lingualis, which collects world knowledge from textual resources using linguistic clues. We utilize mask prediction functionality of the BERT language model to augment simple concepts with additional knowledge like means, goals o...
There is little research into designing artificial motivational agents. The end-goal of our studies is therefore to create a dialogue system that would motivate users to do their everyday tasks using natural language. In this paper, we present a method of distinguishing texts containing motivational advice from regular texts to sort out noise in tr...
This paper is an attempt at analyzing how much religious vocabulary (in this case Buddhist vocabulary taken from a large scale dictionary of Buddhist terms available online) is present in everyday Japanese social space (in this case in a repository of blog entries from the Ameba blog service) and thus in the consciousness of people. We also investi...
Figuring out how and what humans think is a task many have tried to accomplish. In the past, instant thoughts were almost unmeasurable because there were no materials that could assist the study of thoughts. In the digital, big data era, where social media have become more common, ideas can easily be shared. Still, instant thought is hard to pinpoi...
This paper demonstrates a reverse dictionary that can return relevant idiomatic expressions based on queries (input descriptions) given by users. The implementation of the system can be achieved with a Vector Space Model (VSM). However, when it comes to VSM, although its performance on queries is high under the condition that pairs of common keywor...
A tool that enables the use of active learning, as well as the incorporation of word embeddings, was evaluated for its ability to decrease the training data set size required for a named entity recognition model. Uncertainty-based active learning and the use of word embeddings led to very large performance improvements on small data sets for the e...
We explored adaptations required for applying the topic modelling tool Topics2Themes to a language that is very different from the one for which the tool was originally developed. Topics2Themes, which enables text analysis on the output of topic modelling, was developed for English, and here we applied it to Japanese texts. As white space is not used...
In this paper we analyzed how much religious vocabulary, in particular Buddhist vocabulary taken from the largest online dictionary of Buddhist terms, is present in everyday social space of Japanese people, particularly, in Japanese blog entries appearing on a popular blog service (Ameba blogs). We interpreted the level of everyday usage of Buddhis...
Debates play an important educational role, and proper argumentation has the power to change people's stance on a given topic. Existing NLP research on persuasiveness of argumentation calculates it for single arguments or ranks arguments by their level of conviction. Our work extends this research by considering counterarguments. They can weaken or s...
In this paper, we propose a method for automatic implicit knowledge discovery for events. In recent approaches researchers calculate the relevance between two events appearing in the same document from news articles using distributed representations for machine learning. Since such methods handle only explicitly written information, it is very diff...
In this paper, we present research into advisory texts which eventually will be used to create a dialogue system providing motivational support to the user. We studied advisory comments from the online platform Reddit, including those containing motivational advice. Utilizing advice features identified in previous studies, we were able to correctly...
Nowadays, social media has become an essential part of our lives. Pictograms (emoticons/emojis) have been widely used in social media as a medium for visually expressing emotions. In this paper, we propose an emoji-aware attention-based GRU network model for sentiment analysis of Weibo, which is the most popular Chinese social media platform. First...
In this paper we describe our preliminary system design aiming at creating an artificial agent to support security export control specialists and researchers who are not sure how sensitive their work might be. We propose a dialog system which combines interaction-based search with legal text understanding techniques in order to establish the relatio...
In this work we propose an approach towards identifying expressions used figuratively in Japanese literary texts by means of a classification algorithm. Our considerations are inspired mostly by the epoch-making Conceptual Metaphor Theory which, once presented by Lakoff & Johnson (1980), instantly became a central problem to address not only fr...
The STARS team participated in the Classification task of Question Answering Lab for Political Information (QA Lab-PoliInfo) subtask of the NTCIR-14. This report describes our methods for solving the task and discusses the results. We identify whether the policy and remarks are relevant or not, whether they contain a verifiable fact or not, and pre...
Nowadays, social media have become an essential part of our lives. Internet slang is an informal language used in everyday online communication which quickly becomes adopted or discarded by new generations. Similarly, pictograms (emoticons/emojis) have been widely used in social media as a means of graphical expression of emotions. People can conv...
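In language understanding, knowledge that is not written in the text is supplemented using background (common sense) knowledge. For example, the action "taking a bath" includes sub-events such as "taking off clothes" and "washing the body". Because understanding such events is difficult without prior knowledge, in this paper we construct knowledge about events (event knowledge). Regarding event knowledge, Schank et al. [1] proposed the conceptual structure of scripts. A script is a knowledge representation that handles the relations between events and participants, and the causal relations between events, in a structured way. It can be used to infer how events unfold within a particular scenario. Moreover, because it models the relations among participants, it can be applied to tasks such as coreference resolution and discourse analysis. However, Sch...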
In our paper we discuss the problem of tacit knowledge which probably is one of the biggest obstacles on the way to human-level language understanding. While the latest massive transformer-based NLP algorithms show the potential to generate natural text, translate and answer questions, these achievements are still insufficient to directly help mach...
Internet slang is an informal language used in everyday online communication which quickly becomes adopted or discarded by new generations. Similarly, pictograms (emoticons/emojis) have been widely used in social media as a means of graphical expression of emotions. People can convey delicate nuances through textual information when supported with...
Cyberbullying, or humiliating people using the Internet, has existed almost since the beginning of Internet communication. The relatively recent introduction of smartphones and tablet computers has caused cyberbullying to evolve into a serious social problem. In Japan, members of a parent-teacher association (PTA) attempted to address the problem b...
Recently we proposed a highly efficient method for recognizing Japanese sentences containing metaphors by utilizing figurative language examples from a dictionary. Having proven the high efficiency of the proposed method when trained on distinctly metaphorical and non-metaphorical data, we proceeded to test it against text data containing a mix of f...
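Acquiring common sense knowledge is being tackled as one of the important problems in the field of artificial intelligence. ConceptNet is an ontology research project that collects such knowledge, and although it supports multiple languages, the gap in the amount of knowledge between English and non-English languages is large. Therefore, the aim of this study is to expand the knowledge by mapping the English knowledge to Japanese. In our earlier work, we automatically evaluated the generality of Japanese common sense knowledge using a blog corpus and clue words that express the relation of the target knowledge. In this paper, we propose a method for automatically acquiring Japanese common sense knowledge using an English-Japanese dictionary and the automatic evaluation method for common sense knowledge. We built an experimental system based on this method and show the effectiveness of the proposed method through evaluation experiments and discussion of the acquired knowledge.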
In this paper we present a method for extracting IsA assertions (hyponymy relations), AtLocation assertions (informing of the location of an object or place), LocatedNear assertions (informing of neighboring locations), CreatedBy assertions (informing of the creator of an object) and MemberOf assertions (informing of group membership) automatically...
In this paper, we introduce results of a classification experiment designed to recognize sentences containing metaphors as the first step of recognizing figurative expressions in Japanese text. For the experiments we have utilized an existing set of figurative expressions and constructed one which should consist of mostly literal sentences. The former...
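In text-centered communication, emoticons are used to express feelings. The mainstream way of inserting an emoticon has become for the user to find the desired emoticon in an emoticon dictionary. However, emoticon dictionaries contain a large number of emoticons, about 60,000 kinds, which makes it difficult to select one easily. Against this background, research has been conducted on emoticon recommendation systems that support users' emoticon selection.
In a previous publication, we proposed a method whose recommendation criteria include emoticons expressing the same type of emotion as the words in the sentence. However, the emoticon inserted into a sentence does not necessarily express the same type of emotion as the words in the sentence. In this paper, as a way to recommend emoticons effectively even in such cases, for an input sentence, previously selected...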
Emoticons have been widely used in social media as a language for graphical expression. People can express more delicate nuances through textual information by using emoticons, and the effectiveness of computer-mediated communication has been improved. In this paper, we propose an emoticon polarity-aware recurrent neural network method for sentim...
In this paper, we present studies on human-like motivational strategies which eventually will allow us to implement motivational support in our general dialogue system. We conducted a study on user comments from a discussion platform Reddit and identified text features that make a comment motivating. We achieved around 0.88 accuracy on classifying...
This book constitutes the proceedings of the 11th International Conference on Artificial General Intelligence, AGI 2018, held in Prague, Czech Republic, in August 2018.
The 19 regular papers and 10 poster papers presented in this book were carefully reviewed and selected from 52 submissions. The conference encourages interdisciplinary research base...
In this paper we explore the use of a conversational interface to query a decision support system providing information relating to a city surveillance setting. Specifically, we focus on how the use of a Controlled Natural Language (CNL) can provide a method for processing natural language queries whilst also tracking the context of the conversatio...
The problem of humiliating and slandering people through the Internet, generally defined as cyberbullying (later: CB), has recently been recognized as a serious social problem disturbing the mental health of Internet users. In Japan, to deal with the problem, members of the Parent-Teacher Association (PTA) perform Internet Patrol – a voluntary work by reading thr...
One of the essential parts of a second language curriculum is teaching vocabulary. Many existing techniques have tried to facilitate word acquisition, but one method which has received less attention is code-switching. In this paper, we present an experimental system for computer assisted vocabulary learning in context using a code-switching...
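The FBI Director's remark that "American universities are a paradise for spies" stirred up the media in February of this year, as those concerned will well remember, but the management of sensitive technologies has become an urgent issue not only in the United States but also at Japanese universities. In Japan, the Ministry of Economy, Trade and Industry, as the competent authority, promotes security export control, including the management of sensitive technologies. In security export control, exports of goods pass through the checkpoint of customs and can therefore be stopped at the border. In the case of technology transfer, however, if a faculty member emails technical information to an overseas researcher as an attachment without going through the proper procedures, sensitive technology may in some cases spread to entities of concern and become a security threat. The ministry is working to raise awareness by maintaining lists such as the combined matrix, but users also need to understand how to use the combined matrix in the first place, and as yet sufficiently...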
In this paper, we introduce our preliminary analysis of emoticons used on Weibo, a Chinese microblog. By performing a polarity annotation with a new "humorous type" added, we have confirmed that 23 emoticons can be considered more as humorous than positive or negative. We also discussed some possible related problems which might occur during any soci...
In this paper we present our initial trials with extending knowledge for moral decision capability for artificial agents. We briefly present our approach to machine ethics and discuss possibility of acquiring universal ethical rules to be used by AGI. To test the idea we started extending our system which worked only with Japanese to other language...
This paper summarizes several lexical methods for more comprehensive affect recognition in text using an example of typed utterances. We introduce a set of algorithms that are capable of recognizing emotions of user's statements in order to achieve more effective and smoother human-machine conversation. Aspects often neglected by existing systems w...
In this paper we underline the importance of knowledge in artificial moral agents and describe our experience-focused approach which could help existing algorithms go beyond proofs of concept level and be tested for generality and real-world usability. We point out the difficulties with implementation of current methods and their lack of contextual...
“How to use AI in Your Research - Update on Big Data and AI Trends”, Invited Talk at Hokkaido University Transferable Seminar, 27 February, 2018.
This paper presents a novel method of analyzing morphosemantic patterns in language to detect cyberbullying, or frequently appearing harmful messages and entries that aim to humiliate other users. The morphosemantic patterns represent a novel concept, with the assumption that analyzed elements can be perceived as a combination of morphological...
Invited Talk at The Eighth International Workshop on Signal Design and Its Applications in Communications
This paper presents preliminary research aimed at gaining understanding about user preferences concerning electronic assistants. Our end goal is a system that not only plans the work, but also provides individual motivation for the user, which is a new approach. At this stage, we created a simple dialogue system Asystent to gather some basic data r...
This paper presents utterance generation methods for artificial foreign language tutors and discusses some problems of more autonomous educational tools. To tackle the problem of keeping learners interested, we propose a hybrid, half automatic (for semantics), half rule-based (for syntax) approach that utilizes topic expansion by retrieving the conver...
In this paper we introduce our idea how non-restricted text-based knowledge bases can be used for simulating human evaluators. To illustrate this idea we introduce an example in which independent Internet resources help to automatically evaluate a given act by recognizing polarity of its possible outcomes described in various natural language corpo...
In this paper we introduce our virtual reality game prototype meant for acquiring world knowledge. We present examples of Games With A Purpose (GWAPs) representing different types of media and propose our own by describing its development and initial tests. After describing problems and possible solutions for the project, we discuss how sophisticat...
It can be said that none of the methods proposed so far for achieving artificial ethical reasoning is realistic, i.e. working outside very limited environments and scenarios. Whichever method one chooses, it will not work in various real world situations because it would be very cost-inefficient to provide ethical knowledge for every possible situation. W...