Ryokan Ri
The University of Tokyo | Todai · Department of Electrical Engineering and Information Systems

About

20 Publications
810 Reads
94 Citations

Citations since 2017: 20 research items, 94 citations
[Chart: citations per year, 2017–2023]

Publications

Article
Full-text available
We conducted a study to determine what kind of structural knowledge learned in neural network encoders is transferable to the processing of natural language. We designed artificial languages with structural properties that mimic those of natural language, pretrained encoders on the data, and examined the encoders' effects on downstream tasks in natural language...
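The abstract does not show how such artificial languages are constructed; as a minimal, hypothetical sketch of the general idea, one could generate a toy corpus with Dyck-style nested dependencies, mimicking one structural property (hierarchy) of natural language. Function and parameter names here are illustrative, not the paper's:

```python
import random

def sample_nested_sequence(vocab_size=100, max_len=20):
    """Sample a toy sentence with Dyck-style nested dependencies: each
    "head" token i is later closed by its paired "dependent" token
    i + vocab_size, mimicking the hierarchy of natural language."""
    seq, stack = [], []
    while len(seq) + len(stack) < max_len:
        if stack and random.random() < 0.4:
            seq.append(stack.pop() + vocab_size)  # close the most recent open dependency
        else:
            tok = random.randrange(vocab_size)
            seq.append(tok)                       # open a new dependency
            stack.append(tok)
    while stack:                                  # close everything still open
        seq.append(stack.pop() + vocab_size)
    return seq

corpus = [sample_nested_sequence() for _ in range(10000)]
```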
Preprint
Full-text available
To develop computational agents that better communicate using their own emergent language, we endow the agents with the ability to focus their attention on particular concepts in the environment. Humans often understand an object or scene as a composite of concepts, and those concepts are further mapped onto words. We implement this intuition as cross...
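The abstract is cut off before describing the mechanism; assuming a standard scaled dot-product attention over per-concept feature vectors (not necessarily the paper's exact architecture), a minimal sketch would be:

```python
import numpy as np

def attend_to_concepts(query, concepts):
    """Scaled dot-product attention: a speaker state (query) attends over
    per-concept feature vectors and returns their weighted mixture.
    Shapes: query (d,), concepts (n_concepts, d)."""
    scores = concepts @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())       # numerically stable softmax
    weights /= weights.sum()
    return weights @ concepts, weights

rng = np.random.default_rng(0)
context, attn = attend_to_concepts(rng.normal(size=16), rng.normal(size=(5, 16)))
```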
Preprint
We present EASE, a novel method for learning sentence embeddings via contrastive learning between sentences and their related entities. The advantage of using entity supervision is twofold: (1) entities have been shown to be a strong indicator of text semantics and thus should provide rich training signals for sentence embeddings; (2) entities are...
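The exact EASE objective is not given in this snippet; a common way to instantiate contrastive learning between sentences and their related entities is an InfoNCE-style loss with in-batch negatives, sketched below (all names are illustrative, and the actual training details may differ):

```python
import torch
import torch.nn.functional as F

def entity_contrastive_loss(sent_emb, ent_emb, temperature=0.05):
    """InfoNCE-style loss: each sentence embedding should be closest to the
    embedding of its own related entity, with the other in-batch entities
    serving as negatives. Shapes: (batch, dim) for both inputs."""
    sent = F.normalize(sent_emb, dim=-1)
    ent = F.normalize(ent_emb, dim=-1)
    logits = sent @ ent.T / temperature   # cosine similarity matrix
    labels = torch.arange(len(sent))      # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```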
Preprint
Full-text available
We investigate what kind of structural knowledge learned in neural network encoders is transferable to processing natural language. We design synthetic languages with structural properties that mimic natural language, pretrain encoders on the data, and see how well the encoder performs on downstream tasks in natural language. Our experiments...
Preprint
Full-text available
Recent studies have shown that multilingual pretrained language models can be effectively improved with cross-lingual alignment information from Wikipedia entities. However, existing methods only exploit entity information in pretraining and do not explicitly use entities in downstream tasks. In this study, we explore the effectiveness of leveraging...
Preprint
Full-text available
Placeholder translation systems enable users to specify how a specific phrase is translated in the output sentence. The system is trained to output special placeholder tokens, and the user-specified term is injected into the output through context-free replacement of the placeholder token. However, this approach could result in ungrammatical...
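A minimal sketch of the context-free replacement pipeline the abstract describes, with a stand-in translation function and a made-up term. The ungrammatical outputs the abstract alludes to arise precisely because this substitution ignores context such as inflection or case marking:

```python
PLACEHOLDER = "<ph_0>"  # special token the model is trained to copy through

def translate_with_placeholder(src, term_src, term_tgt, translate):
    """Context-free placeholder replacement: mask the source term, translate,
    then splice the user-specified target term into the output. `translate`
    is any MT function that preserves the placeholder token."""
    masked = src.replace(term_src, PLACEHOLDER)
    out = translate(masked)
    return out.replace(PLACEHOLDER, term_tgt)

# Toy usage with an identity "translator"; "Xanuvia" is a fictitious term.
print(translate_with_placeholder(
    "The drug Xanuvia reduced symptoms.", "Xanuvia", "ザヌビア", lambda s: s))
```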
Preprint
Full-text available
For Japanese-to-English translation, zero pronouns in Japanese pose a challenge, since the model needs to infer and produce the corresponding pronoun on the target side of the English sentence. However, although fully resolving zero pronouns often requires discourse context, in some cases the local context within a sentence gives clues to the inference...
Article
Full-text available
Most machine translation (MT) research has focused on sentences as translation units (sentence-level MT) and has achieved acceptable translation quality, mainly for high-resourced languages, on sentences that do not require cross-sentential context. Recently, many researchers have worked on MT models that can consider cross-sentential context. T...
Preprint
Full-text available
Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resourced languages, but document-level (DL) MT has not: it is difficult to 1) train, because only small amounts of DL data are available, and 2) evaluate, because the main methods and data sets focus on SL evaluation. To address the first issue, we present a document-aligned Japanese-English...
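The dataset itself cannot be reproduced here; as one common way to exploit document-aligned data (not necessarily the paper's approach), preceding source sentences can be concatenated as context for each translation unit:

```python
def make_context_inputs(doc_src, doc_tgt, n_context=2, sep=" <sep> "):
    """Build (source, target) training pairs where each source is prefixed
    with up to n_context preceding sentences from the same document, a common
    way to use document-aligned data for context-aware MT."""
    pairs = []
    for i, (src, tgt) in enumerate(zip(doc_src, doc_tgt)):
        context = doc_src[max(0, i - n_context):i]
        pairs.append((sep.join(context + [src]), tgt))
    return pairs
```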
Preprint
Full-text available
While machine translation of written text has come a long way in the past several years, thanks to the increasing availability of parallel corpora and corpus-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost machine translation quality...
Preprint
Unsupervised bilingual word embedding (BWE) methods learn a linear transformation matrix that maps two monolingual embedding spaces trained separately on monolingual corpora. This approach assumes that the two embedding spaces are structurally similar, which does not necessarily hold in general. In this paper, we propose using a pseud...
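The linear-mapping step the abstract refers to has a closed form: given a set of aligned word-vector pairs (which unsupervised methods bootstrap rather than take as input), the best orthogonal map is obtained via SVD (orthogonal Procrustes). A minimal sketch of that standard baseline, not of the paper's proposed extension:

```python
import numpy as np

def procrustes_map(X, Y):
    """Learn an orthogonal matrix W minimizing ||XW - Y||_F, the standard
    linear mapping between two embedding spaces (orthogonal Procrustes).
    X, Y: (n_pairs, dim) arrays of aligned source/target word vectors."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check: recover a hidden orthogonal map from synthetic pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
W_true, _ = np.linalg.qr(rng.normal(size=(50, 50)))
W = procrustes_map(X, X @ W_true)
assert np.allclose(W, W_true, atol=1e-6)
```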
Preprint
Full-text available
Existing approaches to mapping-based cross-lingual word embeddings are based on the assumption that the source and target embedding spaces are structurally similar. The structures of embedding spaces largely depend on the co-occurrence statistics of each word, which are in turn determined by the choice of context window. Despite this obvious connection between the...
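To make the window/co-occurrence connection concrete, here is a minimal counter showing how the window parameter determines which word pairs are counted, and hence the statistics the embedding space is built from (names are illustrative):

```python
from collections import Counter

def cooccurrence_counts(tokens, window=5):
    """Count word co-occurrences within a symmetric context window.
    Different window sizes yield different statistics (small windows tend to
    be more syntactic, large ones more topical), which in turn shape the
    structure of the embedding space learned from these counts."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[(w, tokens[j])] += 1
    return counts

print(cooccurrence_counts("the cat sat on the mat".split(), window=2))
```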
