VocabEncounter: NMT-powered Vocabulary Learning by
Presenting Computer-Generated Usages of Foreign Words into
Users’ Daily Lives
Riku Arakawa∗
Carnegie Mellon University, Pittsburgh, USA
rarakawa@andrew.cmu.edu

Hiromu Yakura∗
University of Tsukuba / National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
hiromu.yakura@aist.go.jp

Sosuke Kobayashi
Tohoku University, Sendai, Japan / Preferred Networks, Tokyo, Japan
in2400@gmail.com
Figure 1: VocabEncounter enables users to encounter foreign words by presenting their usages generated using NLP techniques into materials the user is reading. Using this approach, the user can transform their daily life into the field of vocabulary learning. (Depicted activities: working, commuting, strolling, watching movies.)
ABSTRACT
We demonstrate that recent natural language processing (NLP) techniques introduce a new paradigm of vocabulary learning that benefits from both micro and usage-based learning by generating and presenting the usages of foreign words based on the learner's context. Then, without allocating dedicated time for studying, the user can become familiarized with how the words are used by seeing the example usages during daily activities, such as Web browsing. To achieve this, we introduce VocabEncounter, a vocabulary-learning system that suitably encapsulates the given words into materials the user is reading in near real time by leveraging recent NLP techniques. After confirming the system's human-comparable quality of generating translated phrases in a study involving crowdworkers, we conducted a series of user studies, which demonstrated its effectiveness for learning vocabulary and the favorable experiences it provides. Our work shows how NLP-based generation techniques can transform our daily activities into a field for vocabulary learning.
∗These authors contributed equally and are ordered alphabetically.
This work is licensed under a Creative Commons Attribution 4.0 International License.
CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9157-3/22/04.
https://doi.org/10.1145/3491102.3501839
CCS CONCEPTS
• Human-centered computing → Natural language interfaces; • Computing methodologies → Machine translation; • Applied computing → Interactive learning environments.
KEYWORDS
natural language processing, neural machine translation, vocabulary learning
ACM Reference Format:
Riku Arakawa, Hiromu Yakura, and Sosuke Kobayashi. 2022. VocabEncounter: NMT-powered Vocabulary Learning by Presenting Computer-Generated Usages of Foreign Words into Users' Daily Lives. In CHI Conference on Human Factors in Computing Systems (CHI '22), April 29-May 5, 2022, New Orleans, LA, USA. ACM, New York, NY, USA, 21 pages. https://doi.org/10.1145/3491102.3501839
1 INTRODUCTION
Acquiring vocabulary is of paramount importance in learning a foreign language as it is fundamental knowledge to use and understand expressions of the language [1, 4]. Given that, a number of educational systems have been introduced for vocabulary learning, reflecting the expansion of Web and mobile technology, such as online vocabulary games [88] and mobile apps for flashcards [32, 59]. Also, HCI researchers have leveraged various interaction techniques for vocabulary learning, from app notifications [23] to multimedia retrieval [90] or mixed reality [85].
There are two main directions for supporting vocabulary learning: micro learning [30, 40] and usage-based learning [49, 90]. The strategy of micro learning was introduced to solve the dilemma between the necessity of repeated word exposure to enhance memory and the inconvenience of allocating dedicated time for vocabulary learning; for example, Dingler et al. [23] implemented a mobile app enabling instant learning. Some recent techniques present the words in a manner that is grounded in a user's context so that they can be learned effectively in small time segments; for example, Edge et al. [26] developed a smartphone app that presents short vocabulary quizzes of relevance to the user's location. On the other hand, whereas the simplest way of usage-based learning would be referring to example sentences while learning with a vocabulary book, researchers have investigated how computers can augment this learning strategy by leveraging big data on the Web [49, 90]. Specifically, existing online videos [90] or news articles [49] are used as a source of practical usages of the words to learn.

Considering that the effectiveness of these techniques for achieving micro or usage-based learning has been confirmed, achieving both simultaneously would be a promising way to further improve the experience of vocabulary learning using computers. However, how to combine both strategies effectively is not trivial; simply presenting the usages of the words to learn within a user's daily life would not be optimal because the usages available in existing resources (e.g., videos or news articles) do not always match the user's context. Thus, it is hard to offer the user contextual clues to learn the words so that the user can fully benefit from the strategy of micro learning.
Here, we suppose that the recent advance of natural language
processing (NLP) techniques opens up a way to address this point;
we can generate such usages by taking a user’s context into account
instead of retrieving them from existing resources. Then, the user
can be exposed to contextualized usages of the words they want
to remember in their daily lives without spending dedicated time
studying. An example scenario is as follows:
One night, Satoru, a Japanese student studying English words for a university exam, studied "bogus" in a vocabulary book. The next day, he surfed the Web and began reading some news articles in Japanese as usual. While reading, he encountered the following sentence that was partially translated into English: "ロックダウンの延長に伴う政府の給付金ですが¹, its applicants were mostly occupied by bogus companies. これを受けて政府は...²" Here, he tried to recall the meaning of "bogus," given the context of the news article, and he eventually relearned its meaning with the presented example.
In this scenario, Satoru could easily memorize the meaning of a new English word he wanted to learn by encountering its usage while casually reading Japanese content on the Web. In a nutshell, our key idea is to encapsulate foreign words in materials that a user reads in their native language to expose them to contextualized usages of the words. While this assumes that the user has at least a minimal command of the foreign language so that they can understand the presented usage, it allows the user to learn the encapsulated words in their daily lives.

¹ Translation: "ロックダウンの延長に伴う政府の給付金ですが" → "With regard to the government benefits due to the lockdown extension ..."
² Translation: "これを受けて政府は..." → "In response to this, the government ..."
For this purpose, we propose VocabEncounter, which leverages recent NLP techniques to generate such usages of foreign words on the spot. Specifically, our system first identifies phrases in the original materials that could be used to encapsulate one of the given foreign words that the user wants to remember. Here, multilingual word embedding [47] and dependency structure analysis [89] are utilized in combination. By doing so, we can extract phrases likely to preserve their original meaning and maintain syntactic naturalness when used for encapsulation. Secondly, the targeted phrases are translated into the foreign language so that the translated phrases include one of the given foreign words. This translation process is enabled by introducing a constrained decoding algorithm [38] into a Transformer-based neural machine translation (NMT) model [84]. We also introduce a scoring algorithm based on Sentence-BERT [70] to ensure that the translated phrases preserve the original meaning and maintain syntactic naturalness. Lastly, as a proof of concept, we develop an interface incorporating this encapsulation approach for vocabulary learning in everyday life in the form of a Chrome extension. The extension encapsulates the words to learn into Web content the user reads in real time by presenting the translated phrases along with a hoverbox for looking up word meanings if the user does not recall them.
Towards deploying VocabEncounter for vocabulary learning, there are several challenges to overcome. For example, the translation mechanism may present unnatural or mistranslated phrases. Also, even if we can generate natural usages, experimental verification is required to show that encountering such usages helps users memorize foreign words. Furthermore, since VocabEncounter is intended to enable users to learn vocabulary during their daily lives without dedicating time, it should be tested whether the experience of learning with VocabEncounter is favored.
We conducted a series of experiments to demonstrate that our system sufficiently resolves these concerns. First, we examined the quality of the phrases generated by VocabEncounter using a crowdsourcing service. The result confirmed not only the sufficiency of the targeting mechanism but also the feasible quality of the translation mechanism. In particular, the translated phrases produced by the proposed approach received an evaluation comparable to those translated by a bilingual speaker under the same constraint of encapsulating specified words. Next, we conducted a user study where we asked participants to spend a day using VocabEncounter, which is implemented as a Chrome extension, on their PCs. The result suggested that not only the experience of being presented with foreign words to learn during Web browsing but also the design of presenting them with their generated usages helped the participants memorize the words. Lastly, we conducted a one-week user study where we provided VocabEncounter to learners and let them use it freely. Through semi-structured interviews, we confirmed that they favored the experience of learning with VocabEncounter, especially due to its design of achieving micro and usage-based learning simultaneously. We want to emphasize that our proposed approach can be employed in various situations, as illustrated in Figure 1.
These prototypes, along with the above results, exhibit that our presented paradigm involving NLP-based generation techniques can transform our daily activities into a field for vocabulary learning.
2 RELATED WORK
To situate our work, we start by reviewing existing interaction
techniques for vocabulary learning. As mentioned in Section 1,
there are two main strategies in supporting vocabulary learning:
micro and usage-based learning. In this section, we discuss how
previous HCI works have leveraged computers for each learning
strategy along with the pedagogical background behind them. We
then introduce related NLP techniques that enable us to achieve
the scenario envisioned in Section 1.
2.1 Interaction techniques for vocabulary learning: micro learning
While the importance of vocabulary learning in mastering a foreign language has been acknowledged for a long time [1, 4], most learners have relied on simple educational resources like word lists or vocabulary cards [61]. To help them learn vocabulary more effectively, a large body of HCI research has been devoted to introducing various interaction techniques. One major objective is delivering micro learning [30, 40, 45] with the help of mobile or Web technologies.

In micro learning, learners are encouraged to leverage small learning units and short-term learning activities within their daily lives. This learning strategy was introduced to address the difficulty of learning vocabulary amid the busyness of everyday life [30, 40]. While learners need to be constantly exposed to the words they want to remember to overcome the forgetting curve [67], we often fail to schedule dedicated time for studying, which motivates the exploration of practical ways to learn in small time segments. This learning strategy can also be associated with the concept of casual learning [34, 44] in terms of its emphasis on leveraging daily opportunities to decrease the mental burden of studying. By incorporating this strategy, learners can reduce their cognitive load while maintaining long-term retention [50].
A wide range of computing devices has been exploited to achieve micro learning [14, 15, 20, 22, 23, 25]. For example, Dingler et al. [22] explored the use of pervasive physical displays for vocabulary learning by placing displays throughout users' homes and work environments. Smartphones are often leveraged to pursue more handy approaches, such as presenting quizzes through notifications [23] and scheduling tests repeatedly [25]. Some work also made use of small time segments associated with smartphones, such as using live wallpaper [20] and wait time in messaging and loading [14, 15].
In addition to leveraging small time segments, some recent techniques exploit the contextual information of learners because grounding new words in learners' context is known to be effective for enhancing their memorization [33, 58]. For example, RFID-based activity recognition [8, 64] and GPS-based location recognition [26, 36] were used to present foreign words or vocabulary quizzes of relevance to a user's context. Smart Web browsers for vocabulary learning [10, 81] have been proposed to capture the context while using computers. Berleant et al. [10] proposed an approach of translating words on a Web page into a foreign language to increase users' exposure to the words. The effectiveness of such a word-level translation approach on the user's memory performance has been empirically confirmed by Trusty and Truong [81]. These studies imply the importance of transforming our daily lives into a field of vocabulary learning. At the same time, as computers have increased their presence in our lives, they also signify the potential of HCI techniques for facilitating micro learning.
2.2 Interaction techniques for vocabulary learning: usage-based learning
The other objective is supporting vocabulary learning via usage-based learning. In fact, most conventional educational resources present not only the formal definitions of words but also their usages in the form of example sentences [60]. Example sentences are offered because, without seeing usages, it is difficult to elaborate the semantic information of words, which is crucial to the long-term retention of the words [13]. In particular, it has been experimentally confirmed that even example phrases consisting of as few as ten words were effective in acquiring vocabulary [6].

Given this context, a new paradigm of usage-based learning has recently been introduced by leveraging big data on the Web [49, 78, 79, 90], allowing users to learn the practical usages of words they want to remember by automatically retrieving them from the Web. Syed and Collins-Thompson [78, 79] have proposed ways to tailor Web search rankings to support vocabulary learning, where pages that contain words for a learner to remember are prioritized. Lungu et al. [49] proposed a personalized system that recommends Web articles that are likely to contain words unknown to each user. Moreover, Vivo is a video-augmented dictionary that provides a way to exploit huge online video resources for vocabulary learning [90]. It identifies short movie scenes that contain the usages of the words a user wants to remember from existing movies and provides them to the user as a contextual clue.
Although using these systems positively affects vocabulary learning, they still rely on the user's motivation to watch or read the recommended content in a foreign language, which is not always attractive to every user. This point conversely suggests that combining them with interaction techniques for supporting micro learning [30, 40] can deliver vocabulary learning more effectively. However, how to achieve the combination is not trivial because, even with big data, it is not always possible to get the usages of the words that users want to remember in a way that is consistent with their context. In fact, the smart Web browser implemented by Trusty and Truong [81] limited the foreign words to be presented while browsing to those included in their hand-crafted dictionary of 1,500 nouns. They rationalized it by stating "verbs and other parts of speech are highly dependent on context" [81], which implies that their approach of simply replacing a single word on each Web page with its translation would not be optimal for presenting words in a contextualized manner. This taught us that achieving both micro and usage-based learning effectively requires us to overcome the difficulty of presenting the usages in a manner that is grounded in the user's context.
Here, we suppose that the recent advance of NLP techniques enables us to overcome this difficulty; i.e., we can generate the usages of words a user wants to remember in a contextualized manner. By encapsulating the words with their generated usages into materials the user is exposed to in their daily lives, such as during Web browsing, the user can learn them while casually reading the materials in their native language, without taking dedicated time. This approach is expected to be beneficial given the finding by Brooks [12] that learners positively perceive the experience of actively incorporating their first language in foreign language acquisition, compared to focusing only on the language to learn. Then, our main challenge is how to achieve such encapsulation naturally while maximizing the opportunities for users to encounter the words. Our approach tackles this by leveraging recent NLP techniques, as described in Section 2.3.
2.3 Related NLP techniques
To build VocabEncounter as an automated system, we employed three NLP techniques: NMT with constrained decoding, multilingual word embedding, and sentence embedding. In this section, we introduce them to provide background for our approach.
Machine translation systems have been significantly improved by deep neural networks [5, 42, 76, 84]. The output sentences are of high quality; in some domains, they can be comparable to professional human translators, i.e., one cannot tell human translations from system translations [7, 35]. Beyond quality, controllability has also been studied, which led to the development of constrained decoding [3, 37, 38, 68]. It enables us to generate translations with target words specified dynamically at the time of translation, while other methods often require retraining of the NMT models with the required words [24, 75]. In this study, we incorporated the approach proposed by Hu et al. [38], which offers a good trade-off between accuracy and speed for near real-time translation.
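For illustration, the following is a minimal sketch of driving such constrained decoding from Python via fairseq's command-line front end; it assumes fairseq's lexically constrained decoding interface (the --constraints flag of fairseq-interactive, which reads tab-separated constraints together with the source sentence), and the model directory and checkpoint name are placeholders rather than the configuration used in VocabEncounter.

```python
import subprocess

# A minimal sketch (not the authors' exact setup). With --constraints,
# fairseq-interactive reads lines of the form
# "source sentence<TAB>constraint1<TAB>constraint2" and forces the
# constraint tokens to appear in the translation.
MODEL_DIR = "/path/to/ja-en-model"   # placeholder path
SOURCE = "日本に関するパンフレットがたくさん出版されています"
CONSTRAINT = "brochure"              # word the learner wants to encounter

proc = subprocess.run(
    [
        "fairseq-interactive", MODEL_DIR,
        "--path", f"{MODEL_DIR}/checkpoint_best.pt",
        "--source-lang", "ja", "--target-lang", "en",
        "--constraints",             # enable lexically constrained decoding
        "--beam", "10",
    ],
    input=f"{SOURCE}\t{CONSTRAINT}\n",
    capture_output=True,
    text=True,
)
# Hypothesis lines in fairseq-interactive's output are prefixed with "H-".
for line in proc.stdout.splitlines():
    if line.startswith("H-"):
        print(line)
```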
Word embedding has also become the de facto standard approach for obtaining numerical representations of words and efficiently calculating their semantic similarity [9, 16, 19, 53]. Subsequently, some advanced work has enabled embedding words from multiple languages into a joint semantic vector space, where similar words can be found cross-lingually (e.g., car [En], wagen [De], voiture [Fr], 車 [Ja]) [52, 72]. We can obtain such a semantic space even in unsupervised manners, that is, without multilingual word dictionaries [47].
The idea of this semantic embedding has also been extended to the sentence level. While an empirical method of averaging the embedding vectors of all words in a sentence has been widely used, dedicated methods for mapping variable-length sentences to a semantic vector space have also evolved [17, 43, 70]. For example, Sentence-BERT [70] leverages BERT [21], a pretrained Transformer-based [84] model, to derive semantic embedding vectors that can be compared using cosine similarity. In an analogous manner to word embedding, its multilingual extension has also been proposed [27, 71].
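As a concrete illustration, the snippet below computes such a cosine similarity with the sentence-transformers library; the multilingual checkpoint name is an assumption for illustration, not necessarily the model used in this work.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed multilingual checkpoint; any Sentence-BERT-style model that maps
# sentences from different languages into one vector space behaves similarly.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = [
    "A lot of brochures about Japan have been published.",
    "日本を紹介する冊子などが数多く発行されています",
]
embeddings = model.encode(sentences)  # one vector per sentence
print(float(util.cos_sim(embeddings[0], embeddings[1])))  # near 1.0 for paraphrases
```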
These techniques allow us to construct a novel vocabulary-learning system that achieves the two learning strategies discussed in Section 2.1 and Section 2.2. In particular, the human-level automated translation techniques enable a novel approach to supporting vocabulary learning that dynamically generates the usages of the words to learn, rather than retrieving them from the Web. By incorporating such NMT techniques with semantic embedding vectors that can be calculated in near real time, the proposed system can encapsulate the words a user wants to learn into materials the user is reading by taking the user's context into account.
3 PROPOSED APPROACH
In this section, we first explain how VocabEncounter is designed to achieve both micro and usage-based vocabulary learning while taking a user's context into account. We then describe how VocabEncounter encapsulates the words a user wants to learn through its two core mechanisms: targeting and translating. We also explain how the usages generated by VocabEncounter are presented in the user's daily activities by showing our user interface implemented as a Chrome extension.
3.1 Overview
Our aim is to support vocabulary learning through VocabEncounter by presenting the contextualized usages of the words that users want to learn during their daily lives (Figure 1). This is enabled by leveraging recent NLP techniques to automatically generate translated phrases containing specified foreign words in near real time. As a proof of concept, we considered utilizing the user's everyday browsing experience, inspired by previous works suggesting its efficacy for micro learning [10, 81]. In this case, the entire encapsulation process is illustrated in Figure 2.

Looking back to the example scenario in Section 1, let us consider a user who is a native Japanese speaker studying English. We note that our system can be used with other pairs of languages as long as there is sufficient data to train the models we used. As we intend to offer a personalized learning experience, we assume a specific set of foreign words $W^{En} = \{w_1^{En}, w_2^{En}, \ldots, w_j^{En}, \ldots, w_N^{En}\}$ that the user wants to learn. Typically, $W^{En}$ can be given from word lists or determined by a vocabulary test (i.e., words for which the user could not identify the correct meaning). Then, VocabEncounter provides the user with the opportunity to encounter the words by presenting translated phrases containing those words in documents $D^{Ja} = \{d_1^{Ja}, d_2^{Ja}, \ldots, d_i^{Ja}, \ldots, d_M^{Ja}\}$. Here, $d_i^{Ja}$ denotes a document retrieved from each page the user visits during their Web browsing.
The process of presenting the translated phrases runs whenever the user visits a Web page. VocabEncounter first targets phrases to be used for the encapsulation from the page. This process is designed to prioritize phrases whose English translations are supposed to preserve their original meaning and maintain syntactic naturalness upon encapsulation. Formally, it enumerates pairs of a word in $W^{En}$ and a phrase in $d_i^{Ja}$, like $\{(w_1^{En}, p_{i,1}^{Ja}), (w_3^{En}, p_{i,3}^{Ja}), \ldots, (w_N^{En}, p_{i,N}^{Ja})\}$, where $p_{i,j}^{Ja}$ is a phrase in $d_i^{Ja}$ to be translated so as to contain $w_j^{En}$. Note that some words in $W^{En}$ may not appear on the page if there is no appropriate phrase in $d_i^{Ja}$, as is the case for $w_2^{En}$ in this example (Figure 2). VocabEncounter then translates each $p_{i,j}^{Ja}$ into an English phrase $p_{i,j}^{En}$ under the constraint of containing $w_j^{En}$ and scores the degree of the translated phrase $p_{i,j}^{En}$ having the meaning of $p_{i,j}^{Ja}$.
Figure 2: Example pipeline of how VocabEncounter presents translated phrases containing specified foreign words in our proof-of-concept Chrome extension. (Steps depicted: 1. enumerate targeted pairs using multilingual word embedding; 2. extract a phrase of an appropriate length based on dependency structure analysis; 3. translate the phrase using constrained decoding; 4. back-translate the translated phrase; 5. score the similarity to the original phrase using Sentence-BERT.)
If $p_{i,j}^{En}$ is judged to sufficiently preserve the original meaning, our system substitutes $p_{i,j}^{Ja}$ on the page with $p_{i,j}^{En}$. In addition, $p_{i,j}^{Ja}$ is presented in a hoverbox along with the meaning of the word $w_j^{En}$ in Japanese when the user places a mouse pointer over $p_{i,j}^{En}$, as we exhibit later in Figure 4.
3.2 Targeting
As mentioned in Section 3.1, VocabEncounter first performs the targeting process to identify Japanese phrases on the page in which to encapsulate $W^{En}$. Without this process, it would need to translate a large number of combinations across all phrases on the page and all words in $W^{En}$, which takes a long time and makes it impossible to present translated phrases while the user is on the page. Instead, our system filters out combinations whose translation results would neither preserve the original meaning nor maintain syntactic naturalness.

More specifically, for each $w_j^{En}$, VocabEncounter identifies Japanese words in $d_i^{Ja}$ that have a similar meaning to $w_j^{En}$. This is based on the assumption that phrases containing such words would be suitable for encapsulating $w_j^{En}$ in terms of preserving the original meaning and maintaining syntactic naturalness. The process is enabled using multilingual word embedding [47], trained to map similar words onto close embedding vectors regardless of the language. Thus, we can measure the similarity between $w_j^{En}$ and each Japanese word in $d_i^{Ja}$ by calculating the cosine similarity of their corresponding embedding vectors. Then, our system lists Japanese words whose similarity exceeds a threshold $th_1$.
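A minimal sketch of this filtering step follows, assuming MUSE-style aligned English and Japanese vectors [47] saved in word2vec text format; the file names are placeholders, and th1 = 0.5 anticipates the value adopted in Section 4.2.

```python
import numpy as np
from gensim.models import KeyedVectors

# Assumed files: aligned cross-lingual embeddings exported in word2vec
# format, so English and Japanese vectors share one semantic space.
en_vec = KeyedVectors.load_word2vec_format("wiki.multi.en.vec")
ja_vec = KeyedVectors.load_word2vec_format("wiki.multi.ja.vec")

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def list_similar_ja_words(w_en, ja_tokens, th1=0.5):
    """Return the Japanese tokens whose similarity to w_en exceeds th1."""
    u = en_vec[w_en]
    return [t for t in ja_tokens
            if t in ja_vec and cosine(u, ja_vec[t]) > th1]

# ja_tokens would come from tokenizing a document d_i^Ja on the visited page.
print(list_similar_ja_words("brochure", ["冊子", "発行", "紹介"]))
```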
For each of the listed words, VocabEncounter first extracts a Japanese sentence containing the word from $d_i^{Ja}$. Here, it does not translate the whole sentence into English but further extracts a phrase of an appropriate length containing the word from the sentence. This is because sentences on the Web are sometimes too long, yielding long translation results as well. On the other hand, it has been confirmed that example sentences of about ten words are effective in vocabulary learning [6].
Figure 3: Example pipeline of how VocabEncounter extracts a phrase of a certain length to translate, given the pair of $d_i^{Ja}$ and $w_j^{En}$. (Steps depicted: 1. enumerate targeted pairs; 2a. perform dependency structure analysis; 2b. traverse the tree with a greedy algorithm.)
j
in vocabulary learning [
6
]. Thus, we designed VocabEncounter not
to present long translation results that would unnecessarily con-
sume the user’s cognitive load but to extract a phrase of a certain
length (e.g., 10 or 20 words) before translating.
For this purpose, VocabEncounter applies dependency structure analysis [89] (specifically, ginza³ trained on [65]) to the sentence to obtain its structure tree. Here, naïve algorithms such as extracting a certain number of words before and after the targeted word can produce unnatural chunks that are hard to translate. Instead, our system extracts a syntactically compositional phrase based on the structure tree using a greedy algorithm.
Specifically, let us consider a case in which the word "brochure" is identified to have a similar meaning to the word "冊子", as illustrated in Figure 3. VocabEncounter traverses the dependency structure tree starting from the shortest phrase containing the targeted word (i.e., "冊子"). It then tries to expand the phrase by concatenating preceding or subsequent words, prioritizing ones posing a smaller number of dependencies that refer to words outside the extracted phrase. The expansion is repeated until the number of words in the phrase to be extracted reaches 10.
³ https://megagonlabs.github.io/ginza/
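A simplified sketch of this traversal with spaCy and ginza follows; the candidate scoring, counting dependency arcs that cross the span boundary, is our approximation of the described criterion rather than the authors' exact code, and the target is assumed to be a single token.

```python
import spacy

nlp = spacy.load("ja_ginza")  # ginza's Japanese model for spaCy

def crossing_deps(doc, span_ids):
    """Count dependency arcs that point outside the current span."""
    crossing = 0
    for i in span_ids:
        tok = doc[i]
        if tok.head.i != tok.i and tok.head.i not in span_ids:
            crossing += 1
        crossing += sum(1 for child in tok.children if child.i not in span_ids)
    return crossing

def extract_phrase(sentence, target, max_len=10):
    """Greedily grow a span around `target`, expanding one token to the left
    or right, whichever leaves fewer dependencies crossing the boundary."""
    doc = nlp(sentence)
    starts = [t.i for t in doc if t.text == target]
    if not starts:
        return None
    span = {starts[0]}
    while len(span) < min(max_len, len(doc)):
        candidates = []
        lo, hi = min(span), max(span)
        if lo > 0:
            candidates.append(span | {lo - 1})
        if hi < len(doc) - 1:
            candidates.append(span | {hi + 1})
        if not candidates:
            break
        span = min(candidates, key=lambda s: crossing_deps(doc, s))
    return "".join(doc[i].text for i in sorted(span))

print(extract_phrase("日本を紹介する冊子などが数多く発行されています", "冊子"))
```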
3.3 Translating
Now, VocabEncounter has pairs of an English word $w_j^{En}$ in $W^{En}$ and a Japanese phrase $p_{i,j}^{Ja}$ in $d_i^{Ja}$. We want to translate $p_{i,j}^{Ja}$ into an English phrase $p_{i,j}^{En}$ so as to contain $w_j^{En}$ while preserving the original meaning of $p_{i,j}^{Ja}$ and maintaining syntactic naturalness. Here, a vanilla translation using an NMT model would not satisfy the requirement of containing the specified word. Thus, our system incorporates the NLP technique described in Section 2.3, specifically, the constrained decoding algorithm proposed by Hu et al. [38]. Their algorithm can easily switch the word to be contained at no extra cost like retraining the model, as mentioned in Section 2.3. This is desirable for use in VocabEncounter, where it is possible that the user updates $W^{En}$ in the short term; for example, the user may take a vocabulary test every day to remove learned words and add new unknown words to $W^{En}$.
Before presenting the translated phrase $p_{i,j}^{En}$, VocabEncounter judges whether $p_{i,j}^{En}$ sufficiently preserves the original meaning of $p_{i,j}^{Ja}$. We expected this to contribute to the user's experience by filtering out translated phrases that are not much like the original phrases. For this purpose, it performs a backward translation from $p_{i,j}^{En}$ to $\hat{p}_{i,j}^{Ja}$ and measures the similarity between the original phrase $p_{i,j}^{Ja}$ and the back-translated phrase $\hat{p}_{i,j}^{Ja}$. The similarity between the two Japanese phrases is obtained by calculating their semantic embedding vectors using Sentence-BERT [70] and measuring the cosine similarity of the vectors. Here, we employed this backward translation approach rather than multilingual Sentence-BERT [27, 71] because the similarity through a round-trip translation is known to reflect the quality of the forward translation [55, 63]. Thus, this approach would also contribute to ensuring the naturalness of $p_{i,j}^{En}$.
Still, VocabEncounter determines whether to present the translated phrase not only by relying on the similarity score. Additionally, in the same way that standard NMT techniques [84] determine the best translation result from candidates based on their likelihood score, we also considered the likelihood score of the phrase evaluated by the translation model. Specifically, a higher likelihood score reflects that the phrase is more likely to appear in the training data of the translation model, giving more weight to natural phrases. We considered this likelihood score because, in our empirical observations, using only the similarity score sometimes allowed presenting less syntactically natural phrases. We found that balancing the similarity score and the likelihood score, as follows, would be suitable for maintaining syntactic naturalness while preserving the original meaning.

$$\mathit{Score}\left(p_{i,j}^{Ja},\, p_{i,j}^{En},\, \hat{p}_{i,j}^{Ja}\right) = \frac{2\,\mathit{Sbert}\left(p_{i,j}^{Ja},\, \hat{p}_{i,j}^{Ja}\right) + L_{Ja \rightarrow En}\left(p_{i,j}^{Ja},\, p_{i,j}^{En}\right) + L_{En \rightarrow Ja}\left(p_{i,j}^{En},\, \hat{p}_{i,j}^{Ja}\right)}{4} \qquad (1)$$

Here, $\mathit{Sbert}(\cdot)$ denotes the similarity score obtained by Sentence-BERT [70], and $L(\cdot)$ denotes the likelihood score obtained by the translation model. Then, our system presents $p_{i,j}^{En}$ to the user if $\mathit{Score}(p_{i,j}^{Ja}, p_{i,j}^{En}, \hat{p}_{i,j}^{Ja})$, ranging from 0 to 1, exceeds a threshold $th_2$.
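A minimal sketch of this scoring follows, assuming the back-translation and the two likelihood scores (normalized to [0, 1]) have already been obtained from the NMT models; the Sentence-BERT checkpoint name is an assumption for illustration.

```python
from sentence_transformers import SentenceTransformer, util

sbert = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model

def score(p_ja, p_ja_back, l_ja_en, l_en_ja):
    """Equation (1): a weighted mean of the round-trip Sentence-BERT
    similarity (weight 2) and the two likelihood scores (weight 1 each),
    with l_ja_en and l_en_ja assumed to lie in [0, 1]."""
    emb = sbert.encode([p_ja, p_ja_back])
    sim = float(util.cos_sim(emb[0], emb[1]))
    return (2 * sim + l_ja_en + l_en_ja) / 4

# Present the translated phrase only when the score exceeds th2
# (0.6, following the analysis in Section 6.5).
s = score("日本を紹介する冊子などが数多く発行されています",
          "日本に関するパンフレットがたくさん出版されています",
          l_ja_en=0.8, l_en_ja=0.7)
if s > 0.6:
    print("present the translated phrase", s)
```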
Figure 4: VocabEncounter presents a translated phrase containing specified words and supplies its original phrase with its meaning when the user places a mouse pointer over the translation⁵. The number of translated phrases presented on the current page is indicated next to the address bar to encourage users to be aware of them.
We implemented this translation process using fairseq [66] with English–Japanese and Japanese–English translation models pretrained on JParaCrawl [56], a large-scale dataset made from various resources on the Web. In addition, as we mentioned above, the translated phrases should be presented before the user leaves the page. To satisfy this, we designed the system to execute the translation on a GPU-powered server⁴ by communicating from the browser through WebSocket.
3.4 User interface
In our proof-of-concept implementation, the above processes are delivered to the user in the form of a Chrome extension. When the user visits a new Web page, the extension first retrieves all text nodes on the page and sends them to the server via WebSocket. Then, if the server finds appropriate phrases to encapsulate $W^{En}$ and returns pairs of $p_{i,j}^{En}$ and $p_{i,j}^{Ja}$, the extension replaces the original phrase $p_{i,j}^{Ja}$ with $p_{i,j}^{En}$. The extension also provides a hoverbox presenting the meaning in Japanese to help the user if they do not recall the meaning of the encapsulated word $w_j^{En}$, which is also sent from the server along with the translated results. When the user places a mouse pointer on $p_{i,j}^{En}$, the hoverbox appears, as presented in Figure 4.
⁴ Specifically, we used a server with an NVIDIA GeForce GTX 1080 Ti.
⁵ The content on this page is allowed to be used or modified under CC BY 4.0 by the National Diet Library, Japan.
Figure 5: VocabEncounter allows the user to easily add a word to be used for encapsulation when they find an unknown word they want to remember, by right-clicking the word on a Web page⁶.
The user can freely add a new word to $W^{En}$ on the options page of the extension or just by right-clicking a foreign word that they do not know on a Web page, as shown in Figure 5. In contrast, if the user comes to remember one of the words in $W^{En}$ through being familiarized with the usages they encountered, the user can easily remove the word from $W^{En}$ on the options page or by clicking a button placed in the upper right corner of the hoverbox (Figure 4).
To facilitate the user's experience of vocabulary learning with this extension, VocabEncounter employs an incremental translation strategy. It translates the targeted phrases in the order of their appearance on the page and returns the translation results incrementally, before all phrases are translated. By doing so, phrases at the beginning of the page are translated immediately, and phrases later in the page are processed while the user reads the page from the beginning. In addition, the translation process of each page is queued in a LIFO (last-in-first-out) order so that the user can see the translation results of the current page even while the translation process of previous pages is ongoing. Furthermore, to maximize the opportunities to encounter the words to learn, our system indicates the number of translated phrases on the current page, as shown in Figure 4, encouraging the user to be aware of the translated phrases.
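The queuing behavior can be sketched with a LIFO queue and a single worker thread; translate_page below is a hypothetical stand-in for the server-side targeting-and-translation pipeline, so this is a simplified model rather than the actual extension code.

```python
import queue
import threading

# LIFO: the page the user opened most recently is served first, so results
# for the current page are not blocked behind earlier pages.
jobs = queue.LifoQueue()

def translate_page(url):
    """Hypothetical stand-in for the targeting + translation pipeline,
    which would return translated phrases incrementally in reading order."""
    print(f"translating phrases on {url} in order of appearance")

def worker():
    while True:
        url = jobs.get()
        translate_page(url)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# Each page visit pushes a job; the most recent visit is processed first.
for visited in ["https://example.com/news1", "https://example.com/news2"]:
    jobs.put(visited)
jobs.join()
```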
4 PILOT TEST
Before implementing the full version of VocabEncounter, we first tested the feasibility of the proposed approach by implementing only the targeting mechanism (Section 3.2). This is because the targeting mechanism might not sufficiently identify phrases to encapsulate the words to memorize, which would hinder users from encountering the words. Alternatively, it might enumerate too many phrases, making it impossible to translate all of them in near real time while the users are on the page. In this sense, we considered a day-by-day browsing experience as in the example scenario in Section 1 and examined how many words of $W^{En}$ would go to the translation process after the targeting process is applied to the documents that an average user browses in a day.

⁶ The content on this page is allowed to be quoted, reproduced, and reprinted by the Ministry of Justice, Japan.
For this purpose, we first gathered volunteers and constructed a dataset of $D^{Ja}$ by collecting documents from the pages they visited in a day. We also prepared a public English word list, assuming that an average learner would set $W^{En}$ using the list based on their English level. In terms of simulating the learner's behavior of using VocabEncounter for a day, it is possible to set $W^{En}$ by randomly sampling 10 or 20 words and examine how many of them would be encapsulated. However, to assure generality, we virtually considered the case of specifying all words in the list (about 2,500 words) as $W^{En}$. Then, we applied the targeting mechanism (Section 3.2) and evaluated how many of the words in the list could be used for the encapsulation.
4.1 Data collection
To construct a dataset of $D^{Ja}$ reflecting the browsing experiences of average users, we gathered volunteers through word-of-mouth and online communication. Each volunteer was asked to install a recording tool in their browser that collected documents on all the pages they visited. Before installing the tool, an experimenter from the authors carefully explained its function and informed them that their data would be anonymized for analysis. We also recommended that they switch to a different account or browser when accessing data that would be inappropriate to send, for example, when reading confidential documents. When they agreed to use the tool, the collection started and lasted for a day. The exact timing of the start and end was left to each volunteer. In the end, five volunteers participated, resulting in a dataset of five $D^{Ja}$.
To decide on the list of English words, we referred to the Common European Framework of Reference for Languages (CEFR) [54], an international standard for classifying language ability. It describes language ability on a six-point scale, from A1 for beginners up to C2 for those who have mastered the language. We chose B2, a semi-advanced level used for describing those confident in using the language, because it is slightly higher than the average level for Japanese (B1) [29]. Therefore, we expected that the corresponding vocabulary list would contain some unknown words for most people, matching actual study needs. As a result, we adopted all 2,692 English words that correspond to B2 from a public word list validated with more than 5,000 students [80].
4.2 Analysis
As we discussed in Section 3.2, the function of the targeting mechanism highly depends on its threshold ($th_1$). A higher value of $th_1$ filters out a greater number of the associated pairs of words and phrases, reducing the chances that each word of $W^{En}$ would be encapsulated in $D^{Ja}$. In contrast, lower values of $th_1$ increase the number of phrases to be translated, taking more time before they are presented to users. Thus, we applied the targeting mechanism with different $th_1$ using the collected dataset of $D^{Ja}$ and the chosen list of English words and examined this point.
Specifically, we first calculated the ratio of words that never matched any phrases in the dataset. Such cases are undesirable, as users would not get an opportunity to encounter the word, failing to learn it. For each $D^{Ja}$ in the dataset, we applied the targeting mechanism of VocabEncounter with a certain $th_1$ under the assumption that all words in the list were specified as $W^{En}$. We then counted the number of words that were never used in the extracted phrases and calculated the ratio of that number to all words in the list. Finally, we averaged the ratio over the set of $D^{Ja}$. We repeated this process with different $th_1$ and checked the averaged ratio.
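In code, this analysis reduces to averaging a never-matched ratio over the volunteers' document sets; the matched sets below are toy data standing in for the output of the targeting mechanism at a given th1.

```python
import numpy as np

def never_matched_ratio(matched_sets, word_list):
    """matched_sets: for each volunteer's day of browsing (one D^Ja), the set
    of words from word_list that the targeting mechanism (Section 3.2)
    encapsulated at least once under a given th1."""
    ratios = [1 - len(m) / len(word_list) for m in matched_sets]
    return float(np.mean(ratios))

# Toy stand-ins: 2,692 B2 words and five volunteers' matched sets.
word_list = [f"w{i}" for i in range(2692)]
matched_sets = [set(word_list[:n]) for n in (2200, 2300, 2100, 2250, 2400)]
print(never_matched_ratio(matched_sets, word_list))  # ~0.16, i.e., >80% matched
```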
As a result, we determined that, when the threshold for targeting (i.e., $th_1$) was set to 0.5, more than 80% of the words, on average, were used for the encapsulation at least once. From this result, we can expect users to be exposed to most of the words they want to remember within a day. In addition, we empirically confirmed that, when we set $th_1 = 0.5$ with a $W^{En}$ of around 30 words, the targeting mechanism enumerates at most about 10 pairs of a word and a phrase on regular Web pages, which can be translated in near real time while users are on the pages. These points assured us that VocabEncounter could be implemented to allow users to encounter the words to learn during Web browsing. We thus implemented the full version of VocabEncounter (i.e., with the translation mechanism and user interface) and evaluated its effectiveness in the following sections.
5 HYPOTHESES
Up to this point, we have introduced VocabEncounter as a system for vocabulary learning, which can present the usages of the words to learn during daily browsing in a contextualized manner. The envisioned scenario and design of VocabEncounter, which we discussed in Section 1 and Section 3.1, respectively, impose several hypotheses. These hypotheses need to be verified to ascertain the feasibility and effectiveness of VocabEncounter in transforming our daily life into a field of vocabulary learning.
First, it should be examined whether VocabEncounter can generate translated phrases containing specified words with sufficient quality to support users in learning vocabulary during their browsing. Specifically, we need to present natural word usages to familiarize users with how the words are used. Moreover, the translated phrases should preserve the meaning of their original phrases and not interfere with users reading the content on a Web page. Thus, the following H1 is posited.

H1: VocabEncounter can generate natural usages of specified foreign words by translating phrases on the Web without losing their original meaning.
If H1 holds, it means that VocabEncounter is capable of offering usage-based learning during browsing experiences. As discussed in Section 2.2, this would allow users to fully benefit from both micro and usage-based learning through being familiarized with the contextualized usages of the words to learn in their daily lives.

We then expect that VocabEncounter can lead to a better learning outcome than simply presenting the words to learn in the same manner as conventional micro learning approaches (Section 2.1). Thus, our second hypothesis is the following:
H2: VocabEncounter can induce a better learning outcome than simply showing the words to remember during Web browsing, through presenting their generated usages.
Lastly, in order for VocabEncounter to be practically adopted for supporting vocabulary learning, its usability and efficacy when used in users' daily lives should be investigated. As envisioned in the example scenario in Section 1, we anticipate that users would favor the minimized hurdle for studying delivered by the experience of encountering the usages of the words they want to remember during daily browsing, positing the following hypothesis:

H3: The experience of learning with VocabEncounter is favored by users when used in their daily lives, thanks to its design of achieving both micro and usage-based learning.

If these hypotheses are supported, we can conclude that VocabEncounter opens up a new learning strategy that embraces the advantages of both micro and usage-based learning through an NLP-powered approach, which can transform our daily lives into a field of vocabulary learning. With this motivation, we evaluated these hypotheses by conducting a series of experiments.
6 EXPERIMENT I: QUALITY OF THE TRANSLATED PHRASES
We first conducted an offline experiment to evaluate the quality of phrases translated by VocabEncounter under the constraint of containing one of the specified words, in correspondence with H1. Specifically, we asked human assessors fluent in Japanese and English to check whether the translated phrases are natural while preserving the original meaning. As a baseline for comparison, we prepared two conditions: phrases translated by a human translator with the same constraint of using the specified words and phrases translated by the same NMT model without the constraint of using the specified words.
6.1 Design and material
We first introduce the proposed condition, which refers to the output of VocabEncounter, i.e., the phrases translated by the NMT model with the constrained decoding algorithm so as to contain the specified English words, as described in Section 3.3. The next one is the human condition, representing the phrases translated by a human translator who is bilingual and lived in the UK for years. This person voluntarily participated before being told about the concept of the experiment and was asked to translate the Japanese phrases while using the corresponding specified words. If the evaluations by human assessors were not significantly different between the proposed and human conditions, it would be implied that our system achieves at least human-level quality under the constraint of using the specified word.
Furthermore, we prepared the vanilla condition, which is the output of the same pretrained NMT model as the proposed condition [56] but translated without the constraint of using the specified words. The vanilla condition was introduced as the best-effort case. By comparing its evaluations with those of the other two, we can know how difficult the translation task involving the encapsulation is and how well the proposed algorithm works within the task.
To support H1, we expect that the proposed condition works comparably with the human condition, assuring that VocabEncounter can generate the usages with a certain level of naturalness and meaning preservation such that they can be used for vocabulary learning. Moreover, we expect that the phrases generated by the proposed condition, i.e., with the constraint of using the specified word in the translation, are not significantly different from those generated by the vanilla condition, i.e., without such a constraint. If these points are confirmed, users can encounter natural usages of the words contextualized in their reading content.

To prepare sample phrases for evaluation, we randomly extracted 60 pairs of English words and corresponding Japanese phrases targeted in the pilot test (Section 4). Then, we obtained three English phrases corresponding to the above three conditions for each of the extracted pairs, yielding 180 phrases in total. We present some of the phrases we obtained in this experiment in Table 1 in the Appendix.
6.2 Measure
To reflect H1, we examined the quality of the translated phrases in terms of their naturalness and meaning preservation. Both of these were measured through questionnaires using a 5-point Likert scale (with 1 indicating "strongly disagree" and 5 indicating "strongly agree"). We first showed a translated English phrase to assessors and asked them, "How natural is this phrase in English?" Then, we also showed its original Japanese phrase to them and asked, "How well does the English phrase preserve the meaning of the original Japanese phrase?"
6.3 Procedure
To compare the measures introduced in Section 6.2 across the three conditions, we used a crowdsourcing service and gathered 24 Japanese assessors. They self-reported a high level of proficiency in English, equivalent to or higher than CEFR B2. Each assessor evaluated 20 phrases for each condition (60 phrases in total) using the questionnaires explained in Section 6.2. As illustrated in Figure 6, we divided the assessors into six groups of four to balance the assignment of the conditions. Then, the assessors evaluated the phrases one by one in random order. The entire process took approximately 45 minutes, and we paid approximately $10 as a reward to each assessor.
6.4 Results
According to the Kruskal–Wallis test (p = 0.001), we found a significant difference among the three conditions for the naturalness measure. We thus conducted a post-hoc test and confirmed that the vanilla condition obtained a significantly better evaluation than the others (p < 0.05), as presented in Figure 7. However, we could not find a significant difference between the evaluations for the proposed and human conditions. We further verified the equivalence of the two conditions using two one-sided tests based on the Mann–Whitney test [86] with an equivalence margin θ of 0.5, as Joosse et al. [41] did. As a result, the equivalence between the evaluations for the proposed and human conditions was supported.
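For reference, this testing procedure can be approximated with SciPy as follows; the ratings are synthetic stand-ins (24 assessors × 20 phrases per condition), and the equivalence test is realized as two one-sided Mann–Whitney tests after shifting one sample by the margin θ = 0.5.

```python
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

rng = np.random.default_rng(0)
# Synthetic 5-point Likert ratings: 480 per condition (24 assessors x 20 phrases).
proposed = rng.integers(2, 6, 480)
human = rng.integers(2, 6, 480)
vanilla = rng.integers(3, 6, 480)

# Omnibus comparison across the three conditions.
print(kruskal(proposed, human, vanilla))

# Two one-sided tests (TOST) for equivalence of proposed vs. human with an
# equivalence margin theta = 0.5: equivalence is supported only when both
# shifted one-sided tests are significant.
theta = 0.5
p_lower = mannwhitneyu(proposed + theta, human, alternative="greater").pvalue
p_upper = mannwhitneyu(proposed - theta, human, alternative="less").pvalue
print(max(p_lower, p_upper))  # compare against alpha
```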
The results indicate that, although the words to be encapsulated were selected through the targeting mechanism (Section 3.2), the constraint of containing the words can affect the naturalness of translation results. This can be confirmed by examining the translation results shown in #8 and #9 of Table 1 (in the Appendix). In the former case, the word "pluck" was targeted in association with "引っ張る [pull]" and contained in the translation results of the proposed and human conditions. While "pluck" has a similar meaning to "pull," people would not use it frequently when talking about muscle training. This subtle difference in nuance could reduce the naturalness of the translations. We also found an interesting phenomenon in the latter case; that is, the proposed method tried to contain the word "prom," which was associated with "デート [date]," but eventually contained the word "promise." This is due to the implementation of the constrained decoding algorithm, which uses a subword tokenizer [46] to allow the conjugation of the specified word by sometimes concatenating multiple (sub)words. While the obtained translation can convey the meaning of the original phrase, this case could contribute to the reduction of naturalness and, in turn, suggests room for improvement in our implementation (see Section 11.2).
Yet, as we can infer from the fact that the effect size of the comparison between the proposed and vanilla conditions was relatively small (r = 0.109), this does not immediately imply the infeasibility of the proposed method in presenting the usages of the words. Notably, the naturalness of translation results given by the proposed method was at least equivalent to those given by a bilingual speaker under the same constraint of encapsulating the specified words. Moreover, the assessors' evaluations of the meaning preservation did not show significant differences among the three conditions according to the Kruskal–Wallis test (p = 0.124), as illustrated in Figure 8. The two one-sided tests with the correction of [48] further supported the equivalence between any two pairs of the three conditions. Thus, it is suggested that replacing the original phrases with their translations produced by the proposed method would not bother users by presenting phrases losing semantic information.
From these points, we conclude that H1 is partially supported. Specifically, VocabEncounter can present the word usages with a human-comparable quality regarding their naturalness and meaning preservation. On the other hand, the naturalness of the usages may fall short of the best-effort case due to the constraint of containing the specified words. Still, this does not rule out the possibility that VocabEncounter can support vocabulary learning via presenting such usages. We thus decided to run online experiments to evaluate the effectiveness of VocabEncounter as an entire vocabulary-learning system, including the effect of presenting those usages during Web browsing, in the following sections.
6.5 Deciding the threshold for presenting phrases
To run VocabEncounter as an entire system, we need to decide the threshold $th_2$, which is used for suppressing translated phrases that do not maintain naturalness or their original meaning (Section 3.3). For this purpose, we also examined the relationship of the score obtained by Sentence-BERT and the translation models (Equation 1) to the assessors' evaluations. As presented in Figure 9, we found that the score of each translated phrase correlated with the averaged evaluation of the phrase regarding its degree of meaning preservation (ρ = 0.717, p < 0.001), while its correlation to the naturalness of the phrase was not significant (p = 0.077).
Figure 6: Assignment of the three conditions across the six groups of assessors. The presentation order of the phrases is also randomly shuffled for each assessor.
Figure 7: Comparison of the assessors' evaluations among the three conditions regarding the naturalness of the translated phrases. We found significant differences between the vanilla condition and the other conditions (p < 0.05).
Considering that the Sentence-BERT model is trained to score the semantic similarity between given sentences and that its output is weighted in Equation 1, this result is reasonable and consistent with our expectation.
Here, we have to be aware of the trade-off between the quality of the translated phrases to be presented and the chances for each word of $W^{En}$ to be encapsulated in $D^{Ja}$, which is moderated by $th_2$. A higher threshold value leads to better translation results while allowing users to encounter fewer words they want to learn.
Figure 8: Comparison of the assessors' evaluations among the three conditions regarding the meaning preservation of the translated phrases. We found no significant difference among them.
In this case, we concluded that $th_2 = 0.6$ is reasonable. According to Figure 9, this value yields translations whose naturalness would be evaluated better than the neutral option of the Likert scale on average. In addition, using the data we collected in Section 4.1, we confirmed that the recall of $W^{En}$ was 0.752 when $th_2 = 0.6$. In other words, an average user is presumed to encounter approximately 75% of the words in an encapsulated manner when they use VocabEncounter for a day.
Figure 9: Relationships of the score obtained by Sentence-BERT and the translation models (Equation 1) to the assessors' evaluations of the naturalness (left) and meaning preservation (right).
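This analysis amounts to a rank correlation between the Equation (1) scores and the averaged ratings, plus a recall computation at a candidate th2; the sketch below uses synthetic data for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
scores = rng.uniform(0.3, 0.9, 60)                   # Equation (1), per phrase
ratings = 1 + 4 / (1 + np.exp(-8 * (scores - 0.6)))  # synthetic 1-5 ratings
print(spearmanr(scores, ratings))                    # rho and p-value

# Recall of W^En at a candidate th2: the fraction of words that keep at
# least one presentable (score > th2) phrase.
th2 = 0.6
phrases_per_word = {"artery": [0.71, 0.44], "bookish": [0.52], "brochure": [0.83]}
recall = np.mean([any(s > th2 for s in v) for v in phrases_per_word.values()])
print(recall)  # 2 of 3 words survive th2 = 0.6 in this toy example
```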
7 EXPERIMENT II: EFFICACY ON LEARNING OUTCOME
We next conducted a user study to examine whether VocabEncounter, as in the example scenario in Section 1, can help users memorize foreign words. To evaluate this point, we gathered participants and had them use VocabEncounter on their PCs for a day while browsing the Web as usual. Here, as we discussed in Section 5 to introduce H2, we expect that VocabEncounter helps users learn words, particularly by presenting their generated usages (i.e., a phrase-based interface). To verify this point, we prepared a word-based interface for comparison. It simply replaces the word in the original content, which is identified by the targeting mechanism (Section 3.2), with the word to remember (Figure 10). As we mentioned in Section 2.1, previous works have shown the efficacy of such word-level translation on vocabulary learning, leveraging the strategy of micro learning. Therefore, H2 is supported if the phrase-based condition leads to a better learning outcome than the word-based condition.
7.1 Design
We used a within-participant design comparing the prepared conditions and examined the learning outcome. Here, to ensure a fair comparison, we needed to carefully design the set of words that each participant would learn in this experiment. In particular, the words should be unknown to the participant and have the same difficulty (i.e., vocabulary level). Thus, we first asked each participant to take a pretest to identify words that they did not know from the same CEFR level, as detailed later in Section 7.3.

They then spent a day using the Chrome extension while each identified word was assigned to an experimental condition. Here, in addition to the phrase-based condition and the word-based condition, we added the not-used condition. This not-used condition was introduced as a baseline to measure the learning outcome without using the proposed system, i.e., how well the participant could learn a word from a one-shot exposure at the pretest. More specifically, if a word was assigned to the phrase-based condition or the word-based condition, the participant could encounter it in the corresponding way shown in Figure 10. On the other hand, if a word was assigned to the not-used condition, the word was never shown to the participant. The words identified as unknown to the participant at the pretest were randomly assigned to each of the three conditions, and this assignment did not change during the experiment. Note that, although we mentioned that users could add and delete words to remember when using VocabEncounter (Section 3.4), we deactivated this feature to keep the words in each condition the same during the experiment.
7.2 Measure
In order to measure the learning outcome of a participant, we used a vocabulary test and examined the correct answer rate on the words identified as unknown in the pretest. Specifically, in a similar manner to [90], we had them take a posttest two days after using the Chrome extension. By comparing how many of the words they correctly answered between the experimental conditions, we evaluated how VocabEncounter contributed to the memorization of foreign words.
7.3 Procedure
We gathered ten participants (three women and seven men; ages ranging from their 20s to 40s) through word-of-mouth and online recruiting, who were native Japanese speakers and self-reported that their English level was equivalent to or slightly below CEFR B2. This level of mastery was selected to assure that there would be some unknown words in the word list corresponding to B2, which motivated us to use the same list as in Section 4.1.

The procedure of our experiment simulates the example scenario described in Section 1: the participants first tried to memorize unknown words; they spent a typical day using VocabEncounter; and their word memorization was tested. Figure 11 presents the actual procedure we used.
Figure 10: For comparison, we prepared a word-based interface (left) that simply replaces a word, instead of presenting a translated phrase (right).
As we mentioned in Section 7.1, we first had the participants take a pretest to identify unknown words (Figure 11A). Here, words from the B2 word list were shown to each participant in random order along with five options for their meaning written in Japanese (including one indicating "I do not know this word"). The participant was asked to choose one of the five options. Words were shown until the participant had answered incorrectly or indicated lack of knowledge 60 times in total.

After 60 unknown words were identified, they were randomly assigned to the three conditions so that each condition had 20 words. In addition, the correct answers were shown so that the participant could know their meanings. This was intended to enable a fair comparison among the three conditions; that is, the participant might never encounter some words assigned to the phrase-based and word-based conditions during the experiment (see Section 6.5). Thus, without this process, the participant would have no chance to know the meaning of such words, which would make the comparison of the correct answer rate across the two conditions and the not-used condition improper.
On the experiment day, we first had the participant watch a video describing how to use VocabEncounter. Then, they installed VocabEncounter on their PCs and spent the day as usual while encountering words during Web browsing (Figure 11B). Two days after the experiment day, we asked the participant to take a posttest (Figure 11C), which had the same format as the pretest but only presented the 60 words they had answered incorrectly in the pretest. The participants were given approximately $50 as a reward for their participation. Note that we conducted the experiment remotely for all participants.
Figure 11: Procedure of the experiment. (A) Participants first took the pretest, and the words they did not know were used as $W^{En}$. (B) On the experiment day, they used VocabEncounter while they browsed the Web as usual. (C) Two days after the experiment day, they took the posttest to examine whether they had learned the words. (Depicted correct answer rates: phrase-based 75.00%, word-based 63.89%, not-used 56.67%.)
7.4 Results
Figure 12 shows the correct answer rate across the three experimen-
tal conditions. Our repeated-measures ANOVA analysis indicated a
signicant dierence between the conditions (
F =
7
.
909
,p =
0
.
003),
and the post-hoc test showed that the phrase-based condition re-
sulted in a signicantly better rate than the word-based condition
(
p =
0
.
046) and the not-used condition (
p =
0
.
001). These results
conrm that VocabEncounter helped the participants learn foreign
words without requiring them to take dedicated time for studying.
In particular, its presentation of the generated usages enhanced the
Figure 12: Comparison of the correct answer rate achieved by the participants. The phrase-based condition exhibited a significantly better rate than the word-based condition (p = 0.046) and the not-used condition (p = 0.001).
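For reference, the repeated-measures ANOVA above can be run with statsmodels' AnovaRM on a long-format table of per-participant correct answer rates; the values below are synthetic, seeded from the condition means shown in Figure 11, not the raw study data.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(2)
# Synthetic per-participant rates around the depicted condition means.
means = {"phrase": 0.75, "word": 0.64, "not_used": 0.57}
rows = [
    {"participant": p, "condition": cond, "rate": mu + rng.normal(0, 0.05)}
    for p in range(10)
    for cond, mu in means.items()
]
data = pd.DataFrame(rows)

# Repeated-measures ANOVA with condition as the within-subject factor;
# post-hoc pairwise comparisons would follow a significant omnibus result.
print(AnovaRM(data, depvar="rate", subject="participant",
              within=["condition"]).fit())
```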
8 EXPERIMENT III: LONG-TERM USER STUDY
So far, we have shown that VocabEncounter can generate natural usages of specified words without losing the original context (Section 6) and the