Experience is the Best Teacher: Personalized Vocabulary Building
Within the Context of Instagram Posts and Sentences from GPT-3
KANTA YAMAOKA, Osaka Prefecture University, Japan
KO WATANABE, University of Kaiserslautern & DFKI GmbH, Germany
KOICHI KISE, Osaka Metropolitan University, Japan
ANDREAS DENGEL, University of Kaiserslautern & DFKI GmbH, Germany
SHOYA ISHIMARU, University of Kaiserslautern & DFKI GmbH, Germany
Fig. 1. The proposed method to find new words within the context of Instagram posts and sentences generated from GPT-3: users see an Instagram post and a sentence generated from it by GPT-3, discover new words, and learn them by heart.
Although language learners have different contexts and motivations, sensing personal backgrounds to optimize learning materials remains challenging. Motivated by the widespread adoption of Social Networking Services (SNS) such as Instagram, we came up with the idea of utilizing social posts, in particular images, as learning materials. This paper presents our working prototype of the proposed system, which extracts keywords from these images and leverages GPT-3 to generate sentences for acquiring new vocabulary around the keywords. In a pilot study involving three users, we found that on average 2.2 words per generated sentence were unknown to the user, and that there is room for improvement in the proposed system. These findings can inform a large-scale evaluation to be designed in the future.
CCS Concepts: • Applied computing → E-learning; • Human-centered computing → Social media.
Additional Key Words and Phrases: Context-Aware Language Learning; Personalized Learning; Intelligence Augmentation; GPT-3
ACM Reference Format:
Kanta Yamaoka, Ko Watanabe, Koichi Kise, Andreas Dengel, and Shoya Ishimaru. 2022. Experience is the Best Teacher: Personalized
Vocabulary Building Within the Context of Instagram Posts and Sentences from GPT-3. In Proceedings of the 2022 ACM International
Joint Conference on Pervasive and Ubiquitous Computing (UbiComp/ISWC ’22 Adjunct), September 11–15, 2022, Cambridge, United
Kingdom. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3544793.3560382
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
©2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Manuscript submitted to ACM
1 INTRODUCTION
Our experiences are considered to be stored in the brain as episodic memory. According to research in the fields of psychology and neuroscience, attaching new information to such existing memory is more effective than attaching it to unfamiliar topics [1]. Associating new knowledge with existing memories, emotions, interests, or past experiences is therefore a promising way to learn effectively while having fun.
In the past, we had few clues for determining such personal backgrounds other than asking questions in person, which is hardly feasible considering the effort it requires. Today, however, photo-sharing social media such as Instagram are widespread among the younger generation. People describe their experiences, thoughts, and emotions on such platforms, along with pictures. These posts are informative because people tend to publish them precisely when they are impressed, touched, or excited about something in their personal lives. Such photos can therefore be good sources for sensing or estimating a learner's interests, the targets of his or her emotions, or past experiences, allowing us to personalize both the content to learn and the contexts it is associated with, making new content easier to remember.
This paper proposes a work-in-progress learning system that generates sentences from learners' Instagram posts (see Figure 1). A sentence generated by the system is shown on the right side of the screen; the user's Instagram picture is shown on the left. Our system is not intended to provide image-to-text captioning. Instead, we want to give an example sentence likely to relate to the user's experiences or interests and to surface unknown but valuable vocabulary for their daily situations. In this design, users click words in a sentence whose meanings they do not know, and the system shows each word's lemma and the lemma's translation into the user's native language.
We hypothesized that associating new vocabulary with one's personal experience, visible in one's Instagram posts, is a promising way to optimize learning materials. With this idea, we aim to enhance learning performance by finding and learning new vocabulary within a context, as a sentence and an image, that is personalized to the learner's life. To evaluate our hypothesis, we conducted a pilot study including recall tests and compared the retention ratio with two baselines.
2 BACKGROUND AND RELATED WORK
2.1 Context-Aware Language Learning Systems
As many researchers have reported, the context of a learner has the potential to personalize learning materials. For instance, Jacquet et al. proposed Vocabulometer, which recommends text based on the user's vocabulary estimated from reading activity sensed by eye-tracking devices [4]. Extending their study, Yamaguchi et al. proposed Mobile Vocabulometer, where users choose topics of interest within the application, which then recommends English articles based on those selections to discover new vocabulary within context [6]. The following topics were provided within the application: entertainment, economy, environment, lifestyle, politics, sport, and science. While this is preliminary work on adapting to interests via user topic selection, real interests are more specific. Take the topic sport, for example: some people are into baseball while others prefer football. Our interests and experiences in daily life are diverse; we need to sense such backgrounds, which is another motivation for our study.
Hautasaari et al. proposed an application called VocaBura, which allows language learners to acquire vocabulary during their idle hours [3]. This audio-based application lets users find new vocabulary based on their location history. While they focus on personalization using location history, we focus on personalization using social image data.
Fig. 2. System overview of our proposed system. Batch preprocessing side (content generation): Instagram posts of a human learner are fed as images into a multi-label detection model API; the resulting labels go to a text-to-text model API, whose sentences pass through a translation API and a natural language processing API (syntax information, e.g., lemma and part of speech) into a content pool. Client side (unknown word extraction): a web browser interface hosted on the cloud presents the sentences along with the learner's images, and the learner extracts unknown words, yielding a list of personalized vocabulary and a personalized dictionary.
Fig. 3. Summary of GPT-3 configurations: (i) our prompt to generate a sentence from keywords extracted from an image (left; contents within curly braces were replaced), and (ii) the main parameters for GPT-3 (right): model = text-davinci-002, temperature = 0.5, max_tokens = 100, top_p = 1, frequency_penalty = 0, presence_penalty = 0.
2.2 Text-Generation Using GPT-3
GPT-3 is a text-to-text model in which both input and output are text, allowing tasks and few-shot demonstrations to be specified through text interaction, according to the paper that introduced it [2]. GPT-3 was trained with 175 billion parameters and often provides natural responses. For example, if a text input, often called a prompt, such as "Can I visit Japan from Germany by foot?" is given to GPT-3, the model responds along the lines of: "No, you cannot visit Japan from Germany by foot. You would need to take a plane, bus, or train." While GPT-3 was announced in 2020, its API has been open to everyone only since the end of 2021¹. Due to this novelty, its potential use cases and usability in an educational context still leave room for investigation; our study is therefore exploratory.
3 PROPOSED SYSTEM
As shown in Figure 2, our system comprises multiple steps. Built on cloud computing APIs, it generates labels from exported Instagram images, turns the labels into sentences, analyzes the generated sentences, and prepares translations.
3.1 Detecting Keyword-Objects from Instagram Image
Given images from learners’ exported Instagram data, we feed them into a multi-label detection API – Google Cloud
Vision API, providing up to ten labels per image. Take an example image from Figure 1; the following labels were
detected from the image: Cloud, Skyscraper, Building, Sky, Daytime, Window, etc. We feed these labels into GPT-3 so that
people can discover new words – (a) co-occurrences suggested by GPT-3 and (b) detected labels – related to the context
strongly tied with the learner’s experiences.
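As a minimal sketch, this keyword-detection step could be written against the Google Cloud Vision client library roughly as follows. The function names and the keyword-joining helper are our own illustration under the assumption of valid Cloud credentials, not the authors' code.

```python
def detect_labels(image_path: str, max_labels: int = 10) -> list[str]:
    """Return up to max_labels label descriptions for one Instagram image."""
    # Imported lazily so the pure helper below runs without the SDK installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    # label_detection is the single-feature convenience method of the API.
    response = client.label_detection(image=image, max_results=max_labels)
    return [label.description for label in response.label_annotations]


def labels_to_keywords(labels: list[str]) -> str:
    """Join detected labels into one comma-separated keyword list for GPT-3."""
    return ", ".join(labels)
```

For the Figure 1 example, `labels_to_keywords(["Cloud", "Skyscraper", "Building"])` yields the string "Cloud, Skyscraper, Building" that can be inserted into the prompt.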
¹ https://openai.com/blog/api-no-waitlist/
Fig. 4. Overview of the evaluation procedure of our system: participants find and learn unknown words under the proposed method (C1) and two baselines (C2 and C3), perform a distraction task (addition task, 1 min.), and then a recall task (given a translation, spell an unknown word).
3.2 Generating Sentences by Keywords and GPT-3
We integrated GPT-3’s completion API, which provides sentences to our system with our prompt and keyword labels.
Currently, GPT-3 [
2
] is publicly accessible as a beta version. By nature of the text-to-text model, instructions to the
model are given by sentences. Our prompt is described in Figure 3. Lines 1-3 clarify what the model is expected to do.
Our system inserts keyword labels to the line starting from Keywords. GPT-3 appends a sentence after One-Sentence.
Some work, e.g., [
5
], tries to nd a better prompt. However, we keep our prompt simple as it is not our scope of work.
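This generation step can be sketched with the 2022-era OpenAI completion endpoint and the parameters listed in Figure 3. The instruction wording in build_prompt below is an illustrative paraphrase of the prompt structure (instruction lines, a Keywords line, a One-Sentence cue), not the paper's exact prompt.

```python
def build_prompt(keywords: list[str]) -> str:
    """Paraphrased prompt: instruction lines, a Keywords line filled with
    the detected labels, and a trailing One-Sentence cue for GPT-3."""
    return (
        "You are asked to write one natural English sentence\n"
        "that uses the keywords below in a daily-life context.\n"
        f"Keywords: {', '.join(keywords)}\n"
        "One-Sentence:"
    )


def generate_sentence(keywords: list[str]) -> str:
    """Call the GPT-3 completion API with the parameters from Fig. 3."""
    import openai  # lazy import; requires an OPENAI_API_KEY in the environment

    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=build_prompt(keywords),
        temperature=0.5,
        max_tokens=100,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )
    return response["choices"][0]["text"].strip()
```

The moderate temperature of 0.5 trades off between deterministic, repetitive sentences and overly creative ones that drift away from the keywords.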
3.3 Providing Translations for Each Word
After all the sentences are generated, we analyze their syntax using the Google Cloud Natural Language API. Through this process, the API offers morphological analysis, providing two pieces of information helpful for our system: part of speech and lemma. We need lemmas to prevent the system from repeatedly extracting essentially the same word, such as car and cars. Users can click each word of a sentence within our system to learn its standard form and the corresponding translation of that standard form.
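A sketch of this analysis step, assuming the google-cloud-language client library; the lemma-based deduplication is factored into a pure helper. Function names are our own illustration.

```python
def analyze_tokens(sentence: str) -> list[tuple[str, str]]:
    """Return (surface word, lemma) pairs via the Cloud Natural Language API."""
    # Imported lazily so the pure helper below runs without the SDK installed.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=sentence, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_syntax(request={"document": document})
    return [(token.text.content, token.lemma) for token in response.tokens]


def dedupe_by_lemma(tokens: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep one entry per lemma, so 'car' and 'cars' are extracted only once."""
    seen = set()
    unique = []
    for word, lemma in tokens:
        key = lemma.lower()
        if key not in seen:
            seen.add(key)
            unique.append((word, lemma))
    return unique
```

Deduplicating on the lemma rather than the surface form is what keeps inflected variants from reappearing in the learner's vocabulary list.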
4 EVALUATION
We conducted a pilot study, shown in Figure 4, to evaluate our hypothesis: "Displaying a picture taken by the user and related to a sentence is useful in exploring unknown words for language learning." Since this hypothesis can be decomposed into two factors (1: the learner's own image vs. someone else's image; 2: displaying vs. not displaying an image), the following three conditions were mixed in the system, and the recall ratios of unknown words under each condition were compared.
•C1: Displaying an image provided by the user and a sentence generated from the image (Proposed)
•C2: Displaying an image provided by someone else and a sentence generated from the image (Baseline 1)
•C3: Displaying only a sentence generated from an image provided by someone else (Baseline 2)
4.1 Experimental Design
We recruited three Japanese undergraduate students who had posted at least 30 images on Instagram. This requirement was imposed to collect a sufficient number of unknown words. As a pre-task of the experiment, participants submitted photos, excluding sensitive data, screenshots, images with potential copyright issues, and images whose main content is text. The pictures by others for C2 and C3 were provided by the authors. Participants who completed the tasks received 1,000 JPY as a reward.
This study has been approved by the ethics committee of Osaka Metropolitan University.
Fig. 5. Results of the pilot study. Left: random samples of generated sentences, with extracted words highlighted in red. Right: recall task scores (maximum 10) of all participants under C1, C2, and C3.
Examples for P1 (selected out of 8): (1) "I am a masterful journalist and can depict daily life with my advanced vocabulary, whether it's describing someone's personal style or a new product on the market." (2) "The art of cheesemaking is a complex and delicate process, requiring the utmost care and precision at every stage." (3) "The food of Germany is as varied as its people, with each region boasting its own specialties." (4) "The plant is responsible for the production of automotive lighting and rolling stock for the locomotive and freight car industry." (5) "The sky is dark and the buildings are illuminated by the car headlights and the tire lights."
Examples for P2 (selected out of 29): (1) "The wooden structure is an impressive landmark with its intricate architecture and symmetrical lines." (2) "The brown wood beam is a material property that has a ceiling of symmetry and tints and shades of pattern." (3) "The wood cabinetry was a beautiful fawn color with hardwood floors that matched the chair perfectly." (4) "The table was a perfect addition to the room and really brought the whole space together." [no unknown words] (5) "The sky is a beautiful blue and there are some fluffy white clouds."
Examples for P3 (selected out of 16): (1) "The magenta-colored dish was sweet and delicious, and it was made with love." (2) "The dish was garnished with a leaf vegetable and served with a seafood recipe." (3) "The jaw-dropping recipe for the dish was simple, yet the cuisine was exquisite." (4) "The clouds in the sky are like a painting, with their different shapes and colors." (5) "The temple was crowded with people enjoying the public space." [no unknown words]
In the learning phase, participants clicked unknown words to check their meanings. After selecting all unknown words in one sentence, participants clicked a button to display the next sentence. This process continued until the number of unknown words reached 10 in each of the three conditions (30 words in total) or all images prepared for C1 had been shown. As shown in the middle of Figure 4, the system then displayed a distraction task: a simple addition of three one-digit numbers, carried out for one minute. The subsequent recall task prompted the user to spell out the English words in the user's unknown-word list in shuffled order, yielding a score for each condition.
4.2 Results
Figure 5 (left) shows examples of the sentences shown to participants during the experiment. To collect 30 unknown words (10 for each condition), 8, 29, and 16 sentences had to be displayed for the three participants, respectively. Put another way, 2.2 words per generated sentence were unknown on average ((30/8 + 30/29 + 30/16)/3 ≈ 2.2). We found that the ratio of words extracted per sentence varied among the users, depending on their extracted words and their language skills.
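The average of 2.2 unknown words per sentence can be reproduced in a few lines, using the per-participant sentence counts reported above:

```python
# Sentences each participant needed to collect 30 unknown words
# (10 per condition, three conditions).
sentences_shown = {"P1": 8, "P2": 29, "P3": 16}
words_collected = 30

# Per-participant extraction rate, then the average reported in the paper.
rates = [words_collected / n for n in sentences_shown.values()]
average = sum(rates) / len(rates)
print(round(average, 1))  # 2.2
```

Note that the per-participant rates themselves range from about 1.0 (P2) to about 3.8 (P1), which is the variation across language skills discussed above.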
We compared submitted and expected answers after converting both to lowercase. The recall-test scores for each condition and participant are shown in Figure 5 (right). For P1, C1's score exceeds those of the other conditions. For P2, the scores of C1 and C3 are equal. For P3, C3 exceeds the other conditions.
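A minimal sketch of that case-insensitive scoring; the function name and signature are our own illustration of the comparison described above.

```python
def score_recall(submitted: list[str], expected: list[str]) -> int:
    """One point per word whose spelling matches after lowercasing,
    mirroring the case-insensitive comparison used in the pilot study."""
    return sum(
        s.strip().lower() == e.strip().lower()
        for s, e in zip(submitted, expected)
    )
```

For example, `score_recall(["Skyscraper", "cloud"], ["skyscraper", "clouds"])` returns 1: capitalization is forgiven, but a spelling that differs by even one letter is not.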
4.3 Discussion
Although we had a limited number of participants, we observed findings and identified the following challenges that can be addressed before conducting a large-scale experiment.
Word difficulty and language skill – The number of words extracted per sentence varied between users. One possible explanation is that we focused on evaluating the proof of concept, so the system does not consider learners' English levels. In future research, we need to make our system aware of learners' English levels [6].
Optimization of GPT-3 parameters – In the first sentence for P1 in Figure 5, the generated sentence closely resembles our prompt from Figure 3. This type of sentence, i.e., one overfitting to line 1 of our prompt, occurred only once among all the sentences, including those we could not show in Figure 5. Considering that the numbers of generated sentences shown to participants until they finished word extraction were 8, 29, and 16 (for P1-P3, respectively), this unwanted behavior is considered negligible. In the future, it would be better to prepare automatic guidelines for selecting proper sentences from a pedagogical point of view, since GPT-3 is a novel technology.
The number of image resources – In formal English education, the levels of words are often aligned among learners before they take exams; therefore, their performance can be evaluated simply by comparing scores within the same level, e.g., 3rd grade in high school. In our system, however, word extraction is a dynamic process. Therefore, comparing answer correctness within similar vocabulary difficulties is crucial for evaluating our software. To do so, increasing the number of trials per individual is essential. This, however, brings another challenge: Instagram posts are finite. A future direction is to use the learner's smartphone camera roll instead of Instagram posts. We used Instagram posts mainly because they reflect the learner's experiences, emotions, or interests. In this sense, a trade-off between sample size and the strength of impressions toward the images is inevitable.
5 CONCLUSION
We proposed a work-in-progress learning system that generates sentences from learners' Instagram posts. We hypothesized that associating new vocabulary with one's personal experience is a promising way to optimize learning materials, and we conducted a pilot study with three Japanese students. Our experiment clarified that the number of sentences needed to extract a certain amount of unknown vocabulary varied across participants. To address this, we need to make our system aware of learners' English levels in the future. Also, our system lets learners extract words freely, unlike formal education, where word levels are aligned; our future investigations should therefore compare scores within word groups of similar difficulty. To do so, we need more posts from learners. Since Instagram posts are finite, this is not an easy task, so using learners' smartphone camera rolls instead of their Instagram posts is one possible future research direction. While the pilot study did not provide a definitive answer to our hypothesis, we will extend our research based on the lessons it supplied.
ACKNOWLEDGMENTS
This work was supported by the DFG International Call on Artificial Intelligence Learning Cyclotron, JSPS Fostering Joint International Research (B) (Grant Number: 20KK0235), and the JASSO Student Exchange Support Program.
REFERENCES
[1] Garvin Brod, Markus Werkle-Bergner, and Yee Lee Shing. 2013. The influence of prior knowledge on memory: a developmental cognitive neuroscience perspective. Frontiers in Behavioral Neuroscience 7 (2013), 139.
[2] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models Are Few-Shot Learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems (Vancouver, BC, Canada) (NIPS'20). Curran Associates Inc., Red Hook, NY, USA, Article 159, 25 pages.
[3] Ari Hautasaari, Takeo Hamada, Kuntaro Ishiyama, and Shogo Fukushima. 2019. VocaBura: A Method for Supporting Second Language Vocabulary Learning While Walking. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3, 4, Article 135 (Dec 2019), 23 pages.
[4] Clément Jacquet, Olivier Augereau, Nicholas Journet, and Koichi Kise. 2018. Vocabulometer, a Web Platform for Ubiquitous Language Learning. In Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers (Singapore, Singapore) (UbiComp '18). Association for Computing Machinery, New York, NY, USA, 361-364.
[5] Laria Reynolds and Kyle McDonell. 2021. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI EA '21). Association for Computing Machinery, New York, NY, USA, Article 314, 7 pages.
[6] Kohei Yamaguchi, Motoi Iwata, Andrew Vargo, and Koichi Kise. 2020. Mobile Vocabulometer: A Context-Based Learning Mobile Application to Enhance English Vocabulary Acquisition. In Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers (Virtual Event, Mexico) (UbiComp '20). Association for Computing Machinery, New York, NY, USA, 156-159.