Introducing Reinforcement Learning to K-12
Students with Robots and Augmented Reality
Ziyi Zhang1, Kevin Lavigne2, William Church3,
Jivko Sinapov1, and Chris Rogers1
1Tufts University, Medford MA 02155, USA,
2Hanover High School, Hanover NH 03755 USA,
3White Mountain Science, Inc., Littleton NH 03561, USA
{ziyi.zhang,jivko.sinapov,chris.rogers}@tufts.edu
kevin.lavigne@dresden.us
wchurch@whitemountainscience.org
Abstract. As artificial intelligence (AI) plays a more prominent role in
our everyday lives, it becomes increasingly important to introduce basic
AI concepts to K-12 students. To this end, we combined physical
robots with augmented reality (AR) software to help students learn
some of the fundamental concepts of reinforcement learning (RL). We
chose RL because it is conceptually easy to understand but has received
the least attention in previous research on teaching AI to K-12 students.
We designed a series of activities in which students can design their
own robots and train them with RL to finish a variety of tasks. We
tested our platform with a pilot study conducted with 14 high school
students in a rural city. Students’ engagement and learning were assessed
through a qualitative analysis of students’ behavior and discussions. The
results showed that students were able to understand both high-level AI
concepts and specific RL terms through our activities. Also, our approach
of combining virtual platforms and physical robots engaged students and
inspired their curiosity to explore RL further on their own.
Keywords: Reinforcement Learning · Augmented Reality · Educational Robot · K-12 Education
1 Introduction
Artificial Intelligence (AI) is progressively transforming the way we live and
work. Therefore, it is increasingly important to introduce AI concepts to K-
12 students to build familiarity with AI technologies that they will interact
with. Reinforcement Learning (RL), a sub-field of AI, has been demonstrated to
positively contribute to many fields, including autonomous driving [1], control
theory [2], and chemistry [3]. Recently, ChatGPT, a chatbot fine-tuned with
RL and supervised learning, has sparked extensive public discussion [4]. For
students, the basic concepts of RL are intuitive and appealing to learn, since RL
resembles our intuitive notion of how learning happens in nature [5]. However, most current platforms
and empirical research on introducing AI to K-12 students focus on high-level
AI concepts and are mostly based on training supervised learning models [6, 7].
Research on introducing RL to K-12 students remains limited [8, 9]. Moreover,
the activities in these RL teaching projects were developed entirely in simulation;
there is no prior work on using physical tools such as educational robots to
introduce RL concepts to K-12 students in the real world.
To address this need, we designed a robot-based RL activity using the LEGO
SPIKE Prime robot kit. To enrich the activity and give students an intuitive
way to visualize the RL training process and interact with their robots,
we developed an Augmented Reality (AR) interface to bridge the virtual and
physical worlds. Our activity follows constructivist principles: students construct
their own understanding of RL by building their own walking robots and training
them to go straight. They can also explore human-
in-the-loop training through our software and use the RL algorithm to train
the robot to complete additional tasks. Our activity covers the following aspects
of RL: 1) Key concepts in RL, including state, action, reward, and policy; 2)
Exploration and exploitation and how the agent chooses whether to explore or
exploit; 3) Q-table and agent’s decision making based on it; 4) Episodes and ter-
mination rules; and 5) Impacts of human input. By combining a virtual interface
with a physical robot, we aim to provide students with an interactive and engaging
learning journey and allow them to develop their own educational experience.
Currently, the target group of our research is middle and high school students.
We evaluated our platform in a study with 14 high school students using a
three-day curriculum, including a short session of playing with an online RL
learning platform we developed in 2020 [10]. Our results showed that students
were engaged and excited during the three classes, and they constructed a com-
prehensive understanding of both general and specific RL concepts.
2 Background and Related Work
2.1 Related Work
K-12 AI education has received increasing attention from institutions and re-
searchers, resulting in a number of approaches and methodologies [11, 12]. The
Association for the Advancement of Artificial Intelligence (AAAI) and the Com-
puter Science Teachers Association (CSTA) have collaborated with the National
Science Foundation (NSF) and Carnegie Mellon University to formulate guidelines
for introducing the big ideas of AI, like machine learning, to K-12 students [13]. Re-
cent AI education platforms include Google’s Teachable Machine [14], Machine
Learning for Kids, and MIT’s Cognimate [15]. These web-based AI education
platforms demonstrate AI-related concepts by implementing web interfaces for
students to train and test AI agents to finish different tasks.
LEGO educational robots and similar kits have been studied in many K-12
education contexts, including physics [16], mathematics [17], and engineering [18].
These studies have shown that robotics kits can improve the students’ engage-
ment and facilitate understanding of STEM concepts. LEGO robots have also
been used in studies on teaching AI and robotics to students of different age
groups [19–22]. In particular, a study in Spain used LEGO robots to teach RL to
college-level students [23]. These researchers reported that LEGO robots could
make the learning experience more interactive, attractive, and friendly to students
without an AI or robotics background, which inspired our robot-based RL activity
design.
At the same time, Augmented Reality (AR) has gradually become a popular
tool in K-12 STEM education [24]. Researchers have applied AR techniques to
teaching robotics [22], physics [25], the arts [26], etc. Results from these studies
showed that AR could not only increase students' engagement, concentration,
and learning outcomes, but also reduce the difficulty of learning. However, AR
is currently rarely used in K-12 AI education. Some researchers developed a
virtual reality platform that introduces RL to K-12 students in a digital world [9].
AR has also contributed to human-robot interaction (HRI) research, including
robotic teleoperation [27] and robot debugging [28]. Related work has proposed
an AR visual tool named “SENSAR” for human-robot collaboration [29], which
was also deployed on a LEGO EV3 robot for educational purposes. In our research,
we want to leverage the strengths of AR as a visual aid, combining it with robots
to provide students with an intuitive way to collaborate with their robots.
2.2 Reinforcement Learning Background
Reinforcement learning is a class of problems in which an agent must learn how
to act based on scalar reward signals received over the course of its interaction
with the environment. The agent's world is represented as a Markov Decision
Process (MDP), a 5-tuple $\langle S, A, T, R, \gamma \rangle$, where $S$ is a
discrete set of states, $A$ is a set of actions, $T : S \times A \to \Pi(S)$ is a
transition function that gives the probability of moving to a new state given the
current state and action, $R : S \times A \to \mathbb{R}$ gives the reward of
taking an action in a given state, and $\gamma \in [0, 1)$ is the discount factor.
We consider episodic tasks in which the agent starts in an initial state $s_0$ and,
upon reaching a terminal state $s_{\mathrm{term}}$, a new episode begins.

At each step, the agent observes its current state and chooses an action
according to its policy $\pi : S \to A$. The goal of an RL agent is to learn an
optimal policy $\pi^*$ that maximizes the long-term expected sum of discounted
rewards. One way to learn the optimal policy is to learn the optimal action-value
function $Q^*(s, a)$, which gives the expected sum of discounted rewards for
taking action $a$ in state $s$ and following the optimal policy thereafter:

$$Q^*(s, a) = R(s, a) + \gamma \sum_{s'} T(s' \mid s, a) \, \max_{a'} Q^*(s', a')$$
A commonly used algorithm for learning the optimal action-value function is
Q-learning. In this algorithm, the Q-function is initialized arbitrarily (e.g., all
zeros). Upon performing action $a$ in state $s$, observing reward $R$, and ending
up in state $s'$, the Q-function is updated using the following rule:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$$
where $\alpha$, the learning rate, is typically a small value (e.g., 0.05). The agent
decides which action to select using an $\epsilon$-greedy policy: with small
probability $\epsilon$, the agent chooses a random action (i.e., it explores);
otherwise, it chooses the action with the highest Q-value in its current state
(i.e., it acts greedily with respect to its current action-value function).
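For concreteness, the following is a minimal sketch of tabular Q-learning with an ϵ-greedy policy in Python; the environment interface (reset/step/actions) is an illustrative assumption and not the robot implementation described in Section 3.

```python
import random
from collections import defaultdict

def q_learning(env, num_episodes, alpha=0.05, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy policy (illustrative sketch)."""
    Q = defaultdict(float)  # Q[(state, action)], initialized to 0

    def choose_action(state):
        # With probability epsilon explore; otherwise exploit current Q-values.
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(num_episodes):
        state = env.reset()              # start a new episode
        done = False
        while not done:
            action = choose_action(state)
            next_state, reward, done = env.step(action)
            # Q-learning update rule from the equation above
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```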
To speed up the RL process, researchers have proposed human-in-the-loop
RL methods. For example, in the learning-from-demonstration (LfD) framework,
human teachers take over the action selection step, often providing several tra-
jectories of complete solutions before the agent starts learning autonomously. In
a related paradigm, the agent can seek “advice” from its human partner, e.g.,
letting the human provide a reward for the action it chooses in a particular state
when the Q-values are thought to be unreliable due to lack of experience. One of the goals
of our system and proposed activity is to demonstrate to students how human
partners can help a robot learn through interacting with it.
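As a rough illustration of the advice-seeking paradigm (and not the interface of our system), the reward step of the loop above could defer to a human partner when a state-action pair has been visited only a few times; the visit-count threshold and the console prompt below are assumptions for illustration.

```python
def human_reward_or_default(state, action, default_reward, visit_counts, min_visits=3):
    """Ask the human partner for a reward when a state-action pair is still unfamiliar."""
    if visit_counts.get((state, action), 0) < min_visits:
        raw = input("Reward for action {} in state {} (blank keeps {}): ".format(
            action, state, default_reward))
        if raw.strip():
            return float(raw)
    return default_reward
```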
3 System Overview
We aim to introduce RL by combining a physical robot with a virtual interface. To
achieve this, we designed a robot-based activity to engage students and inspire
their curiosity to learn more about RL. To enrich the activity, we developed an
AR mobile application that can communicate with the robot. Students can use
the application to control the training of their robots and visualize the learning
process through the interface.
3.1 Robot-based Activity Design
To introduce educational robots to K-12 RL education, we looked for a robot-based
RL challenge that is: 1) Intuitive for students to understand; 2) Able to achieve a
good training result in less than 10 minutes; 3) Easy to attempt with straightforward
non-AI solutions that usually perform poorly, while being solvable quickly and
intuitively with RL; and 4) Open to students customizing their robot designs and
exploring solutions for their own robots. To meet these goals, we used the LEGO
SPIKE Prime robot kit and designed a robot activity called “Smart Walk”. In this
activity, students are asked to build a walking robot with LEGO bricks, without
using wheels. An example build is shown in Figure 1. They then use the LEGO
SPIKE software to try to program the robot to walk in a straight line with
block-based coding or MicroPython. Due to the uncertainty of the walking robot's
movements, it is hard for students to explicitly program the robot to go straight.
We then let them train the robot with a pre-installed RL algorithm. The training
process is straightforward: students press the left button on the robot to train it
for an episode and press the right button to test how well the robot has been
trained. The light matrix on the robot shows numbers to indicate how
many episodes the robot has been trained, and when students press the right
button, it will show that the robot is in test mode. After playing with the RL
Fig. 1: An example build of a “Smart Walk” robot
algorithm, we ask students to make some modifications to the structure of
their robots (e.g., make one leg shorter than the other), then try the RL algorithm
and their own code again to compare the results. In this part, we want to show
the adaptability of the RL algorithm to changes in the environment. At the end
of the activity, we gather the students together and let them share their findings,
questions, and thoughts about this RL challenge. Through the whole activity,
we want students to build up a general understanding of the process and some
strengths of RL, and to inspire their curiosity to learn more about how RL
works.
We used an ϵ-greedy Q-learning algorithm to train the walking robot to go
straight. Based on the gyro sensor data, we defined five states (too far left, a
little to the left, centered, a little to the right, too far right) to describe the
robot's orientation. Corresponding to the states, we defined five actions that
control the speeds of the robot's two motors. At each training step, the robot
chooses an action using an ϵ-greedy policy, then runs for 0.5 seconds at the new
speeds. After coming to a stop, the robot reads its gyro sensor to determine its
new state. The algorithm then assigns a reward of -10, -2, or +10 based on this
data, and the robot uses the received reward to update the corresponding Q-value,
incorporating this experience into future decisions.
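For illustration, a single training step on the robot might look like the sketch below, written against the standard LEGO SPIKE Prime Python API; the motor ports, yaw-angle thresholds, action speeds, and learning parameters are assumptions and not the exact code deployed on the students' hubs.

```python
import random
from spike import PrimeHub, MotorPair
from spike.control import wait_for_seconds

# Illustrative sketch of one "Smart Walk" training step.
# Port letters, yaw thresholds, and the action-to-speed mapping are assumptions.
hub = PrimeHub()
legs = MotorPair('A', 'B')

ACTIONS = [(-20, 40), (0, 40), (20, 20), (40, 0), (40, -20)]  # (left, right) speeds
Q = [[0.0] * len(ACTIONS) for _ in range(5)]                  # 5 states x 5 actions

def get_state():
    yaw = hub.motion_sensor.get_yaw_angle()
    if yaw < -20: return 0       # too far left
    if yaw < -5:  return 1       # a little to the left
    if yaw <= 5:  return 2       # centered
    if yaw <= 20: return 3       # a little to the right
    return 4                     # too far right

def training_step(state, alpha=0.1, gamma=0.9, epsilon=0.2):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        action = random.randrange(len(ACTIONS))
    else:
        action = Q[state].index(max(Q[state]))
    left, right = ACTIONS[action]
    legs.start_tank(left, right)   # run at the chosen speeds for half a second
    wait_for_seconds(0.5)
    legs.stop()
    new_state = get_state()
    # reward scheme from the activity: +10 centered, -2 slightly off, -10 far off
    reward = 10 if new_state == 2 else (-2 if new_state in (1, 3) else -10)
    Q[state][action] += alpha * (reward + gamma * max(Q[new_state]) - Q[state][action])
    return new_state
```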
3.2 AR Application Overview
Educational robots have limitations; for example, they usually lack ways to help
students visualize the abstract information and concepts that are important for
understanding AI. Therefore, to enrich the robot-based activity
and demystify more specific RL concepts, we developed an AR application to
bridge the virtual and physical worlds using Unity (https://unity.com/) and
Vuforia (https://developer.vuforia.com/). The application can be easily deployed
on Android or iOS devices. It communicates with LEGO robots through Bluetooth
Low Energy (BLE).

Fig. 2: Interface of the AR software: (a) Main Scene, (b) “Human Training” page, (c) “Training Result” page, (d) “Challenges” page

To give students an immersive
experience, the user interface (UI) is designed with a science fiction theme.
The interface contains two main parts: the static UI and the AR part.
As shown in Figure 2, the static UI contains four pages that students can navigate
using a menu located at the bottom-left of the screen. The main scene provides
students with a straightforward way to train or test their robots. The “Mode
Switch” toggle allows students to switch between the regular mode and a
“breakdown” mode, in which each training step is divided into smaller, clearer
stages so that students can follow the robot's decision-making and learning process
more visually. On the main scene, we also have a BLE connection indicator and
a button for students to restart the training. In addition to letting the robot
train itself, students can also choose to manually train the robot using the UI
components on the “Human Training” page. In this part, students can reward
the robot with different values after it makes an action. They can also tweak
the ϵ value to change the robot's decision-making strategies during training.
With these functionalities, students have the opportunity to customize training
strategies for their own robots or attempt to teach the robot to finish more
tasks. On the “Training Result” page, we provide an intuitive visual aid to
illustrate the current Q-value of each state-action pair, so that students can
comprehend the foundation of the robot’s decision-making process. Some extra
information related to the training, like total training steps, is also included on
this page. To prevent students from feeling unsure of where to begin when they
start using the software, we added three tasks for students to undertake on the
“Challenges” page, which automatically pops up when the software is launched.
These tasks motivate students to explore all the functionalities of the software. In
the AR part of the UI, we focus on showing key RL concepts in real time, including
state, action, reward, and training episodes. We also added a 3D robot model with
a dialogue box above it; the virtual robot can “chat” with students through the
dialogue box to update task progress or suggest that students try other functions
like manual training. With AR enabled, the background of the application is the
camera view, which makes it easier for students to simultaneously keep track of
the robot and the RL information. The AR components are superimposed around
an image target that is handed out to students before the activity.
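The paper does not prescribe a BLE message format; purely as an illustrative sketch, the robot could pack the values the AR interface displays (state, action, episode, reward, and the updated Q-value) into a compact binary payload for each notification, e.g.:

```python
import struct

# Hypothetical payload layout: state and action as unsigned bytes, episode as
# an unsigned short, reward as a signed short, and the updated Q-value as a float.
PAYLOAD_FORMAT = "<BBHhf"

def encode_step(state, action, episode, reward, q_value):
    """Pack one training step into bytes for a BLE notification (illustrative)."""
    return struct.pack(PAYLOAD_FORMAT, state, action, episode, reward, q_value)

def decode_step(payload):
    """Inverse operation, e.g., for desktop-side testing of the encoding."""
    return struct.unpack(PAYLOAD_FORMAT, payload)

# Example: state 2 (centered), action 3, episode 5, reward +10, Q-value 4.2
packet = encode_step(2, 3, 5, 10, 4.2)
```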
4 Pilot Study and Results
This pilot study was conducted in a rural public high school located in New
Hampshire. The class consisted of fourteen students who were seniors enrolled
in a year-long engineering design course that focused on developing innovative
problem-solving skills. All of the students had one year of chemistry and physics
courses previously. Four of the students had taken a semester course in either
computer science or a DIY course focused on designing devices with Arduino
boards and sensors. All of the students had a limited introduction to the LEGO
Education SPIKE Prime robot and its programming software. This pilot study
was conducted over three days in a single week in November 2022. The time spent
on this project included ninety minutes each on Tuesday and Friday. Students also
had a fifty-minute class on Wednesday to play with the web-based RL learning
platform we developed in 2020.
4.1 Lesson Structure
The first session was designed around the robot-based activity we described in
the previous section. The goal of this session was to introduce students to the
AI and RL world, and to help them build a general understanding of RL by
comparing it to other regular approaches that students were more familiar with.
Specific RL concepts were not highlighted in this session to avoid overwhelming
the students and to inspire their curiosity to learn more about RL. The students
worked in groups of two. To start with, students were asked to build the same
walking robot and then spent 20 minutes trying to program the robot to go
straight between two pieces of tape that were parallel to each other, using LEGO
SPIKE software. Next, the students spent 15 minutes trying to get the robot to
perform the same task using the RL algorithm. After that, we gave students another
20 minutes to modify the structure of their robots to make them asymmetrical
and adapt their own code to the new design. Then they had another 15 minutes
to apply the RL algorithm to train the robot and compare the result to their own
Fig. 3: GUI layout of the online platform: (a) the 1-D treasure hunting challenge; (b) the 2-D treasure hunting challenge
code. At the end of the session, we encouraged all the groups to share their robot
modifications, perform a test run to show their training results, and discuss their
first impressions and questions about the RL training process.
The second class was only 50 minutes and focused on the web platform we
developed previously. The web platform contained two treasure-hunting-themed RL
challenges, as shown in Figure 3. In each challenge, students needed to train a
virtual robot to solve a maze by finding treasure and avoiding traps. The web
platform has the following features: 1) An interface for students to visualize
how key RL variables (e.g., state, action, reward, Q-value) change during the
training process; 2) Opportunities for students to participate in the training
process by providing rewards to the robot; 3) An interface for students to tweak
the ϵ value to learn the concepts of exploration and exploitation in the RL
context. At the beginning of the second class, we gave the students a 5-minute
presentation that included the definitions of AI, machine learning, and RL, as well
as their relationship. After that, we asked students to explore individually on the
web platform. They were encouraged to discuss with each other while solving
the RL challenges. They started with the 1D maze-solving challenge (shown in
Figure 3(a)). The training was automatically accomplished by the embedded
RL algorithm. Our goal in this part was to let students get familiar with the
interface and establish an understanding of key RL concepts by observing how
they changed during the training process. Next, we let them explore a more
complicated 2D maze environment (shown in Figure 3(b)). In this challenge,
students could choose either to train the robot automatically or to teach it
manually by providing a numerical reward after each move the robot made. In
this part, we hoped students would develop a deeper understanding of the key RL
concepts and learn how humans can be part of the learning process and help the
agent identify desirable actions more efficiently. At the end of the
session, we let students share their takeaways.
In the third class, we introduced the AR application to students and asked
them to train their walking robots to reach a treasure chest in front of the robot.
The goal of this class was to enhance students’ learning of RL by letting them
apply their knowledge to solve a real-world problem. By giving students few
restrictions and little direct instruction, we hoped they would proactively
explore the RL concepts that they were interested in or confused about, then
construct their own understanding of these concepts and share it with other
students. The students were divided into the same groups as in session one. After
spending 10 minutes helping every group set up, we gave students 30 minutes
to play with the application and figure out how to finish the task. After that,
we encouraged students to redesign their robots and try to adjust their training
strategies for the new robots. After another 30 minutes, we asked students to
gather around to share their work and discuss their takeaways and questions.
Since students were curious to know more about the AR technique, at the end
of the session, we had a 5-minute demo to show the students how to create an
AR experience using Unity and Vuforia.
4.2 Class Observation and Analysis
In the first session, we had a very structured lesson with all the students pro-
gressing at approximately the same pace. When trying to improve their robot’s
straight-line movement between the two tape markers, students elected to change
the code by tweaking the speeds of the two motors. Although they all struggled with
the task as we predicted, various approaches were tested by the students in-
cluding moving the legs asynchronously, slowing down the overall speed, etc.
When first trying our RL algorithm, some students were confused when the
robot performed worse than in the previous episode; this was mainly caused by
the ϵ-greedy policy, which sometimes made the robot choose a random action
instead of the current optimal one. But after 7 to 15 minutes of training, the RL
algorithm succeeded on all the robots. After changing
the structure of their robots, most groups failed to modify their code to control
the new robot going straight. However, after several training episodes, the RL
algorithm successfully adapted to the new designs and achieved good training
results. Students also spontaneously started over the training and compared the
result with previous training sessions. When the final test run happened, the
number of training episodes for all groups of robots ranged from 4 to 18. In the
debriefing discussion, several students noticed that more training episodes
correlated with better performance on the task, and that the robots also had
greater resistance to external disturbances (e.g., they could turn back and go
straight again if someone pushed them to one side). A side observation was that
many students showed high curiosity about the inner workings of the RL technique
and the programming. Student S (all student names are pseudonyms) asked how the robot learned,
and was further interested to know what type of reward was given and how the
robot interpreted the reward. Student N and student R were curious to know
how the robot decided what to do next. Overall, at the end of this session, stu-
dents were able to build a general understanding that RL is a loop of making
a choice, receiving feedback, and then adjusting decision-making accordingly. They also
showed a high level of engagement and curiosity during the whole session.
In the second session, students mostly learned through their own exploration
and discussion with others, while the instructors observed and answered their
questions. The majority of the class was able to finish the 1D
maze training in 5-7 minutes. From our observation and their discussion, most
students could understand how the key RL variables changed during the training
process. As they moved to the exploration of the 2D maze, students started to
have more interactions with each other, and they used many RL concepts to
communicate. These included “positive reinforcement”, “exploration rate”,
“reward”, etc. As the class progressed, students started to apply various strategies to
dig deeper into the RL challenge and concepts. Some students focused on train-
ing the robot automatically and observing how the robot performed differently
in each episode, while other students spent most of the time playing with
the manual training part. Students discussed different training strategies with
each other. Student L tried to replicate a specific case on student M’s computer
and they had a discussion about how it happened. Another student explained
to his neighbor how to use the Q-table visualization to evaluate the current
training result. Students also came up with some insightful questions. For example,
student S asked us about a case where the map showed the robot was not visiting
all places “evenly”. He then used this observation to ask more specifically about
exploitation vs. exploration. Some students implemented interesting experiments
and built a deeper understanding of some RL concepts beyond our expectations.
A curious student J made some changes to the back end of the platform to test
a higher training speed and ϵ value. Another student proposed that we could
set a high ϵ value at the beginning and decrease it as the training progresses to
accelerate the training process, unknowingly explaining how a decayed ϵ-greedy
policy works. The variety of all these explorations and dialogues showed that
students were engaged, observant, challenged, and curious to figure out how RL
works.
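The decayed ϵ-greedy schedule the student described can be sketched as follows; the starting value, decay rate, and floor are illustrative choices rather than parameters of our platform.

```python
def decayed_epsilon(episode, start=0.9, floor=0.05, decay=0.95):
    # Start with mostly exploration, then shift toward exploitation
    # as training progresses; never drop below a small floor.
    return max(floor, start * (decay ** episode))

# Example: epsilon shrinks from 0.9 toward 0.05 over the episodes
for ep in range(0, 60, 10):
    print(ep, round(decayed_epsilon(ep), 3))
```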
The third session was best characterized as the most creative. Within the first
half, most students were able to train their robots to go straight to the treasure
chest using the software. The AR interface clearly excited the students. A few
students moved their iPads around and tried to see how the AR components
were superimposed in the real world, and we heard a few students comment on
how cool it was to their teammates. After the original task was finished, students
proactively pursued different directions to test all their ideas and to push the
limit of the algorithm. They generated different robot designs and implemented
various experiments on them. Student J’s group focused on testing how well the
RL algorithm could adapt to different robot designs. Two teams successfully
trained their robot manually to walk in a circle instead of going straight before test-
ing the same training policy on multiple robot designs. Student E confirmed her
assumption that giving a high ϵ value at the beginning and gradually decreas-
ing it could help the robot learn faster. During this process, we noticed that
the robot served as a perfect medium for students to interact with each other.
When a robot performed a fascinating movement or achieved a goal, it engaged
the students' attention and generated discussion, so that everyone's unique
experience was shared around the whole classroom. For all the groups,
we found that the AR application successfully kept them engaged, and students
could visualize the “whole picture” directly through the screen. Specifically, student E
expressed her preference for the manual training part since she thought it was
cool to impact the training with her own input. Also, in this session, students
discussed and asked more penetrating RL questions, which demonstrated their
deeper understanding of RL. One group discussed the difference between re-
warding the action itself and rewarding the result caused by the action. Another
group discussed how to measure, from the Q-table, the robot's capability to
correct itself. Student K asked how, after the robot had chosen an action in a
state, it translated the reward it received into an evaluation of the
long-term gain from that action. At the end of the session, students were excited
to see how an AR experience was made and two students stayed late to ask more
in-depth questions about AR and RL.
4.3 Results and Limitations
Overall, students showed a high level of engagement and curiosity during all
three sessions. Though very little direct instruction was given, all the students
were able to construct an understanding of both general RL ideas and specific
RL concepts. In some aspects, like the concept of exploration and exploitation,
students thought further than we expected. Beyond AI and RL, the students
also gained hands-on experience in robotics and AR.
The educational robot played an important role in engaging students and
motivating them to explore more about RL. Compared to session two, which was
based on a purely virtual platform, we observed higher solution diversity in
sessions one and three. However, some students argued that they “learned more”
in the web-based session two than in the robot-based session one. We therefore
believe that providing students with a straightforward approach to help
them understand the training process and visualize the important AI concepts
was also necessary. The AR application successfully filled this gap by providing
an intuitive interface for students to collaborate with their robots. Students were
excited about the AR technique, and it helped demystify the RL process and
made it easy for students to monitor their robots and the RL training at the
same time.
There were also some limitations in this study. Due to the limited number
of students, we did not conduct a quantitative analysis to measure students'
learning outcomes more precisely. Because many of the students had prior
experience with computer science and robotics, we would expect a different
outcome if this three-session RL/AR curriculum were applied, unmodified, to
another group of high school students. During session one, we noticed that the
way we had students train their robots with the RL algorithm could be further
improved. In session three, we found that the BLE connection between the robot
and the software was sometimes unstable. In addition, more AR components and options
could be offered to the students to provide them with a more immersive learning
experience.
5 Conclusion and Future Work
In this paper, we presented a robot-based activity to introduce educational
robots to K-12 RL education. To compensate for the limitations of robots in
terms of information visualization and to enrich the robot-based activity, we
developed AR software to provide an intuitive interface for students to fol-
low the training process and to facilitate their understanding of RL through
exploring human-in-the-loop training. By combining the virtual and the physi-
cal world, we aim to provide students with an engaging and interactive learning
journey and help them develop their own educational experience. A pilot study
was conducted with 14 high school students over a three-day period. The results
indicated that students were able to grasp both general and specific RL concepts
through our activity, and they showed a high level of engagement and curios-
ity during the classes. The AR part excited students and helped them easily
keep track of their robots during training. With the opportunity to design their
own robots and explore different training strategies on their own, students constructed a
deeper understanding of some RL concepts than we anticipated.
Our AR implementation is still in its initial stages. Our goal for the future is
to improve the system by utilizing AR to directly track robots so we can add more
virtual components to enhance the training experience. For example, we can
create virtual obstacles and train the robots to avoid them, or show the robot’s
past trajectories in previous training episodes using AR. Additionally, we aim to
evaluate the platform’s effectiveness among younger students and measure their
learning progress. We also plan to host a workshop for K-12 STEM teachers,
providing them with the opportunity to implement the system and activities
with more students.
References
1. B Ravi Kiran, Ibrahim Sobh, Victor Talpaert, Patrick Mannion, Ahmad A. Al
Sallab, Senthil Yogamani, and Patrick Pérez. Deep reinforcement learning for
autonomous driving: A survey. IEEE Transactions on Intelligent Transportation
Systems, 23(6):4909–4926, 2022.
2. Iuliu Alexandru Zamfirache, Radu-Emil Precup, Raul-Cristian Roman, and
Emil M. Petriu. Policy iteration reinforcement learning-based control using a grey
wolf optimizer algorithm. Information Sciences, 585:162–175, 2022.
3. Zhenglei He, Kim-Phuc Tran, Sebastien Thomassey, Xianyi Zeng, Jie Xu, and
Changhai Yi. A deep reinforcement learning based multi-criteria decision support
system for optimizing textile chemical process. Computers in Industry, 125:103373,
2021.
4. Mohammad Aljanabi, Mohanad Ghazi, Ahmed Hussein Ali, Saad Abas Abed, and
ChatGpt. Chatgpt: Open possibilities. Iraqi Journal For Computer Science and
Mathematics, 4(1):62–64, Jan. 2023.
5. Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction.
A Bradford Book, Cambridge, MA, USA, 2018.
6. Henriikka Vartiainen, Matti Tedre, and Teemu Valtonen. Learning machine learn-
ing with very young children: Who is teaching whom? International Journal of
Child-Computer Interaction, page 100182, 06 2020.
7. Bawornsak Sakulkueakulsuk, Siyada Witoon, Potiwat Ngarmkajornwiwat, Porn-
pen Pataranutaporn, Werasak Surareungchai, Pat Pataranutaporn, and Pakpoom
Subsoontorn. Kids making AI: Integrating machine learning, gamification, and so-
cial context in stem education. In 2018 IEEE Intl. Conf. on Teaching, Assessment,
and Learning for Engineering (TALE), pages 1005–1010. IEEE, 2018.
8. Griffin Dietz, Jennifer King Chen, Jazbo Beason, Matthew Tarrow, Adriana
Hilliard, and R. Benjamin Shapiro. Artonomous: Introducing middle school stu-
dents to reinforcement learning through virtual robotics. IDC ’22, page 430–441,
New York, NY, USA, 2022. Association for Computing Machinery.
9. Youri Coppens, Eugenio Bargiacchi, and Ann Nowé. Reinforcement learning 101
with a virtual reality game. In Proceedings of the 1st International Workshop on
Education in Artificial Intelligence K-12 (August 2019).
10. Ziyi Zhang, Sara Willner-Giwerc, Jivko Sinapov, Jennifer Cross, and Chris Rogers.
An interactive robot platform for introducing reinforcement learning to k-12 stu-
dents. In Robotics in Education, pages 288–301, Cham, 2021. Springer International
Publishing.
11. Jiahong Su, Yuchun Zhong, and Davy Tsz Kit Ng. A meta-review of literature on
educational approaches for teaching ai at the k-12 levels in the asia-pacific region.
Computers and Education: Artificial Intelligence, 3:100065, 2022.
12. Henriikka Vartiainen, Matti Tedre, and Teemu Valtonen. Learning machine learn-
ing with very young children: Who is teaching whom? International Journal of
Child-Computer Interaction, 25:100182, 2020.
13. David Touretzky, Christina Gardner-McCune, Fred Martin, and Deborah Seehorn.
Envisioning ai for k-12: What should every child know about ai? Proceedings of
the AAAI Conference on Artificial Intelligence, 33(01):9795–9799, Jul. 2019.
14. Michelle Carney, Barron Webster, Irene Alvarado, Kyle Phillips, Noura Howell,
Jordan Griffith, Jonas Jongejan, Amit Pitaru, and Alexander Chen. Teachable
machine: Approachable web-based tool for exploring machine learning classifica-
tion. In Extended Abstracts of the 2020 CHI Conference on Human Factors in
Computing Systems, CHI EA ’20, page 1–8, New York, NY, USA, 2020. Associa-
tion for Computing Machinery.
15. Stefania Druga. Growing up with AI. Cognimates: from coding to teaching ma-
chines. PhD thesis, Massachusetts Institute of Technology, 2018.
16. Pavel Petrovič. Spike up prime interest in physics. In Robotics in Education, pages
146–160. Springer International Publishing, 2021.
17. Sonia Mandin, Marina De Simone, and Sophie Soury-Lavergne. Robot moves as
tangible feedback in a mathematical game at primary school. In Robotics in Edu-
cation, Advances in Intelligent Systems and Computing, pages 245–257. Springer
International Publishing, Cham, 2016.
18. Jeffrey Laut, Vikram Kapila, and Magued Iskander. Exposing middle school stu-
dents to robotics and engineering through LEGO and MATLAB. In 120th ASEE
Annual Conference and Exposition, 2013.
19. Lawrence Whitman and Tonya Witherspoon. Using LEGOs to interest high school
students and improve K-12 stem education. Change, 2, 12 2003.
20. Randi Williams, Hae Won Park, and Cynthia Breazeal. A is for artificial intelli-
gence: The impact of artificial intelligence activities on young children’s perceptions
of robots. In CHI ’19: Proceedings of the 2019 CHI Conference on Human Factors
in Computing Systems, pages 1–11, 04 2019.
21. Bram van der Vlist, Rick van de Westelaken, Christoph Bartneck, Jun Hu, Rene
Ahn, Emilia Barakova, Frank Delbressine, and Loe Feijs. Teaching machine learn-
ing to design students. In Zhigeng Pan, Xiaopeng Zhang, Abdennour El Rhalibi,
Woontack Woo, and Yi Li, editors, Technologies for E-Learning and Digital En-
tertainment, pages 206–217, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
22. Mark Cheli, Jivko Sinapov, Ethan E Danahy, and Chris Rogers. Towards an
augmented reality framework for K-12 robotics education. In Proceedings of the
1st International Workshop on Virtual, Augmented, and Mixed Reality for HRI
(VAM-HRI), 2018.
23. Ángel Martínez-Tenor, Ana Cruz-Martín, and Juan-Antonio Fernández-Madrigal.
Teaching machine learning in robotics interactively: the case of reinforcement learn-
ing with LEGO® Mindstorms. Interactive Learning Environments, 27(3):293–306,
2019.
24. Mustafa Sırakaya and Didem Alsancak Sırakaya. Augmented reality in stem edu-
cation: a systematic review. Interactive Learning Environments, 30(8):1556–1569,
2022.
25. Somsak Techakosit and Prachyanun Nilsook. Using augmented reality for teaching
physics. 07 2015.
26. Yujia Huang, Hui Li, and Ricci Fong. Using augmented reality in early art educa-
tion: a case study in hong kong kindergarten. Early Child Development and Care,
186(6):879–894, 2016.
27. Donghyeon Lee and Young Soo Park. Implementation of augmented teleoperation
system based on robot operating system (ros). In 2018 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), pages 5497–5502, 2018.
28. Bryce Ikeda and Daniel Szafir. An AR debugging tool for robotics programmers.
In International Workshop on Virtual, Augmented, and Mixed-Reality for Human-
Robot Interaction (VAM-HRI), 2021.
29. Andre Cleaver, Muhammad Faizan, Hassan Amel, Elaine Short, and Jivko Sinapov.
SENSAR: A visual tool for intelligent robots for collaborative human-robot inter-
action. CoRR, abs/2011.04515, 2020.