
Adaptive Immediate Feedback Can Improve Novice Programming Engagement and Intention to Persist in Computer Science

Samiha Marwan
North Carolina State University
Ge Gao
North Carolina State University
Susan Fisk
Kent State University
Thomas W. Price
North Carolina State University
Tiany Barnes
North Carolina State University
ABSTRACT

Prior work suggests that novice programmers are greatly impacted by the feedback provided by their programming environments. While some research has examined the impact of feedback on student learning in programming, there is no work (to our knowledge) that examines the impact of adaptive immediate feedback within programming environments on students' desire to persist in computer science (CS). In this paper, we integrate an adaptive immediate feedback (AIF) system into a block-based programming environment. Our AIF system is novel because it provides personalized positive and corrective feedback to students in real time as they work. In a controlled pilot study with novice high-school programmers, we show that our AIF system significantly increased students' intentions to persist in CS, and that students using AIF had greater engagement (as measured by their lower idle time) compared to students in the control condition. Further, we found evidence that the AIF system may improve student learning, as measured by student performance in a subsequent task without AIF. In interviews, students found the system fun and helpful, and reported feeling more focused and engaged. We hope this paper spurs more research on adaptive immediate feedback and the impact of programming environments on students' intentions to persist in CS.
KEYWORDS

Programming environments, positive feedback, adaptive feedback, persistence in CS, engagement
ACM Reference Format:
Samiha Marwan, Ge Gao, Susan Fisk, Thomas W. Price, and Tiffany Barnes.
2020. Adaptive Immediate Feedback Can Improve Novice Programming
Engagement and Intention to Persist in Computer Science. In Proceedings of
the 2020 International Computing Education Research Conference (ICER ’20),
August 10–12, 2020, Virtual Event, New Zealand. ACM, New York, NY, USA,
10 pages.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from
ICER ’20, August 10–12, 2020, Virtual Event, New Zealand
©2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-7092-9/20/08...$15.00
Eective feedback is an essential element of student learning [
] and motivation [
], especially in the domain of programming
]. When programming, students primarily receive feed-
back from their programming environment (e.g., compiler error
messages). Prior work has primarily focused on how such feedback
can be used to improve students’ cognitive outcomes, such as per-
formance or learning [
]. However, less work has explored
how such feedback can improve students’ aective outcomes, such
as engagement and intention to persist in computer science (CS).
These outcomes are especially important because we are facing a
shortage of people with computational knowledge and program-
ming skills [
], which will not be addressed–no matter how much
students learn about computing in introductory courses–unless
more students choose to pursue computing education and careers.
It is also important to study feedback in programming environ-
ments because prior work shows that it can sometimes be frustrat-
ing, confusing, and dicult to interpret [
]. In particular,
there is a need for further research on how programming feed-
back can be designed to create positive, motivating, and engaging
programming experiences for novices, while still promoting perfor-
mance and learning. Creating these positive experiences (including
enjoyment and feelings of ability) are particularly important be-
cause they have a profound impact on students’ intention to persist
in computing [35].
In this paper, we explore the effects of a novel adaptive immediate feedback (AIF) system on novice programming students. We designed the AIF system to augment a block-based programming environment with feedback aligned with Scheeler et al.'s guidance that feedback should be immediate, specific, positive, and corrective [ ]. Thus, our AIF provides real-time feedback adapted to each individual student's accomplishments on a specific open-ended programming task. Since our AIF system is built on data from previous student solutions to the same task, it allows students to approach problem solving in their own way. Given the beneficial impact of feedback on learning [ ], we hypothesize that our AIF system will improve student performance and learning. We also hypothesize that our AIF system will improve the coding experience of novice programmers, making it more likely that they will want to persist in CS. This is especially important given the aforementioned dearth of workers with computing skills, and the fact that many students with sufficient CS ability choose not to major in CS [31].
We performed a controlled pilot study with 25 high school students, during 2 summer camps, to investigate our primary research question: What impact does adaptive immediate feedback (AIF) have on students' perceptions, engagement, performance, learning, and intentions to persist in CS? In interviews, students found AIF features to be engaging, stating that the system was fun, encouraging, and motivating. Our quantitative results show that, in comparison to the control group, our AIF system increased students' intentions to persist in CS, and that students who received the AIF were significantly more engaged with the programming environment, as measured by reduced idle time during programming. Our results also suggest that the AIF system improved student performance by reducing idle time, and that the AIF system may increase novice students' learning, as measured by AIF students' performance in a future task with no AIF.

In sum, the key empirical contributions of this work are: (1) a novel adaptive immediate feedback system, and (2) a controlled study that suggests that programming environments with adaptive immediate feedback can increase student engagement and intention to persist in CS. Since the AIF is built on auto-grader technologies, we believe that our results can generalize to novices learning in other programming languages and environments. This research also furthers scholarship on computing education by drawing attention to how programming environments can impact persistence in computing.
2 RELATED WORK

Researchers and practitioners have developed myriad forms of automated programming feedback. Compilers offer the most basic form of syntactic feedback through error messages, which can be effectively enhanced with clearer, more specific content [ ]. Additionally, most programming practice environments, such as CodeWorkout [ ] and CloudCoder [ ], offer feedback by running a student's code through a series of test cases, which can either pass or fail. Other autograders (e.g. [ ]) use static analysis to offer feedback in block-based languages, which do not use compilers. Researchers have improved on this basic correctness feedback with adaptive on-demand help features, such as misconception-driven feedback [ ], expert-authored immediate feedback [ ], and automated hints [ ], which can help students identify errors and misconceptions or suggest next steps. This additional feedback can improve students' performance and learning [ ]. However, despite these positive results, Aleven et al. note that existing automated feedback helps "only so much" [ ]. In this section, we explore how programming feedback could be improved by incorporating best practices, and why this can lead to improvements in not only cognitive but also affective outcomes.
Good feedback is critically important for students' cognitive and affective outcomes [ ] – but what makes feedback good? Our work focuses on task-level, formative feedback, given to students as they work, to help them learn and improve. In a review of formative feedback, Shute argues that effective feedback is non-evaluative, supportive, timely, and specific [ ]. Similarly, in a review of effective feedback characteristics, Scheeler et al. noted that feedback should be immediate, specific, positive, and corrective to promote lasting change in learners' behaviors [ ]. Both reviews emphasize that feedback, whether from an instructor or a system, should support the learner and be timely according to students' needs. However, existing programming feedback often fails to meet these criteria.
Immediate feedback: Cognitive theory suggests that immediate feedback is beneficial, as it results in more efficient retention of information [ ]. While there has been much debate over the merits of immediate versus delayed feedback, immediate feedback is often more effective on complex tasks, when students have less prior knowledge [ ], making it appropriate for novice programmers. In most programming practice environments, however, students are expected to work without feedback until they can submit a mostly complete (if not correct) draft of their code. They then receive delayed feedback from the compiler, autograder, or test cases. Even if students choose to submit code before finishing it, test cases are not generally designed for evaluating partial solutions, as they represent correct behavior, rather than subgoals for the overall task. Other forms of feedback, such as hints, are more commonly offered on-demand [ ]. While these can be used for immediate feedback, this requires the student to recognize and act on their need for help, and repeated studies have shown that novice programmers struggle to do this [ ]. In contrast, many studies with effective programming feedback have used more immediate feedback. For example, Gusukuma et al. evaluated their immediate, misconception-driven feedback in a controlled study with undergraduate students, and found it improved students' performance on open-ended programming problems [26].
Positive feedback: Positive feedback is an important way that people learn, through confirmation that a problem-solving step has been achieved appropriately, e.g. "Good move" [ ]. Mitrovic et al. theorized that positive feedback is particularly helpful for novice programmers, as it reduces their uncertainty about their actions [ ], and this interpretation is supported by cognitive theories as well [ ]. However, in programming, the feedback students receive is rarely positive. For example, enhanced compiler messages [ ] improve how critical information is presented, but offer no additional feedback when students' code compiles correctly. Similarly, automated hints [ ] and misconception-driven feedback [ ] highlight what is wrong, rather than what is correct, about students' code. This makes it difficult for novices to know when they have made progress, which can result in students deleting correct code, unsure if it is correct [ ]. Two empirical studies in programming support the importance of positive feedback. Fossati et al. found that the iList tutor with positive feedback improved learning, and students liked it more than iList without it [ ]. An evaluation of the SQL tutor showed that positive feedback helped students master skills in less time [43].
While there is a large body of work exploring factors influencing students' affective outcomes, like retention [ ] or intentions to persist in a CS major [6], few studies have explored how automated tools can improve these outcomes [ ]. There is ample evidence that immediate, positive feedback would be useful to students; however, few feedback systems offer either [ ]. More importantly, evaluations of these systems have been limited to cognitive outcomes; however, prior work suggests that feedback should also impact students' affective outcomes, such as their engagement and intention to persist. For example, a review of the impact of feedback on persistence finds that positive feedback "increases motivation when people infer they have greater ability to pursue the goal or associate the positive experience with increased goal value" [ ]. Positive feedback has also been found to improve student confidence, and is an effective motivational strategy used by human tutors [ ]. Additionally, in other domains, feedback has been shown to increase students' engagement [ ]. This suggests not only the need for the design of feedback that embraces these best practices, but also evaluation of its impact on cognitive and affective outcomes.
3 THE ADAPTIVE IMMEDIATE FEEDBACK (AIF) SYSTEM

Our adaptive immediate feedback (AIF) system was designed to provide high-quality feedback to students as they are learning to code in open-ended programming tasks (e.g. PolygonMaker and DaisyDesign, described in more detail in Section 4.2). Our AIF system continuously and adaptively confirms when students complete (or break) meaningful objectives that comprise a larger programming task. Importantly, a student can complete AIF objectives without having fully functional or complete code. This allows us to offer positive and corrective feedback that is immediate and specific. In addition, our AIF system includes pop-up messages tailored to our student population, since personalization is key in effective human tutoring dialogs [ ] and has been shown to improve novices' learning [33, 44].
Our AIF system consists of three main components to achieve real-time adaptive feedback: objective detectors, a progress panel, and pop-up messages. The objective detectors are a set of continuous autograders, focused on positive feedback, that check student code in real time to determine which objective students are working on and whether they have correctly achieved it. The progress panel is updated by the continuous objective detectors to color each task objective according to whether it is complete (green), not started (grey), or broken (red), since prior research suggests that students who are uncertain often delete their correct code [ ]. The pop-up messages leverage the objective detectors to provide achievement pop-ups when objectives are completed, and motivational pop-ups when a student has not achieved any objectives within the last few minutes. These pop-ups promote confidence by praising both accomplishment and perseverance, which may increase students' persistence [ ]. We strove to make the AIF system engaging and joyful, which may increase students' motivation and persistence [30].
To develop our AIF system, we first developed task-specific objective detectors, which can be thought of as continuous real-time autograders, for each programming task used in our study. Unlike common autograders, which are based on instructors' test cases, our objective detectors are hand-authored to encompass a large variety of previous students' correct solutions, matching various students' mindsets. To do so, two researchers with extensive experience in grading block-based programs, including one instructor, divided each task into a set of 4-5 objectives that described features of a correct solution, similar to the process used by Zhi et al. [ ].
Figure 1: Adaptive Immediate Feedback (AIF) system with pop-up message (top) and progress dialog (bottom right), added to iSnap [52].

Table 1: Examples of Pop-up Messages in the AIF System.

State                      | Message
<1/2 objectives complete   | You are legit amazing!!
>1/2 objectives complete   | You're on fire!
All objectives are done    | High FIVE!!, you DID IT!!
Fixed broken objective     | Yay!!!! it's fixed!!
Struggle/Idle, <half done  | Yeet it till you beat it!!
Struggle/Idle, >half done  | You are doing great so far!!
Then, for each task, we transformed prior students' solutions into abstract syntax trees (ASTs). Using these ASTs, we detected different patterns that resemble a complete, correct objective, and accordingly, we developed objective detectors to detect the completion of each objective. Finally, we tested and enhanced the accuracy of the objective detectors by manually verifying their performance and refining them until they correctly identified objective completion on 100 programs written by prior students.
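The detector pipeline described above (AST in, objective-completion boolean out) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the simplified AST shape, the block-type names (`repeat`, `forward`), and the `loop_draws_detector` objective are all hypothetical.

```python
def contains_pattern(node, block_type, required_child=None):
    """True if the AST has a node of `block_type`, optionally with a
    descendant of type `required_child` anywhere beneath it.

    node: {"type": str, "children": [node, ...]} -- simplified AST.
    """
    children = node.get("children", [])
    if node["type"] == block_type:
        if required_child is None or any(
            contains_pattern(c, required_child) for c in children
        ):
            return True
    return any(
        contains_pattern(c, block_type, required_child) for c in children
    )

# Hypothetical objective: "a drawing command is repeated inside a loop".
def loop_draws_detector(ast):
    return contains_pattern(ast, "repeat", required_child="forward")
```

Because the check runs over partial programs, a detector like this can pass long before the whole task is correct, which is what lets the system credit objectives on incomplete code.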
Based on the objective detectors, we designed the progress panel to show the list of objectives with colored progress indicators, as shown in the bottom right of Figure 1. Initially, all the objectives are deactivated and grey. Then, while students are programming, the progress panel adaptively changes its objectives' colors based on students' progress, as detected by our objective detectors. Once an objective is completed, it becomes green, but if it is broken, it changes to red.
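A minimal sketch of this color-state logic, assuming hypothetical names (`Status`, `update_panel`); the key transition is that an objective which once passed but now fails turns red rather than reverting to grey:

```python
from enum import Enum

class Status(Enum):
    NOT_STARTED = "grey"   # never completed
    COMPLETE = "green"     # detector currently passes
    BROKEN = "red"         # previously complete, now failing

def update_panel(statuses, detector_results):
    """Recompute panel colors after each code edit.

    statuses: dict objective -> Status (current panel state)
    detector_results: dict objective -> bool (does the detector pass now?)
    """
    for obj, passed in detector_results.items():
        if passed:
            statuses[obj] = Status.COMPLETE
        elif statuses[obj] is Status.COMPLETE:
            # Completed work that stops passing turns red, warning the
            # student before they delete or further break correct code.
            statuses[obj] = Status.BROKEN
        # Never-completed objectives stay grey; broken ones stay red.
    return statuses
```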
We then designed personalized AIF pop-up messages. We asked several high school students to collaboratively construct messages for a friend, to (1) praise achievement upon objective completion, or (2) provide motivation when they are struggling or lose progress. The final messages, shown in Table 1, include emojis added to increase positive affect [ ]. In real time, our AIF system selects a contextualized pop-up message based on students' code and actions, as detected by our objective detectors. The pop-up messages provide immediate adaptive feedback, such as "Woo, one more to go!!" for a student with just one objective left, or "Good job, you FIXED it!! ;)" when a student corrects a broken objective. To praise perseverance, AIF pop-ups are also shown based on time, either after some idle time, or if a student takes longer than usual on an objective, based on previous student data. It may be especially important to provide affective support to students who may be struggling. For example, if a student stops editing for more than 2 minutes and they are half-way through the task, one motivational pop-up message is "Keep up the great work!!".
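The message-selection step can be sketched as below, using messages from Table 1. The `pick_popup` function, its parameters, and the trigger thresholds are assumptions for illustration, not the system's actual rules.

```python
# Messages from Table 1; trigger conditions below are assumptions.
def pick_popup(completed, total, just_fixed, idle_seconds, idle_cutoff=120):
    """Choose a pop-up from the student's current state.

    completed/total: objective counts from the detectors
    just_fixed: a previously broken objective passes again
    idle_seconds: time since the student's last edit
    """
    if just_fixed:
        return "Yay!!!! it's fixed!!"
    if completed == total:
        return "High FIVE!!, you DID IT!!"
    if idle_seconds >= idle_cutoff:  # struggling or idle
        if completed > total / 2:
            return "You are doing great so far!!"
        return "Yeet it till you beat it!!"
    if completed > total / 2:
        return "You're on fire!"
    if completed >= 1:
        return "You are legit amazing!!"
    return None  # nothing to celebrate or encourage yet
```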
Our novel AIF system is the first such system to include continuous real-time autograding through objective detectors, and is the first to use such detectors to show students a progress panel for task completion in open-ended programming tasks. Further, our AIF pop-up message system is the first to provide both immediate, achievement-based feedback and adaptive encouragement for students to persist. The AIF system was added to iSnap, a block-based programming environment [ ], but could be developed based on autograders for most programming environments.
3.1 Illustrative Example: Jo's Experience with the AIF System
To illustrate a student's experience with the AIF system and make its features more concrete, we describe the observed experience of Jo, a high school student who participated in our study, described in detail in Section 4. On their first task, to create a program to draw a polygon (PolygonMaker), Jo spent 11 minutes, requested help from the teacher, and received 3 motivational and 2 achievement pop-up messages. As is common with novices new to the block-based programming environment, Jo initially spent 3 minutes interacting with irrelevant blocks. Jo then received a positive pop-up message, "Yeet it till you beat it". Over the next few minutes, Jo added 3 correct blocks and received a few similar encouraging pop-up messages. Jo achieved the first objective, where AIF marked it as complete and showed the achievement pop-up "You are on fire!" Jo was clearly engaged, achieving 2 more objectives in the next minute. Over the next 3 minutes, Jo seemed to be confused or lost, repetitively running the code with different inputs. After receiving the motivational pop-up "You're killing it!", Jo reacted out loud by saying "that's cool". One minute later, Jo completed the 4th objective and echoed the pop-up "Your skills are outta this world!!, you DID it!!", saying, "Yay, I did it." Jo's positive reactions, especially to the pop-up messages, and repeated re-engagement with the task, indicate that AIF helped this student stay engaged and motivated. This evidence of engagement aligns with prior work that measures students' engagement with a programming interface by collecting learners' emotions during programming [38].
Jo's next task was to draw a DaisyDesign, which is more complex, and our adaptive immediate feedback seemed to help Jo overcome difficulty and maintain focus. AIF helped Jo stay on task by providing a motivational pop-up after 3 minutes of unproductive work. In the next minute, Jo completed the first two objectives. AIF updated Jo's progress and gave an achievement pop-up, "You're the G.O.A.T.". Jo then made another edit, and AIF marked a previously-completed objective in red, demonstrating its immediate corrective feedback. Jo immediately asked for help and fixed the broken objective. After receiving the achievement pop-up "Yeet, gottem!!", Jo echoed it out loud. Jo spent the next 13 minutes working on the 3rd objective with help from the teacher and peers, and 3 motivational AIF pop-ups. While working on the 4th and final objective over the next 5 minutes, Jo broke the other three objectives many times, but noticed the progress panel and restored them immediately. Finally, Jo finished the DaisyDesign task, saying "the pop-up messages are the best." This example from a real student's experience illustrates how we accomplished our goals to improve engagement (e.g. maintaining focus on important objectives), student perceptions (e.g. stating "that's cool"), performance (e.g. understanding when objectives were completed), and programming behaviors (e.g. correcting broken objectives).

Footnote: We used a threshold of 2 minutes based on instructors' feedback on students' programming behavior.
4 METHODS

We conducted a controlled pilot study during two introductory CS summer camps for high school students. Our primary research question is: What impact does our adaptive immediate feedback (AIF) system have on novice programmers? Specifically, we hypothesized that the AIF system would be positively perceived by students (H1-qual) and that the AIF system would increase: students' intentions to persist in CS (H2-persist), students' engagement (H3-idle), programming performance (H4-perf), and learning (H5-learning). We investigated these hypotheses using data gleaned from interviews, system logs, and surveys.
4.1 Participants
Participants were recruited from two introductory CS summer
camps for high school students. This constituted an ideal population
for our study, as these students had little to no prior programming
experience or CS courses, allowing us to test the impact of the AIF
system on students who were still learning the fundamentals of
coding and who had not yet chosen a college major. Both camps
took place on the same day at a research university in the United
We combined the camp populations for analysis, since the camps
used the same curriculum for the same age range. Study procedures
were identical across camps. The two camps consisted of one camp
with 14 participants, (7 female, 6 male, and 1 who preferred not to
specify their gender) and an all-female camp with 12 participants.
Across camps, the mean age was 14, and 14 students identied as
White, 7 as Black or African American, 1 as Native American or
American Indian, 2 as Asian, and 1 as Other. None of the students
had completed any prior, formal CS classes. Our analyses only
include data from the twenty-ve participants who assented–and
whose parents consented–to this IRB-approved study.
4.2 Procedure
We used an experimental, controlled pre-post study design, wherein we randomly assigned 12 students to the experimental group (who used the AIF system) and 13 to the control group (who used the block-based programming environment without AIF). The pre-post measures include a survey on students' attitudes towards their intentions to persist in CS and a multiple-choice test to assess basic programming knowledge. The teacher was unaffiliated with this study and did not know any of the hypotheses or study details, including condition assignments for students. The teacher led an introduction to block-based programming, and explained user input, drawing, and loops. Next, all students took the pre-survey and pre-test.

Footnote: G.O.A.T., an expression suggested by teenagers, stands for Greatest of All Time.
In the experimental phase of the study, students were asked to complete 2 consecutive programming tasks (1: PolygonMaker and 2: DaisyDesign). Task 1, PolygonMaker, asks students to draw any polygon, given its number of sides from the user. Task 2, DaisyDesign, asks students to draw a geometric design called a "Daisy," which is a sequence of n overlapping circles, where n is a user input. Both tasks required drawing shapes, using loops, and asking users to enter parameters, but the DaisyDesign task was more challenging. Each task consisted of 4 objectives, for a total of 8 objectives (as described in Section 3) that a student could complete in the experimental phase of the study. Students in the experimental group completed these tasks with the AIF system, while students in the control group completed the same tasks without the AIF system. All students, in both the experimental and control groups, were allowed to request up to 5 hints from the iSnap system [ ], and to ask for help from the teacher. We measured a student's programming performance based on their ability to complete these eight objectives (see Section 4.3 below for more details).
After each student reported completing both tasks, teachers directed students to take the post-survey and post-test. Two researchers then conducted semi-structured 3-4 minute interviews with each student. During the interviews with AIF students, researchers showed students each AIF feature and asked what made it more or less helpful, and whether they trusted it. In addition, the researchers asked students' opinions on the AIF design and how it could be improved.

Finally, all students were given 45 minutes to complete a third, similar, but much more challenging programming task (DrawFence), with 5 objectives, without access to hints or AIF. Learning was measured based on a student's ability to complete these five objectives (see Section 4.3 below for more details).
4.3 Measures

Pretest ability - Initial computing ability was measured using an adapted version of Weintrop et al.'s commutative assessment [ ], with 7 multiple-choice questions asking students to predict the outputs of several short programs. Across both conditions, the mean pre-test score was 4.44 (SD = 2.39; min = 0; max = 7).

Engagement - Engagement was measured using the percent of programming time that students spent idle (i.e. not engaged) on tasks 1 and 2. While surveys are often used to measure learners' engagement, these self-report measures are not always accurate, and our fine-grained programming logs give us more detailed insight into the exact time students were, and were not, engaged with programming. To calculate percent idle time, we defined idle time as a period of 3 or more minutes that a student spent without making edits or interacting with the programming environment, and divided this time by the total time a student spent programming. A student's total programming time was measured from when they began programming to task completion, or to the end of the programming session if the task was incomplete. While we acknowledge that some "idle" time may have been spent productively (e.g., by discussing the assignment with the teacher or peers), we observed this very rarely (despite Jo's frequent help from friends and the teacher). Across both conditions, the mean percent idle time on tasks 1 and 2 was 13.9% (Mean = 0.139; SD = 0.188; min = 0; max = 0.625).

Footnote: Programming tasks' instructions are available at:
Footnote: While all students reported completing both tasks, their log data showed that some students did not finish them.
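The percent-idle computation follows directly from this definition; the sketch below is ours (the `percent_idle` helper and its event-timestamp input format are hypothetical), but the 3-minute cutoff and the whole-gap counting follow the text.

```python
def percent_idle(event_times, session_end, idle_cutoff=180.0):
    """Fraction of programming time spent idle.

    event_times: sorted timestamps (seconds) of edits/interactions
    session_end: task completion, or end of session if incomplete
    idle_cutoff: gap length counted as idle (the study used 3 minutes)
    """
    total = session_end - event_times[0]
    points = list(event_times) + [session_end]
    # Any gap of idle_cutoff or more between consecutive events
    # counts, in full, as idle time.
    idle = sum(
        b - a for a, b in zip(points, points[1:]) if b - a >= idle_cutoff
    )
    return idle / total if total > 0 else 0.0
```

For example, a student who edits at 0 s and 60 s, goes silent until 360 s, and finishes at 400 s has one 300 s idle gap in a 400 s session, i.e. 75% idle time.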
Programming Performance - Programming performance was measured by objective completion during the experimental phase of the study (i.e., the first eight objectives, for which the experimental group used the AIF system). Each observed instance of programming performance was binary (i.e., the objective was completed [value of '1'] or not completed [value of '0']). This led to a repeated-measures design, in which observations (i.e., whether an objective was or was not completed) were nested within participants, as each participant attempted 8 objectives (in sum, 200 observations were nested within 25 participants). We explain our analytical approach in more detail below. Across both conditions, the mean programming performance was 0.870 (min = 0; max = 1).
Learning - Learning was measured by objective completion during the last phase (the learning phase) of the study, on the last five objectives, for which neither group used the AIF system, because we assumed that students who learned more would perform better on this DrawFence task. Each observed instance of performance was binary. This led to a repeated-measures design, in which observations (i.e., whether an objective was or was not completed) were nested within participants, as each participant attempted 5 objectives. In sum, 125 observations were nested within 25 participants. Across both conditions, the mean learning score was 0.60 (min = 0; max = 1), meaning that, on average, students completed about 3 of the 5 objectives.
Intention to Persist - CS persistence intentions were measured in the pre- and post-surveys using 7-point Likert scales adapted from a survey by Correll et al. [ ]. Students were asked to state how likely they were to: 1) take a programming course in the future, 2) minor in CS, 3) major in CS, 4) apply to graduate programs in CS, and 5) apply for high-paying jobs requiring high levels of computer science ability. We added these measures together to form a CS persistence index with a high alpha (α = 0.85), indicative of the scale's reliability, with a mean of 24.8 (SD = 5.15; min = 13; max = ). While we could not measure actual persistence in CS, intentions to persist are a good proxy, given that research finds that they are predictive of actual persistence in STEM fields and "...hundreds of research efforts occurring [since the late 1960s] support the contention that intention is the 'best' predictor of future behavior".

Footnote: We chose this 3-minute cutoff based on our analysis of prior students' programming log data on the same tasks.
Footnote: Cronbach's alpha is a measure of the reliability, or internal consistency, of a scale.
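As a worked illustration of the reliability check on such a summed index, Cronbach's alpha can be computed with the standard formula below; the `cronbach_alpha` helper is ours, not from the paper.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for k item-score columns.

    items: list of k lists; items[j][i] is respondent i's score on
    item j (e.g. five 7-point Likert persistence items).
    alpha = k/(k-1) * (1 - sum(item variances) / variance of sums)
    """
    k, n = len(items), len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(item) for item in items) / var(totals))
```

Perfectly correlated items give alpha = 1, and an item that adds variance without covarying with the rest pulls alpha toward 0; values around 0.85, as here, are conventionally read as good internal consistency.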
4.4 Analytical Approach
Our quantitative analytical approach was informed by our small sample size (caused by the low number of students available for study recruitment) in two ways. First, to control for pre-existing differences between students, we control for a student's pretest performance in all of our models, as random assignment in a small sample may not be enough to ensure roughly equal levels of ability in both groups. Second, we use linear mixed effects models to maximize our statistical power when we have repeated measures (for instance, we treat each programming performance item [objective complete or incomplete] as our unit of analysis of student performance and learning). These models, "...are an extension of simple linear models to allow both fixed and random effects, and are particularly used when there is non independence in the data, such as arises from a hierarchical structure" [ ]. This is appropriate to use because our data have a nested structure, as observations (e.g., CS persistence intentions, objective completion) are nested within students. Thus, observations are not independent, given that each participant contributed numerous observations. A mixed model allows us to account for the lack of independence between observations while still taking advantage of the statistical power provided by having repeated measures. It also allows us to estimate a random effect for each student, meaning that the model more effectively controls for idiosyncratic participant differences, such as differences in incoming programming ability between participants.
We use a specific type of linear mixed effects model, a linear probability model (LPM) with mixed effects7 to predict the binary outcomes of programming performance and learning. While logistic models are typically used to predict binary outcomes, we used an LPM with mixed effects because the interpretation of coefficients is more intuitive [ ]. This has led many researchers to suggest using LPMs [ ], especially because they are typically as good as logistic models at predicting dichotomous variables, and their p-values are highly correlated [28].
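To make the LPM's interpretation concrete, the sketch below fits a plain OLS linear probability model to small synthetic data. It is illustrative only: it omits the per-student random intercepts of the mixed-effects models used in this paper, and all numbers are invented.

```python
import numpy as np

# Synthetic data: 10 students, binary "objective completed" outcome.
# By construction, treatment raises the completion rate from 0.4 to 0.8,
# and the pretest column is balanced across groups.
treat = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
pretest = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5], dtype=float)
completed = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 1], dtype=float)

# A linear probability model is just ordinary least squares on the 0/1 outcome.
X = np.column_stack([np.ones_like(treat), treat, pretest])
coef, *_ = np.linalg.lstsq(X, completed, rcond=None)
intercept, treat_effect, pretest_effect = coef
```

Because the outcome is coded 0/1, `treat_effect` reads directly as a change in completion probability (here, 0.4, i.e. 40 percentage points), which is the interpretability advantage of LPMs over logistic coefficients cited above.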
We rst use interviews to investigate (H1-qual), that the AIF system
would be positively perceived by students, and then use the student
surveys to determine the impact of the AIF system on intentions
to persist in CS (H2-persist). Next, we analyze student log les to
investigate the impact of the AIF system on student engagement
(H3-idle), programming performance (H4-perf), and learning (H5-
5.1 H1-qual: AIF Perceptions
To investigate student perceptions of the AIF system, we transcribed interviews of all 12 students in the AIF group. Afterwards, we followed a 6-step thematic analysis, described in [ ], to identify positive and negative themes for the AIF pop-up and progress panel features. Two of the present authors started by (1) getting familiar with the data, then (2) generating initial codes (for each AIF feature), and then (3) looking for dominating themes. Afterwards, one author (4) reviewed the themes, and (5) refined them to focus mainly on the positive and negative aspects of each AIF feature. We then (6) combined these findings in the following summary. The main theme for each feature is whether it is helpful or not. We select quotes that represent typical positive and negative helpfulness comments from student participants, labeled s1-s12.
7A LPM is simply a linear model used to predict a binary outcome.
Pop-up messages: When asked what made pop-up messages helpful or less helpful, 10 out of 12 students agreed that pop-up messages were helpful and elaborated on why. Most students found them "engaging" [s2], "funny" [s1, s2], "encouraging" [s7], and "motivating" [s1, s5, s8]. For example, s1 stated, "it is better than just saying 'correct' because it gets you more into it," and s5 said, "it kept me focused and motivated, I was like 'yea I did it'." One student, s7, noted that messages encouraged perseverance by removing uncertainty: "especially when you don't know what you were doing, it tells that you are doing it, so it let you continue and keep going." Two students noted that pop-ups were helpful by "keeping me on track" [s3, s4]. When asked what made pop-up messages less helpful, two students stated that "they were not really helpful" [s4, s3], and one student, s7, suggested "a toggle to [turn them] on or off". Overall, the majority of students (83%) found the pop-ups helpful and engaging.
Progress Panel: When we asked students "what about the progress panel makes it helpful or less helpful?", all students found it helpful and requested to have it in future tasks. Students said the progress panel not only helped them to keep track of their progress, but also motivated them: "it told you what you completed so it kinda gives me motivation" [s8]. In addition, students appreciated the change in colors of each objective "because you can see how much you have to do, or if you took off something [a block] and you thought it was wrong and it turns red then that means you were actually right" [s7]. When students were asked what about the progress panel makes it less helpful, one student incorrectly noted, "I think there was [only] one way to complete these objectives" [s6]. Students overwhelmingly found the AIF progress panel to be beneficial, especially in understanding when an objective was complete or broken, supporting Scheeler's suggestion that feedback should be specific, immediate, and positive or corrective.
This thematic analysis supports H1-qual, that students would positively perceive the AIF system. In addition, these results provide insights on how to design personalized adaptive, immediate feedback that can engage and motivate novice high school students in their first programming experiences.
5.2 H2-persist: Intentions to Persist
To investigate H2-persist, we analyze students' survey responses to determine the impact of the AIF system on students' intentions to persist in CS. We use a linear mixed-effects model, as each student rated their intentions to persist in CS twice (once in the pre-survey and once in the post-survey). In Table 2, time takes on a value of '0' for the pre-survey and '1' for the post-survey. AIF takes on a value of '0' for all observations at time 0 (as no students had experienced the AIF system at this time), and takes on a value of '1' at time 1 if the student was in the treatment group and received the AIF system. Pretest takes on the value of the student's pretest score, in order to control for students' initial programming ability. We find that neither time (p = 0.489) nor pretest score (p = 0.203) has a statistically significant impact on students' intentions to persist in CS, as shown in Table 2.
Table 2: Estimated coefficients (Standard Error) of linear mixed models with repeated measures predicting CS persistence intentions.8
Coeff. (Std. Err.)
AIF 2.234 (0.968)*
Time -0.472 (0.682)
Pretest 0.502 (0.394)
Observations 50
Intercept 22.253 (1.995)***
Significance codes (p <): + = 0.1, * = 0.05, ** = 0.01, *** = 0.001.
We also nd that the AIF system signicantly improves CS per-
sistence intentions. Our linear mixed-eects model (which controls
for pre-existing dierences in ability between students) predicts a
CS persistence score of 24.26 at time 0 (before students had com-
pleted any programming tasks) for students with an average pretest
score of 4. However, at time 1 (after students had completed pro-
gramming tasks), the model predicts a CS persistence score of 23.79
for students in the control condition and 26.02 for students who
received the AIF system. This is a dierence of 2.23 points (p =
0.021), amounting to CS persistence intentions that are about 9.37%
higher for students of average ability who received the AIF system
(Table 2). This provides support for H2-persist, as the AIF system
improves CS persistence intentions.
5.3 H3-idle: Engagement
We use an ordinary least squares (OLS) linear regression model to investigate the impact of the AIF treatment on student engagement (H3-idle), as shown in Table 3. Engagement was measured using idle time (discussed previously in Section 4.3), which was ascertained using programming log data. An OLS linear regression model was used instead of a t-test because it allowed us to control for pretest score. We did not use a linear mixed model, as there was no need to account for a lack of independence among observations, because there was only one observation of engagement (idle time) per student. In Table 3, we predict the percent of total programming time a student spent idle, controlling for condition and pretest score (see subsection 5.2, above, for details on coding). We find that pretest scores do not have a statistically significant impact on engagement (p = 0.533).
Importantly, we find that students who used the AIF system spent significantly less time idle (p = 0.013). Our model predicts that students with an average pretest score of 4 spent 22.6% of their programming time idle if they were in the control group, versus only 3.6% if they were in the AIF group. This means that the AIF system had a substantial impact on student engagement, as it reduced idle time by 84.2% for students with average pretest ability. In sum, we found strong support for H3-idle, that the AIF system improves student engagement.
8Within-group errors were modeled to have an autoregressive structure with a lag of 1, given the time-lag between observations.
Table 3: Estimated coefficients (Standard Error) of OLS linear regression models predicting student engagement (measured as percentage of programming time spent idle).
Coeff. (Std. Err.)
AIF -0.191 (0.078)*
Pretest 0.010 (0.015)
Intercept 0.188 (0.075)*
Observations 25
Significance codes (p <): + = 0.1, * = 0.05, ** = 0.01, *** = 0.001.
5.4 H4-perf: Programming Performance
We investigate H4-perf (that the AIF system improved programming performance) by analyzing programming log data on the first two tasks. We use a linear probability model with mixed effects to predict the likelihood that a student completed an objective ('1' = completed, '0' = not completed) during the experimental phase of the study (i.e., the first eight objectives, in which the experimental group used the AIF system). Thus, a total of 200 observations (i.e., whether a student completed a given objective) were nested within the 25 participants for this analysis. We predict the likelihood that a student completed an objective, using the student's pre-test score and the treatment as predictors (see subsection 5.2, above, for details on coding) (Model A, Table 4). We find that a student's pre-test score has no effect on their likelihood of completing an objective (p = 0.315). However, the AIF system has a marginally statistically significant impact on programming performance (p = 0.098), as students in the AIF condition were 13.1 percentage points more likely to complete an objective than students in the control condition. Thus, an average student (as measured by pretest score = 4) would be expected to complete 81.4% of the objectives if they were in the control condition and 94.5% of the objectives if they were in the AIF condition. This provides marginal support for H4-perf, that the AIF system would improve students' performance.
5.5 H5-learning: Learning
To investigate H5-learning, that the AIF system improves learning, we examine student performance on the last phase of the study (i.e., on task 3, in which neither group used the AIF system to complete the last five objectives). We again use a linear mixed effects model to predict the likelihood that a student completed an objective ('1' = completed, '0' = not completed). We predict the likelihood that a student completed an objective, controlling for a student's pre-test score and the treatment (see subsection 5.2, above, for details on coding) (Model B, Table 4). We again find no effect of a student's pre-test score on their likelihood of completing an objective (p = 0.952). Importantly, students receiving the AIF treatment were 25.4 percentage points more likely to complete an objective than students in the control condition, but this difference was only marginally significant (p = 0.056). Thus, a student with a pretest score of 4 would be expected to complete 47.7% of the objectives if they were in the control condition and 73.1% of the objectives if they were in the AIF condition. This provides marginal support for H5-learning, that the AIF system would improve students' learning.
Table 4: Estimated coefficients (Standard Error) of linear probability models (LPM) with mixed effects and repeated measures predicting the likelihood that a student completed an objective.
             Model A                        Model B
             (Experimental Phase, Part 1)   (Learning Phase, Part 2)
Pretest      -0.017 (0.017)                 0.002 (0.029)
AIF          0.131 (0.079)+                 0.254 (0.135)+
Intercept    0.882 (0.083)***               0.470 (0.142)***
Observations 200                            125
Significance codes (p <): + = 0.1, * = 0.05, ** = 0.01, *** = 0.001.
Figure 2: Mediation test results for effect of AIF intervention on idle time and likelihood of completing an objective during the experimental phase. Model controls for the effect of pretest scores. Clustered robust standard errors are shown in parentheses. N = 200 observations from 25 students.
5.6 Exploratory Mediation Analyses
We conduct an exploratory, post-hoc mediation analysis to investigate whether the AIF system may have improved objective completion in the experimental phase (part 1, with 200 observations) because it increased student engagement. We constructed a path model [ ] to conduct our mediation analysis, allowing us to statistically suggest causal relationships among our variables: AIF, Engagement, and Objective completion. Controlling for pretest score, we found that once engagement is taken into account, the AIF system did not have a direct effect on objective completion (coeff. = -0.008, p = 0.905). Instead, we found evidence for an indirect effect of the AIF system on objective completion through the impact of the AIF system on engagement. Figure 2 illustrates the results from our path model, which finds that the AIF system reduces idle time by 17.9 percentage points (p = 0.005), and that idle time has a large, negative impact (coeff. = -0.727, p < 0.001) on the likelihood that a student completes an objective. Moreover, our mediation analysis revealed the size of the indirect effect of the AIF system: students who received the AIF system were 13.0 percentage points more likely to complete an objective (p = 0.011) because the AIF system increased their engagement.
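The product-of-coefficients logic behind such a mediation analysis can be sketched as follows. The data are synthetic and noiseless by construction, and the code uses two plain OLS fits rather than this paper's path model with clustered robust standard errors:

```python
import numpy as np

# Path: AIF (x) -> idle time (m) -> objective completion (y).
# Treatment lowers idle time (a = -0.2); idle time lowers completion (b = -0.7).
x = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
noise = np.array([-0.05, 0.05, -0.05, 0.05, -0.05, 0.05, -0.05, 0.05])
m = 0.3 - 0.2 * x + noise   # mediator: proportion of time idle
y = 0.8 - 0.7 * m           # outcome: probability of completing an objective

ones = np.ones_like(x)

# Path a: regress the mediator on the treatment.
a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]

# Paths b and c': regress the outcome on treatment and mediator together.
coef = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0]
direct_effect, b = coef[1], coef[2]

indirect_effect = a * b     # product-of-coefficients estimate of mediation
```

Here the direct effect of the treatment vanishes once the mediator is in the model, and the indirect effect a*b is positive: the same pattern reported above, where AIF helps completion only through reduced idle time.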
Our results provide compelling evidence that AIF can significantly improve students' intention to persist in CS. This impact of the AIF system is especially important, given that our participants had not yet entered university or declared a major, and thus the use of our AIF system could entice more students to study CS. In addition, these findings are important to the CS education community, since prior work on tutoring systems in computing mainly focuses on the impact of feedback on students' cognitive outcomes, such as learning and performance, rather than affective outcomes, which can dramatically impact student decisions. To our knowledge, this is the first evidence that automated programming feedback can improve students' desire to persist in CS, and this finding is a primary contribution of this work. We believe a primary reason for this impact on students' intentions is our feedback design, which ensures that students receive positive messages and confirmation of their success [35].
Our second compelling result is that AIF significantly improved students' engagement with our programming tasks. In particular, the system dramatically reduced idle time, from almost a quarter of students' time to less than 5%. For context, this effect on idle time is larger than that found in prior work from using a block-based instead of a text-based programming environment [ ], which is often suggested as an effective way to better engage novice programmers. In addition, these findings are consistent with our qualitative interviews, which suggested that the AIF helped keep students on track, letting the students know what they had completed and what there was left to do. These results are also consistent with the "uncertainty reduction" hypothesis presented by Mitrovic et al. [ ], suggesting that positive feedback helps students continue working since they are more certain about their progress in an open-ended task. Based on our combined qualitative interview analysis and quantitative log data analysis, we conclude that the AIF system helped students stay engaged. Moreover, our mediation analysis suggests that the AIF system's significant impact on idle time directly helped students complete more objectives, as we discuss below. We believe that these impacts are a direct consequence of well-designed adaptive immediate feedback that helped students understand their meaningful successes and mistakes.
We also nd suggestive evidence that the AIF system improved
students’ performance and learning. While only marginally signi-
cant, the eect sizes on performance and learning were moderately
large, and our ability to detect them was limited by our smaller
sample size (suggesting that these eects might be signicant in
a larger study). It seems likely that our AIF could have impacted
these outcomes, given that prior work has found that well-designed,
timely feedback provided to students can improve both performance
and learning in programming [
]. Our AIF system also pro-
vides students with aective feedback via pop up messages, that
may boost student motivation just when it is waning. Lastly, our
mediation analysis suggests that the observed improvements in
students’ performance may have been because of the reduction in
idle time. This is supported by research nding that engagement
matters for cognitive outcomes [
]. This suggests that incorporat-
ing more positive messaging along with other forms of feedback
(e.g. misconception feedback [
]) may further improve cognitive
outcomes in programming. We note that feedback must be carefully
designed and may be particularly benecial if it is designed taking
into account its impact on both cognitive and aective outcomes
We believe that the ndings in this study may be generalizable for
novices learning to program in other programming environments
and using other programming languages. This is because the core of
the AIF system is its expert-authored autograders, which monitor
students’ progress, and there are several programming environ-
ments that have this autograding capability [
]. Therefore,
auto-graders for other programming languages and environments
can be modied to provide similar adaptive immediate feedback. For
instance, in a Python programming environment with autograding,
a similar AIF design could be used to detect the completion of test
cases every time a student compiles their code, and experts could
add explanations to each test case (similar to the expert-authored
feedback in [
]), and design encouraging pop-up messages when-
ever a test case is passed, or xed. We argue that experts who have
written autograders should be able to implement similar adaptive,
immediate feedback in other systems, using either block- or text-
based programming languages, as AIF only requires the ability
to autograde students’ work and track their time. These systems
should oer similar benets to students, according to learning the-
ories [59].
This study has six main limitations. First, since all students had access to hints and instructor help, we do not know if the AIF system would be equally effective in classrooms without this additional support. However, we found little difference in hint usage between conditions, and our observations reveal that few students asked for instructor help. Second, the interviews may reflect response bias or novelty effects that may have led to more positive answers, but we tried to minimize potential bias by asking for both the positive and negative aspects of AIF features. Third, having the interview about the AIF system before task 3 could have improved students' motivation in the AIF group to complete the third programming task. Fourth, while in our study procedure we collected a post-test from students, we decided not to report learning gains (i.e., the difference between pretest and posttest scores), since, while doing our analysis, we found 5 students who did not submit their tests (although during the camp all students claimed they completed all the tests). Fifth, there may be other reasons that AIF reduced idle time; for example, breaking down programming tasks into smaller objectives may have made the tasks less difficult for students. However, our analysis of task 3 suggests that this potential reduction in difficulty did not hinder student learning in a subsequent task without AIF. Sixth and finally, our study had a small sample size, and it lasted for a short period of time. However, we argue that this study shows the promising potential of the impact of the AIF system, with results that are consistent with learning theories, and in our future work we plan to conduct a larger classroom study to verify the impact of AIF in different settings.
The main contributions of this work are the design and development of a new adaptive immediate feedback system based on autograders, and a controlled study demonstrating that our AIF system: 1) is well-received by students, 2) improves students' intention to persist in CS, 3) increases students' engagement, 4) improves students' performance, and 5) improves students' learning. Our interview results confirmed our hypothesis (H1-qual) that the AIF system would be positively perceived by students, and our survey results confirmed our hypothesis (H2-persist) that AIF would improve students' intention to persist in CS. Additionally, by investigating students' log data, we confirmed our hypothesis (H3-idle) that the AIF system would improve students' engagement during programming (as measured by students' idle time spent while programming). Moreover, from analyzing students' programming performance, our results partially support our hypotheses (H4-perf) that the AIF system would improve students' performance and (H5-learning) learning. In future work, we plan to generalize our approach to develop adaptive immediate feedback for more assignments and other programming languages. In addition, we plan to conduct larger classroom studies to investigate the impact of adaptive immediate feedback within the context of graded assignments.
This material is based upon work supported by the National Science Foundation under grant 1623470.
References
Vincent Aleven, Ido Roll, Bruce M. McLaren, and Kenneth R. Koedinger. 2016. Help Helps, But Only So Much: Research on Help Seeking with Intelligent Tutoring Systems. International Journal of Artificial Intelligence in Education 26, 1 (2016), 1–19.
Joshua D Angrist and Jörn-Steffen Pischke. 2008. Mostly harmless econometrics: An empiricist's companion. Princeton University Press.
Susan J Ashford. 1986. Feedback-seeking in individual adaptation: A resource perspective. Academy of Management Journal 29, 3 (1986), 465–487.
Susan J Ashford, Ruth Blatt, and Don VandeWalle. 2003. Reflections on the looking glass: A review of research on feedback-seeking behavior in organizations. Journal of Management 29, 6 (2003), 773–799.
[5] Michael Ball. 2018. Lambda: An Autograder for . Technical Report. Electrical Engineering and Computer Sciences, University of California at Berkeley.
Lecia J Barker, Charlie McDowell, and Kimberly Kalahar. 2009. Exploring factors that influence computer science introductory course students to persist in the major. ACM SIGCSE Bulletin 41, 1 (2009), 153–157.
Devon Barrow, Antonija Mitrovic, Stellan Ohlsson, and Michael Grimley. 2008. Assessing the impact of positive feedback in constraint-based tutors. In International Conference on Intelligent Tutoring Systems. Springer, 250–259.
Brett A Becker, Graham Glanville, Ricardo Iwashima, Claire McDonnell, Kyle Goslin, and Catherine Mooney. 2016. Effective compiler error message enhancement for novice programming students. Computer Science Education 26, 2-3 (2016), 148–175.
Brett A. Becker, Kyle Goslin, and Graham Glanville. 2018. The Effects of Enhanced Compiler Error Messages on a Syntax Error Debugging Test. (2018).
Maureen Biggers, Anne Brauer, and Tuba Yilmaz. 2008. Student perceptions of computer science: a retention study comparing graduating seniors with CS leavers. ACM SIGCSE Bulletin 40, 1 (2008), 402–406.
Phyllis C Blumenfeld, Toni M Kempler, and Joseph S Krajcik. 2006. Motivation
and cognitive engagement in learning environments. na.
Kristy Elizabeth Boyer, Robert Phillips, Michael D Wallis, Mladen A Vouk, and
James C Lester. 2008. Learner characteristics and feedback in tutorial dialogue.
In Proceedings of the Third Workshop on Innovative Use of NLP for Building Educa-
tional Applications. Association for Computational Linguistics, 53–61.
Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology.
Qualitative research in psychology 3, 2 (2006), 77–101.
J. Bruin. 2011 (accessed April 6, 2020). Introduction to Linear Mixed Models.
Erin Cech, Brian Rubineau, Susan Silbey, and Caroll Seron. 2011. Professional
role condence and gendered persistence in engineering. American Sociological
Review 76, 5 (2011), 641–666.
AT Corbett and John R Anderson. 1989. Feedback timing and student control in
the LISP Intelligent Tutoring System. In Proceedings of the Fourth International
Conference on AI and Education. 64–72.
Albert Corbett and John R. Anderson. 2001. Locus of Feedback Control in
Computer-Based Tutoring: Impact on Learning Rate, Achievement and Attitudes.
In Proceedings of the SIGCHI Conference on Human Computer Interaction. 245–252.
Shelley J Correll. 2004. Constraints into preferences: Gender, status, and emerging
career aspirations. American sociological review 69, 1 (2004), 93–113.
Peter J Denning and Edward E Gordon. 2015. A technician shortage. Commun.
ACM 58, 3 (2015), 28–30.
Daantje Derks, Arjan ER Bos, and Jasper Von Grumbkow. 2008. Emoticons
and online message interpretation. Social Science Computer Review 26, 3 (2008),
Barbara Di Eugenio, Davide Fossati, Stellan Ohlsson, and David Cosejo. 2009. Towards explaining effective tutorial dialogues. In Annual Meeting of the Cognitive Science Society. 1430–1435.
Yihuan Dong, Samiha Marwan, Veronica Catete, Thomas W. Price, and Tiffany Barnes. 2019. Defining Tinkering Behavior in Open-ended Block-based Programming Assignments. In Proceedings of the 50th ACM Technical Symposium on Computer Science Education. ACM, 1204–1210.
Stephen H Edwards and Krishnan Panamalai Murali. 2017. CodeWorkout: short
programming exercises with built-in data collection. In Proceedings of the 2017
ACM Conference on Innovation and Technology in Computer Science Education.
Ayelet Fishbach and Stacey R Finkelstein. 2012. How feedback influences persistence, disengagement, and change in goal pursuit. Goal-directed Behavior (2012).
Davide Fossati, Barbara Di Eugenio, Stellan Ohlsson, Christopher Brown, and Lin Chen. 2015. Data driven automatic feedback generation in the iList intelligent tutoring system. Technology, Instruction, Cognition and Learning 10, 1 (2015).
Luke Gusukuma, Austin Cory Bart, Dennis Kafura, and Jeremy Ernst. 2018.
Misconception-driven feedback: Results from an experimental study. In Proceed-
ings of the 2018 ACM Conference on International Computing Education Research.
Luke Gusukuma, Dennis Kafura, and Austin Cory Bart. 2017. Authoring feedback
for novice programmers in a block-based language. In 2017 IEEE Blocks and
Beyond Workshop (B&B). IEEE, 37–40.
Ottar Hellevik. 2009. Linear versus logistic regression when the dependent
variable is a dichotomy. Quality & Quantity 43, 1 (2009), 59–74.
David Hovemeyer and Jaime Spacco. 2013. CloudCoder: a web-based program-
ming exercise system. Journal of Computing Sciences in Colleges 28, 3 (2013),
Tony Jenkins. 2001. The motivation of students of programming. In Proceedings
of the 6th Annual SIGCSE Conference on Innovation and Technology in Computer
Science Education, ITiCSE 2001, Canterbury, UK, June 25-27, 2001. 53–56.
Sandra Katz, David Allbritton, John Aronis, Christine Wilson, and Mary Lou Soffa. 2006. Gender, achievement, and persistence in an undergraduate computer science program. ACM SIGMIS Database: the DATABASE for Advances in Information Systems 37, 4 (2006), 42–57.
Rex B Kline. 2015. Principles and practice of structural equation modeling. Guilford Press.
Michael J Lee and Andrew J Ko. 2011. Personifying programming tool feedback
improves novice programmers’ learning. In Proceedings of the seventh interna-
tional workshop on Computing education research. ACM, 109–116.
Mark R Lepper, Maria Woolverton, Donna L Mumme, and J Gurtner. 1993. Moti-
vational techniques of expert human tutors: Lessons for the design of computer-
based tutors. Computers as cognitive tools 1993 (1993), 75–105.
Colleen M Lewis, Ken Yasuhara, and Ruth E Anderson. 2011. Deciding to major
in computer science: a grounded theory of students’ self-assessment of ability. In
Proceedings of the seventh international workshop on Computing education research.
R. Luckin et al. 2007. Beyond the code-and-count analysis of tutoring dialogues. In Artificial Intelligence in Education: Building Technology Rich Learning Contexts That Work, R. Luckin, K. R. Koedinger, and J. Greer, Eds. IOS Press (2007), 349–356.
Moira Maguire and Brid Delahunt. 2017. Doing a thematic analysis: A practical,
step-by-step guide for learning and teaching scholars. AISHE-J: The All Ireland
Journal of Teaching and Learning in Higher Education 9, 3 (2017).
Chris Martin, Janet Hughes, and John Richards. 2017. Designing engaging learn-
ing experiences in programming. In International Conference on Computer Sup-
ported Education. Springer, 221–245.
Samiha Marwan, Anay Dombe, and Thomas W. Price. 2020. Unproductive Help-seeking in Programming: What it is and How to Address it?. In Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE '20). ACM.
Samiha Marwan, Joseph Jay Williams, and Thomas W. Price. 2019. An Evaluation
of the Impact of Automated Programming Hints on Performance and Learning.
In Proceedings of the 2019 ACM Conference on International Computing Education
Research. ACM, 61–70.
Samiha Marwan, Nicholas Lytle, Joseph Jay Williams, and Thomas W. Price. 2019.
The Impact of Adding Textual Explanations to Next-step Hints in a Novice Pro-
gramming Environment. In Proceedings of the 2019 ACM Conference on Innovation
and Technology in Computer Science Education. ACM, 520–526.
David M Merolla, Richard T Serpe, Sheldon Stryker, and P Wesley Schultz. 2012.
Structural precursors to identity processes: The role of proximate social structures.
Social Psychology Quarterly 75, 2 (2012), 149–172.
Antonija Mitrovic, Stellan Ohlsson, and Devon K Barrow. 2013. The effect of positive feedback in a constraint-based intelligent tutoring system. Computers & Education 60, 1 (2013), 264–272.
Roxana Moreno and Richard E Mayer. 2004. Personalized messages that promote
science learning in virtual environments. Journal of educational Psychology 96, 1
(2004), 165.
Laurie Murphy, Gary Lewandowski, Renée McCauley, Beth Simon, Lynda
Thomas, and Carol Zander. 2008. Debugging: the good, the bad, and the quirky–a
qualitative analysis of novices’ strategies. ACM SIGCSE Bulletin 40, 1 (2008),
Susanne Narciss and Katja Huth. 2004. How to design informative tutoring feedback for multimedia learning. Instructional Design for Multimedia Learning (2004), 181–195.
David J Nicol and Debra Macfarlane-Dick. 2006. Formative assessment and
self-regulated learning: A model and seven principles of good feedback practice.
Studies in higher education 31, 2 (2006), 199–218.
Pete Nordquist. 2007. Providing accurate and timely feedback by automatically
grading student programming labs. Journal of Computing Sciences in Colleges 23,
2 (2007), 16–23.
Helen J Parkin, Stuart Hepplestone, Graham Holden, Brian Irwin, and Louise
Thorpe. 2012. A role for technology in enhancing students’ engagement with
feedback. Assessment & Evaluation in Higher Education 37, 8 (2012), 963–973.
Gary D Phye and Thomas Andre. 1989. Delayed retention effect: attention,
perseveration, or both? Contemporary Educational Psychology 14, 2 (1989), 173–
Thomas W. Price and Tiany Barnes. 2015. Comparing Textual and Block Inter-
faces in a Novice Programming Environment. In Proceedings of the International
Computing Education Research Conference.
Thomas W. Price, Yihuan Dong, and Dragan Lipovac. 2017. iSnap: Towards
Intelligent Tutoring in Novice Programming Environments. In Proceedings of the
ACM Technical Symposium on Computer Science Education.
Thomas W. Price, Zhongxiu Liu, Veronica Catete, and Tiffany Barnes. 2017.
Factors Influencing Students’ Help-Seeking Behavior while Programming with
Human and Computer Tutors. In Proceedings of the International Computing
Education Research Conference.
Thomas W. Price, Rui Zhi, and Tiffany Barnes. 2017. Hint Generation Under
Uncertainty: The Effect of Hint Quality on Help-Seeking Behavior. In Proceedings
of the International Conference on Artificial Intelligence in Education.
Monica A Riordan. 2017. Emojis as tools for emotion work: Communicating
aect in text messages. Journal of Language and Social Psychology 36, 5 (2017),
Kelly Rivers and Kenneth R. Koedinger. 2017. Data-Driven Hint Generation in
Vast Solution Spaces: a Self-Improving Python Programming Tutor. International
Journal of Articial Intelligence in Education 27, 1 (2017), 37–64.
Mary Catherine Scheeler, Kathy L Ruhl, and James K McAfee. 2004. Providing
performance feedback to teachers: A review. Teacher education and special
education 27, 4 (2004), 396–407.
Valerie J Shute. 2008. Focus on formative feedback. Review of educational research
78, 1 (2008), 153–189.
Marieke Thurlings, Marjan Vermeulen, Theo Bastiaens, and Sjef Stijnen. 2013.
Understanding feedback: A learning theory perspective. Educational Research
Review 9 (2013), 1–15.
Jodie B Ullman and Peter M Bentler. 2003. Structural equation modeling. Hand-
book of psychology (2003), 607–634.
Paul Von Hippel. 2015. Linear vs. logistic probability models: Which is better,
and when. Statistical Horizons (2015).
Wengran Wang, Rui Zhi, Alexandra Milliken, Nicholas Lytle, and Thomas W.
Price. 2020. Crescendo: Engaging Students to Self-Paced Programming Practices.
In Proceedings of the ACM Technical Symposium on Computer Science Education.
David Weintrop and Uri Wilensky. 2015. Using Commutative Assessments to
Compare Conceptual Understanding in Blocks-based and Text-based Programs.
In ICER, Vol. 15. 101–110.
Rui Zhi, Thomas W. Price, Nicholas Lytle, Yihuan Dong, and Tiffany Barnes.
2018. Reducing the State Space of Programming Problems through Data-Driven
Feature Detection. In Educational Data Mining in Computer Science Education
(CSEDM) Workshop@ EDM.
Day 3: CS-1, Novices
ICER ‘20, August 10–12, 2020, Virtual Event, New Zealand
... Formative feedback is defined as a type of task-level feedback that provides specific, timely information to a student in response to a particular problem or task, based on the student's current ability [Shu08]. From a cognitive learning theory (CLT) perspective, formative feedback can reduce students' uncertainty about how well, or poorly, they are performing on a task [Mit13;Paa03], and it can therefore increase students' motivation and persistence to complete tasks by revealing the progress that students have already made [Mar20a]. ...
... expert-authored and data-driven approaches. For expert-authored approaches, human experts define the subgoals of a correct solution, and create autograders for each objective to detect if it is complete or incomplete, for example using static code analysis [Mar20a;Gus17]. ...
... In Summer 2019, we developed and tested AIF version 1.0 [Mar20a]. We broke down programming tasks into a set of objectives (i.e. ...
MARWAN, SAMIHA ABDELRAHMAN MOHAMMED. Investigating Best Practices in the Design of Automated Hints and Formative Feedback to Improve Students' Cognitive and Affective Outcomes. (Under the direction of Thomas W. Price). Timely support is essential for students to learn and improve their performance. However, in large programming classrooms, it is hard for instructors to provide real-time support (such as hints) for every student. While researchers have put tremendous effort into developing algorithms to generate automated programming support, few controlled studies have directly evaluated its impact on students' performance, learning and affective outcomes. Additionally, while some studies show that automated support can improve students' learning, it is unclear what specific design choices make them more or less effective. Furthermore, few, if any, prior studies have investigated how well these results can be replicated in multiple learning contexts. Inspired by educational theories and effective human feedback, my dissertation has the goal of designing and evaluating different design choices of automated support, specifically next-step hints and formative feedback, to improve students' cognitive and affective outcomes in programming classrooms. In this thesis I present five studies that attempt to overcome limitations in existing forms of automated support to improve students' outcomes, specifically hints and formative feedback. Hints may be ineffective when they: 1) are hard to interpret, 2) fail to engage students to reason critically about the hint, and 3) fail to guide students to effectively seek help. In Study 1, I addressed the first two challenges by evaluating the impact of adding textual explanations to hints (i.e. explaining what the hint was suggesting), as well as adding self-explanation prompts to hints (i.e. asking students to reflect on how to use the hint).
I found that hints with these two design features together increased learners' learning as evidenced by the increase in their performance on future isomorphic programming tasks (without hints available). In Study 2, I tackled the third challenge in two phases. First, I created a preliminary taxonomy of unproductive help-seeking behaviors during programming. Then, using this taxonomy, I designed and evaluated a novel user interface for requesting hints that subtly encourages students to seek help with the right frequency, estimated with a data-driven algorithm. This led to an improvement in students' help-seeking behavior. In Study 3, I replicated my first two studies in an authentic classroom setting, across several weeks, with a different population, to investigate the consistency and generalizability of my results. I found that hints with textual explanations and self-explanation prompts improved students' programming performance, and increased students' programming efficiency in homework tasks, but the effectiveness of hints was not uniform across problems. Formative feedback is effective when it is immediate, specific, corrective and positive. Learning theories and empirical human tutoring studies show that such elements of feedback can improve both students' cognitive and affective outcomes. While many automated feedback systems have some of these feedback elements, few have them all (such as providing only corrective feedback but not encouraging positive feedback), and those were only evaluated on a small set of short programming tasks. In Study 4, I tackled this gap in research by developing an adaptive immediate feedback (AIF) system, using expert-authored rules, that provides students with immediate positive and corrective feedback on their progress while programming. I found that the AIF system improves students' performance, engagement in programming, and intentions to persist in computer science.
Lastly, in Study 5 I developed a hybrid data-driven algorithm to generate feedback that can be easily scaled across different programming tasks, with high accuracy and low expert effort. I then used this algorithm to design an improved version of the AIF system (i.e. AIF 3.0), with a more granular feedback level. In Study 5, I deployed and evaluated the AIF 3.0 system in an authentic CS0 classroom study over several weeks. I found that the AIF 3.0 system improved students' performance and the proportion of students who fully completed the programming tasks, indicating increased persistence. Studies 1, 2, and 4 are laboratory studies, while Studies 3 and 5 are classroom studies, all conducted with iSnap, a block-based programming environment. The contributions of this thesis include: 1) the discovery of effective design choices for automated hints, 2) the design of adaptive immediate feedback systems, leveraging expert-authored and hybrid data-driven models, 3) an empirical evaluation of the impact of automated hints and formative feedback on learners' cognitive and affective outcomes, and lastly 4) replication evaluations of hints and feedback in authentic classroom settings, suggesting consistent effects across different populations and learning contexts. These contributions inform researchers' knowledge of challenges of automated support designs using either data-driven or expert-authored models, as well as challenges in classroom studies for open-ended programming tasks; and how they can affect students' outcomes, which overall can guide future research directions in computing education and human-computer interaction areas.
... They subsequently analyze both short-term and long-term effects of students' emotional responses to project scores, considering also the students' gender [95]. Marwan et al. [113] study how adaptive immediate feedback mediates engagement and the likelihood that students will complete a task and persist in learning computing. ...
... Marwan et al. [113]: Path model showing how adaptive immediate feedback mediates engagement and likelihood of novice programmers completing a task (2)
Area of focus: learning/understanding
Hughes et al. [61]: Good's schema refined and extended to extract information about novice programmers' comprehension of concurrent programs and their confidence in the captured knowledge (11)
Wiedenbeck [192]: Path model of factors of importance in learning to program (previous programming experience, perceived self-efficacy, and knowledge organisation) (179)
Eckerdal et al. [41]: Phenomenographical outcome space with five categories describing novice students' understanding of what it means to learn to program (129)
Berglund and Eckerdal [11]: Three phenomenographical outcome spaces, describing advanced computer science students' motives to take a project-based distributed course in computer systems: academic achievement; project and team working capacity; social competence (28)
Byckling and Sajaniemi [22]: Role plan analysis model: analysis model to evaluate students' mental models of programming concepts (21)
Stamouli and Huggard [165]: Two phenomenographical outcome spaces describing introductory object-oriented programming students' understanding of learning to program and program correctness (35)
Lopez et al. [98]: Hierarchical model of introductory programming skills (257) ...
Use of theory within a field of research provides the foundation for designing effective research programs and establishing a deeper understanding of the results obtained. This, together with the emergence of domain-specific theory, is often taken as an indicator of the maturity of any research area. This paper explores the development and subsequent usage of domain-specific theories and theoretical constructs (TCs) in computing education research (CER). All TCs found in 878 papers published in three major CER publication venues over the period 2005–2020 were identified and assessed to determine the nature and purpose of the constructs found. We focused more closely on areas related to learning, studying, and progression, where our analysis found 80 new TCs that had been developed, based on multiple epistemological perspectives. Several existing frameworks were used to categorize the areas of CER focus in which TCs were found, the methodology by which they were developed, and the nature and purpose of the TCs. A citation analysis was undertaken, with 1727 citing papers accessed to determine to what extent and in what ways TCs had been used and developed to inform subsequent work, also considering whether these aspects vary according to different focus areas within computing education. We noted which TCs were used most often and least often, and we present several brief case studies that demonstrate progressive development of domain-specific theory. The exploration provides insights into trends in theory development and suggests areas in which further work might be called for. Our findings indicate a general interest in the development of TCs during the period studied, and we show examples of how different approaches to theory development have been used. We present a framework suggesting how strategies for developing new TCs in CER might be structured, and discuss the nature of theory development in relation to the field of CER.
... Concerning the challenge P1 (mixed-ability students), ScratchThAI employs the learner-centered approach and personalized learning approach to address the challenge where students are able to learn at their own knowledge level, pace and style (Bjork and Bjork 2011;Deunk et al. 2018;OECD 2012;Schleicher 2016;Peng et al. 2019). Personalized & Adaptive Technology (T1), and Automatic CT Assessment (T2) are the key enabling technologies that can help improve students' progress and engagement (Campos et al. 2012;Hattie and Timperley 2007;Marwan et al. 2020). ...
Computational Thinking (CT) has been formally incorporated into the National Curriculum of Thailand since 2017, where Scratch, a block-based visual programming language, has been widely adopted as CT learning environment for primary-level students. However, conducting hands-on coding activities in a classroom has caused substantial challenges including mixed-ability students in the same class, high student-teacher ratio and learning-hour limitation. This research proposes and develops ScratchThAI as a conversation-based learning support framework for computational thinking development to support both students and teachers. More specifically, it provides learning experiences tailored to individual needs. Students can learn CT concepts and practice online coding anywhere, anytime. Moreover, through its ScratChatbot, students can ask for CT concept explanations, coding syntax or practice exercises. Additional exercises may be assigned to students based on the diagnosed individual learning difficulties in a particular topic to provide possible and timely intervention. Teachers can track learning progress and performance of the whole class as well as of individuals through the dashboard and can take suitable intervention within limited school hours. Deploying ScratchThAI to several Thai schools has enabled this research to investigate its effectiveness in a school setting. The obtained results indicated positive teacher satisfaction, better learning performance and higher student engagement. Thus, ScratchThAI contributes as a possible and practical solution to CT skill development and CT education improvement under the aforementioned challenges in Thailand.
... In the space of ITSs a problem is how to get a student to the correct solution. One of the ways to achieve this is by having an expert to author the correct solution and the path to take to get there (Marwan et al., 2020;Unnam et al., 2019;Ariely et al., 2020). These all require the use of an expert to either structure the assignment such that feedback can be extracted or to label assignments to create a system that can learn the expert knowledge. ...
Fostering students' computer programming skills has become an important educational issue around the globe. However, it remains a challenge for students to understand those abstract concepts when learning computer programming, implying the need to provide instant learning diagnosis and feedback in computer programming activities. In this study, a Two-Tier Test-Based Programming Training (T³PT) approach was proposed. Accordingly, an online learning system was developed to provide students with precision feedback for guiding them to identify misconceptions of computer programming to improve their computer programming learning achievement. In order to examine the effects of the proposed approach, a learning system was developed and a quasi-experiment was conducted. Two classes of 99 eighth-grade students from Taiwan were divided into an experimental group and a control group. The students in the experimental group used the learning system based on the T³PT approach, while the control group used the conventional learning system. The experimental results showed that the proposed approach was significantly superior to the conventional programming learning approach in terms of students' programming logic concepts, problem-solving awareness, technology acceptance, and satisfaction with the learning approach. Accordingly, discussion and suggestions are provided for future research.
Conference Paper
Tinkering has been shown to have a positive influence on students in open-ended making activities. Open-ended programming assignments in block-based programming resemble making activities in that both of them encourage students to tinker with tools to create their own solutions to achieve a goal. However, previous studies of tinkering in programming discussed tinkering as a broad, ambiguous term, and investigated only self-reported data. To our knowledge, no research has studied student tinkering behaviors while solving problems in block-based programming environments. In this position paper, we propose a definition for tinkering in block-based programming environments as a kind of behavior that students exhibit when testing, exploring, and struggling during problem-solving. We introduce three general categories of tinkering behaviors (test-based, prototype-based, and construction-based tinkering) derived from student data, and use case studies to demonstrate how students exhibited these behaviors in problem-solving. We created the definitions using a mixed-methods research design combining a literature review with data-driven insights from submissions of two open-ended programming assignments in iSnap, a block-based programming environment. We discuss the implication of each type of tinkering behavior for learning. Our study and results are the first in this domain to define tinkering based on student behaviors in a block-based programming environment.
Conference Paper
The feedback given to novice programmers can be substantially improved by delivering advice focused on learners' cognitive misconceptions contextualized to the instruction. Building on this idea, we present Misconception-Driven Feedback (MDF); MDF uses a cognitive student model and program analysis to detect mistakes and uncover underlying misconceptions. To evaluate the impact of MDF on student learning, we performed a quasi-experimental study of novice programmers that compares conventional run-time and output check feedback against MDF over three semesters. Inferential statistics indicates MDF supports significantly accelerated acquisition of conceptual knowledge and practical programming skills. Additionally, we present descriptive analysis from the study indicating the MDF student model allows for complex analysis of student mistakes and misconceptions that can suggest improvements to the feedback, the instruction, and to specific students.
A step-by-step guide to conducting a thematic analysis within the context of learning and teaching.
Conference Paper
While programming, novices often lack the ability to effectively seek help, such as when to ask for a hint or feedback. Students may avoid help when they need it, or abuse help to avoid putting in effort, and both behaviors can impede learning. In this paper we present two main contributions. First, we investigated log data from students working in a programming environment that offers automated hints, and we propose a taxonomy of unproductive help-seeking behaviors in programming. Second, we used these findings to design a novel user interface for hints that subtly encourages students to seek help with the right frequency, estimated with a data-driven algorithm. We conducted a pilot study to evaluate our data-driven (DD) hint display, compared to a traditional interface, where students request hints on-demand as desired. We found students with the DD display were less than half as likely to engage in unproductive help-seeking, and we found suggestive evidence that this may improve their learning. The primary contributions of this work are a novel taxonomy of unproductive help-seeking behaviors in programming and design insights that suggest how help interfaces can deter this behavior.
Conference Paper
A growing body of work has explored how to automatically generate hints for novice programmers, and many programming environments now employ these hints. However, few studies have investigated the efficacy of automated programming hints for improving performance and learning, how and when novices find these hints beneficial, and the tradeoffs that exist between different types of hints. In this work, we explored the efficacy of next-step code hints with 2 complementary features: textual explanations and self-explanation prompts. We conducted two studies in which novices completed two programming tasks in a block-based programming environment with automated hints. In Study 1, 10 undergraduate students completed 2 programming tasks with a variety of hint types, and we interviewed them to understand their perceptions of the affordances of each hint type. For Study 2, we recruited a convenience sample of participants without programming experience from Amazon Mechanical Turk. We conducted a randomized experiment comparing the effects of hints' types on learners' performance and performance on a subsequent task without hints. We found that code hints with textual explanations significantly improved immediate programming performance. However, these hints only improved performance in a subsequent post-test task with similar objectives, when they were combined with self-explanation prompts. These results provide design insights into how automatically generated code hints can be improved with textual explanations and prompts to self-explain, and provide evidence about when and how these hints can improve programming performance and learning.
Conference Paper
Automated hints, a powerful feature of many programming environments, have been shown to improve students' performance and learning. New methods for generating these hints use historical data, allowing them to scale easily to new classrooms and contexts. These scalable methods often generate next-step, code hints that suggest a single edit for the student to make to their code. However, while these code hints tell the student what to do, they do not explain why, which can make these hints hard to interpret and decrease students' trust in their helpfulness. In this work, we augmented code hints by adding adaptive, textual explanations in a block-based, novice programming environment. We evaluated their impact in two controlled studies with novice learners to investigate how our results generalize to different populations. We measured the impact of textual explanations on novices' programming performance. We also used quantitative analysis of log data, self-explanation prompts, and frequent feedback surveys to evaluate novices' understanding and perception of the hints throughout the learning process. Our results showed that novices perceived hints with explanations as significantly more relevant and interpretable than those without explanations, and were also better able to connect these hints to their code and the assignment. However, we found little difference in novices' performance. Our results suggest that explanations have the potential to make code hints more useful, but it is unclear whether this translates into better overall performance and learning.
Conference Paper
The large state space of programming problems makes providing adaptive support in intelligent tutoring systems (ITSs) difficult. Reducing the state space size could allow for more interpretable analysis of student progress as well as easier integration of data-driven support. Using data collected from a CS0 course, we present a procedure for defining a small but meaningful programming state space based on the presence or absence of features of correct solution code. We present a procedure to create these features using a panel of human experts, as well as a data-driven method to derive them automatically. We compare the expert and data-driven features , the resulting state spaces, and how students progress through them. We show that both approaches dramatically reduce the state-space compared to traditional code-states and that the data-driven features have high overlap with the expert features. We conclude by discussing how this feature-state space provides a useful platform for integrating data-driven support methods into ITSs.
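The core idea in this last abstract, reducing a vast space of code states to a small state space defined by the presence or absence of solution features, can be sketched in a few lines. This is a hypothetical illustration, not code from the paper: the feature names and simple string-matching detectors below are invented stand-ins for the expert-authored and data-driven detectors the paper describes.

```python
# Illustrative sketch of feature-based state abstraction: each student
# program is mapped to the tuple of correct-solution features it contains,
# so many distinct code states collapse into one compact feature state.
# The detectors here are toy examples for a hypothetical drawing task.

def detect_features(code, detectors):
    """Reduce raw program text to the tuple of feature names it exhibits."""
    return tuple(name for name, present_fn in detectors.items() if present_fn(code))

# Toy detectors (hypothetical); real systems would use static code analysis.
DETECTORS = {
    "uses_loop": lambda code: "repeat" in code,
    "moves_sprite": lambda code: "move" in code,
    "pen_down": lambda code: "pen down" in code,
}

# Two syntactically different programs land in the same abstract state,
# which is what makes progress analysis and data-driven support tractable.
state_a = detect_features("repeat 4 [ move 10 ]", DETECTORS)
state_b = detect_features("repeat 4\n  move 10", DETECTORS)
assert state_a == state_b == ("uses_loop", "moves_sprite")
```

Under this abstraction, student progress becomes a walk through a small set of feature states rather than through every possible program text, which is the property the abstract argues makes integration of data-driven support easier.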