Using learning analytics to assess students’ behavior in
open-ended programming tasks
Paulo Blikstein
Transformative Learning Technologies Lab
Stanford University School of Education and (by courtesy) Computer Science.
520 Galvez Mall, CERAS 232, Stanford, CA, 94305
paulob@stanford.edu
ABSTRACT
There is great interest in assessing student learning in unscripted,
open-ended environments, but students’ work can evolve in ways
that are too subtle or too complex to be detected by the human
eye. In this paper, I describe an automated technique to assess,
analyze, and visualize students learning computer programming. I
logged hundreds of snapshots of students’ code during a
programming assignment and employed different quantitative
techniques to extract students’ behaviors and categorize them in
terms of programming experience. First, I review the literature on
educational data mining, learning analytics, computer vision
applied to assessment, and emotion detection; I then discuss the
relevance of the work and describe one case study with a group of
undergraduate engineering students.
Categories and Subject Descriptors
K.3.2 [Computer and Information Science Education]:
Computer Science Education.
General Terms
Algorithms, Measurement, Performance, Language.
Keywords
Learning Analytics, Educational Data Mining, Logging,
Automated Assessment, Constructionism.
1. INTRODUCTION
Researchers are unanimous in stating that we need to teach the so-
called “21st century skills”: creativity, innovation, critical
thinking, problem solving, communication, and collaboration.
None of those skills are easily measured using current assessment
techniques, such as multiple-choice tests, open items, or
portfolios. As a result, schools are paralyzed between the push to
teach new skills and the lack of reliable ways to assess them. One of
the difficulties is that current assessment instruments are based on
products (an exam, a project, a portfolio), and not on processes
(the actual cognitive and intellectual development while
performing a learning activity), due to the intrinsic difficulties in
capturing detailed process data for large numbers of students.
However, new data collection, sensing, and data mining
technologies are making it possible to capture and analyze
massive amounts of data in all fields of human activity. These
data sources include logs of email and web servers, computer
activity capture, wearable cameras, wearable sensors, biosensors
(e.g., skin conductivity, heartbeat, brain waves), and eye-tracking,
combined with analysis techniques such as machine learning and
text mining. Such techniques are enabling researchers to gain
unprecedented insight into the minute-by-minute development of
several activities. In this paper, I propose that such techniques
could be used to evaluate some cognitive strategies and abilities,
especially in learning environments where the outcome is
unpredictable, such as a robotics project or a computer program.
In this work, I focused on students learning to program a
computer using the NetLogo language. Hundreds of snapshots of
each student’s code were captured, filtered, and analyzed. I will
describe some prototypical coding trajectories and discuss how
they relate to students’ programming experience, as well as the
implications for the teaching and learning of computer programming.
2. PREVIOUS WORK
Two examples of the current attempts to use artificial intelligence
techniques to assess human learning are text analysis and emotion
detection. The work of Rus et al. [13], for example, makes
extensive use of text analytics within a computer-based
application for learning about complex phenomena in science.
Students were asked to write short paragraphs about scientific
phenomena – Rus et al. then explored which machine learning
algorithm would enable them to most accurately classify each
student in terms of their content knowledge, based on
comparisons with expert-formulated responses. However, some
authors have tried to use even less intrusive technologies; for
example, speech analysis further removes the student from the
traditional assessment setting by allowing them to demonstrate
fluency in a more natural setting. Beck and Sison [4] have
demonstrated a method for using speech recognition to assess
reading proficiency in a study with elementary school students
that combines speech recognition with knowledge tracing (a form
of probabilistic monitoring).
The second area of work is the detection of emotional states using
non-invasive techniques. Understanding student sentiment is an
important element in constructing a holistic picture of student
progress, and it also helps enable computer-based systems to
interact with students in emotionally supportive ways. Using the
Facial Action Coding System (FACS), researchers have been able
to develop a method for recognizing student affective state by
simply observing and (manually) coding their facial expressions
and applying machine learning to the data produced [10].
Researchers have also used conversational cues to detect students’
emotional states. Similar to the FACS study, Craig et al. designed
an application that could use spoken dialogue to recognize the
states of boredom, frustration, flow, and confusion. They were
able to validate their findings through comparison to
emote-aloud activities (a derivative of talk-aloud where
participants describe their emotions as they feel them) while
students interacted with AutoTutor.
Even though researchers have been trying to use all these artificial
intelligence techniques for assessing students’ formal knowledge
and emotional states, the field is currently benefiting from three
important additions: 1) detailed, multimodal student activity data
(gestures, sketches, actions) as a primary component of analysis,
2) automation of data capture and analysis, 3) multidimensional
data collection and analysis. This work is now coalescing into the
nascent field of Learning Analytics or Educational Data Mining
[1, 3], and has been used in many contexts to measure students’
learning and affect. However, in Baker and Yacef’s review of its
current uses, the majority of the work is focused on cognitive
tutors or semi-scripted environments [2]. Open-ended tasks and
unscripted learning environments have only been within the reach
of qualitative, human-coded methods. However, qualitative
approaches present some crucial shortcomings: (1) there is no
persistent trace of the evolution of the students’ artifacts
(computer code, robots, etc.), (2) crucial learning moments within
a project can last only seconds, and are easy to miss with
traditional data collection techniques (i.e., field notes or video
analysis), and (3) such methodologies are hard to scale for large
groups or extended periods of time. The cost of recording,
transcribing, and analyzing data is a known limiting factor for
qualitative researchers.
At the same time, most previous work on EDM has been used
to assess specific and limited tasks – but the “21st century skills”
we need to assess now are much more complex: creativity, the
ability to find solutions to ill-structured problems and navigate
environments with sparse information, and the ability to deal with
uncertainty. Unscripted learning environments are well known for
being challenging to measure and assess, but recent advances in
both data collection and machine learning could make it possible
to understand students’ trajectories in these environments.
For example, researchers have attempted to automate the
collection of action data, such as gesture and emotion. Weinland
et al. [15] and Yilmaz et al. [17] were able to detect basic human
actions related to movement. Craig et al. [10] created a system for
automatic detection of facial expressions (the FACS study). The
technique that Craig et al. validated is a highly non-invasive
mechanism for assessing student sentiment, and it can be coupled
with computer vision technology and biosensors to enable
machines to automatically detect changes in emotional state or
cognitive affect.
Another area of active development is speech and text mining.
Researchers have combined natural language processing and
machine learning to analyze student discussions and writing,
leveraging Independent Component Analysis of student
conversations – a technique whose validity has been repeatedly
reproduced. The derived text is subsequently analyzed using
Latent Semantic Analysis [13]. Given the right training and
language model, LSA can give a clearer picture of each student’s
knowledge development throughout the course of the learning
activity.
In the realm of exploratory learning environments, Bernardini
and Conati [6] and Amershi and Conati [1] built student models combining
supervised and unsupervised classification, both with log files and
eye-tracking, and showed that meaningful events could be
detected with the combined data. Montalvo et al. [11], also using a
combination of automated and semi-automated real-time coding,
showed that they could identify meaningful meta-cognitive
planning processes when students were conducting experiments in
an online virtual lab environment.
However, most of these studies did not involve the creation of
completely open-ended artifacts with almost unlimited degrees of
freedom. Even though the work around these environments is
incipient, some attempts have been made (see [7, 8]). One such
example is the work of Berland and Martin [5], who, by logging
data, found that novice students developed successful program
code by following one of two progressions: planner and tinkerer.
Planners found success by carefully structuring programs over
time, and tinkerers found success by accreting programs over
time. In their study, students were generally unsuccessful if they
didn’t follow one of those paths.
In this paper, I will present one exploratory case study on the
possibility of using learning analytics and educational data mining
to inspect students’ behavior and learning in project-based,
unscripted, constructionist [12] learning environments, in which
traditional assessment methods might not capture students’
evolution. My goal is to establish a proof of existence that
automatically generated logs of students’ programming can be
used to infer patterns in how students go about programming, and
that by inspecting those patterns we could design better support
materials and strategies, as well as detect critical points in the
writing of software at which human assistance would be most
needed. Since my data rely on just nine subjects, I don’t make
claims of statistical significance, but the data points present
meaningful qualitative distinctions between students.
3. METHODS AND DATASET
To collect the programming logs, I employed the NetLogo [16]
programming environment. NetLogo can log to an XML file all
users’ actions, such as key presses, button clicks, changes in
variables and, most importantly, changes in the code. I developed
techniques and custom tools to automatically store, filter, and
analyze snapshots of the code generated by students.
The logging module uses a special configuration file, which
specifies which actions are to be logged. This file was distributed
to students along with instructions about how to enable
logging, collect the log files, and send those files back for analysis.
Nine students in a sophomore-level engineering class had a 3-
week programming assignment. The task was to write a computer
program to model a scientific phenomenon of their choice.
Students had the assistance of a ‘programming’ teaching
assistant, following the normal class structure. The teaching
assistant was available for about 3-4 hours a week for each
student, and an individual, 1-hour programming tutorial session
was conducted with each of the students in the first week of the
study.
158 logfiles were collected. Using a combination of XQuery and
regular expression processors (such as ‘grep’), the files were
processed, parsed, and analyzed (1.5GB and 18 million lines of
uncompressed text files). Table 1 summarizes the collected
data: total number of events logged, total number of
non-code events (e.g., variable changes, button presses), percent
of non-code events, and actual coding snapshots.
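The filtering and tabulation step can be sketched as follows. The event tuples are a simplified stand-in for the parsed logs (the `codeTabChanged` label is an assumption), and the sample numbers reproduce Paul’s row of Table 1: 218 events, 15 non-code, 203 code snapshots.

```python
# Sketch: separate coding snapshots from the far more numerous non-code
# events and tabulate per-student counts (the columns of Table 1).
from collections import Counter

CODE_EVENT = "codeTabChanged"  # assumed label for a code-snapshot event

def summarize(events):
    """events: iterable of (student, event_type) pairs. Returns, per student,
    (total, non_code, pct_non_code, code_snapshots)."""
    totals, code = Counter(), Counter()
    for student, etype in events:
        totals[student] += 1
        if etype == CODE_EVENT:
            code[student] += 1
    return {s: (n, n - code[s], 100.0 * (n - code[s]) / n, code[s])
            for s, n in totals.items()}

# Paul's totals from Table 1: 15 non-code events and 203 code snapshots
log = [("Paul", "globalChanged")] * 15 + [("Paul", CODE_EVENT)] * 203
summary = summarize(log)
```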
Table 1. Summary of the collected data

Name     Events      Non-code    Non-code %   Code snapshots
Chuck      258036      257675       99.9%        361
Che          5970         928       15.5%       5042
Leah         2836         525       18.5%       2311
Liam      4044723     4041123       99.9%       3600
Leen       253112      241827       95.5%      11285
Luca        92631       86708       93.6%       5923
Nema         3690         649       17.6%       3041
Paul          218          15        6.9%        203
Shana     4165657     4159327       99.8%       6330
Total     8826873     8788777       99.6%      38096

The overwhelming majority of events collected were non-coding
events, such as variable changes, buttons pressed, and clicks.
These particular kinds of events take place when students are
running or testing models – every single variable change gets
recorded, which accounts for the very large number of events
(almost 9 million). Since the analysis of students’ interactions
with models is out of the scope of this paper, all non-coding
events were filtered out from the main dataset, so we were left
with 1187 events for 9 users.

For further data analysis, a combination of techniques was used.
First, I developed a series of Mathematica scripts to count
meaningful events within the dataset, such as number of
characters, keywords used, code compilations, and types of error
messages. Then, I used the resulting plots to look at particular
snapshots where seemingly atypical coding activity took place –
inflection points, plateaus, and sharp decreases or increases. To
examine the snapshots, I developed a custom software tool, the
“Event Navigator” (Figure 1). The software enables researchers to
go back and forth in time, “frame-by-frame,” tracking students’
progression and measuring statistical data.

Figure 1. Screenshot of the Event Navigator software, which
allows researchers to go back and forth in time, tracking how
students created a computer program.

4. DATA ANALYSIS
For the analysis, I will first focus on one student and conduct an
in-depth exploration of her coding strategies. Then, I will compare
her work with other students, and show how differences in
previous ability and experience might have determined their
performance.

4.1 Coding strategies
4.1.1 Luca
Luca is a sophomore engineering student and built a scientific
model in her domain area. She had modest previous experience
with computers, and her grade in the class was also around the
average, which makes her a good example for an in-depth analysis
of her log files.

Figure 2 is a visualization of Luca’s model-building logs. The
continuous (red) curve represents the number of characters in her
code, the (blue) dots (mostly) underneath the curve represent the
time between two code compilations (secondary y-axis to the
right), (green) dots placed at y=1800 represent the successful
compilations, and (orange) dots placed at y=1200 represent
unsuccessful compilations (the y coordinates for those two data
series are arbitrary and were chosen just for visualization
purposes). In the following paragraphs, I will analyze each of the
6 regions of the plot. The analysis was done by looking at the
overall increase in character count (Figure 2), and then using the
Event Navigator tool (Figure 1) to locate the exact point in time
when the events happened.

Figure 2. Code size, time between compilations, and errors,
for Luca’s logfiles

The following are the main coding events for Luca:
1. Luca started with one of the exemplar programs seen in the
tutorial. In a minute, she deleted unnecessary code and ended up
with the skeleton of a new program (see the big drop in point A).
2. She spent the next half-hour building her first procedure.
During this time, between A and B, she had numerous
unsuccessful compilations (see the series of orange dots), and the
code goes from 200 to 600 characters.
3. The size of the code remains stable for 12 minutes (point B),
until there is a sudden jump from 600 to 900 characters (just
before point C). This jump corresponds to Luca copying and
pasting her own code: she duplicated her first procedure as a basis
for a second one. During this period, also, she opens many of the
sample programs within NetLogo.
4. Luca spends some time making her new duplicated procedure
work. The frequency of compilation decreases (see the density of
orange and green dots), the average time per compilation
increases, and again we see a plateau for about one hour (point D).
5. After one hour in the plateau, there is another sudden increase
in code size, from 900 to 1300 characters (between D and E).
Actually, what Luca did was to open a sample program and copy
a procedure that generated something she needed for her model.
Note that code compilations are even less frequent.
6. After making the “recycled” code work, Luca got to her final
count of 1200 characters of code. She then spent about 20 minutes
“beautifying” the code, fixing the indentation, changing names of
variables, etc. No real changes in the code took place, and there
are no incorrect compilations.

Luca’s narrative suggests, thus, four prototypical coding events:
1. Stripping down an existing program as a starting point.
2. Long plateaus of no coding activity, during which students
browse other code (or their own code) for useful pieces.
3. Sudden jumps in character count, when students import code
from other programs, or copy and paste code from within their
working program.
4. A final phase in which students fix the formatting of the code,
indentation, variable names, etc.

4.1.2 Shana, Leen, and Che
Using the character count time series, it is possible to examine
logfiles from other students in search of similarities. In the
following, I show plots from four different students (Luca, Che,
Leen, and Shana, Figure 3), which include all of their activities,
including opening other models — the “spikes” — note that the
plot in Figure 2 did not show all of Luca’s activities, but only
activities within her model (i.e., excluding opening and
manipulating other models).

Figure 3. Code size versus time for four students: Shana, Luca,
Che, and Leen. The spikes show moments in which students
opened sample code.

First, let’s examine Shana’s logfiles. After many moments in
which there is almost no change in the baseline character count,
there is a sudden jump (at time=75) from about 200 to 4,000
characters of code. A closer, systematic examination revealed that
Shana employed a different approach than Luca. After some
attempts to incorporate the code of other programs into her own
(the spikes), she gave up and decided to do the opposite: start
from a ready-made program and add her code to it. She then chose
a very well-established sample program (provided as part of the
initial tutorial) and built on top of it. The sudden jump to 4,000
characters indicates the moment when she loaded the sample
program and started to make it ‘her own’ by adding procedures.
She seamlessly integrated the pre-existing code into her new one,
adding significant new features.

Leen, on the other hand, had yet another coding style. He did
open other sample programs for inspiration or cues, but did not copy
and paste code. Instead, he built his procedures in small
increments by trial-and-error. In Table 2 we can observe how he
coded a procedure to “sprout” a variable number of white screen
elements (the action lasted 30-minutes). The changes in the code
(“diffs”) are indicated with the (red) greyed-out code.
Table 2. Leen’s attempts to write a procedure

Initial code:

    to Insert-Vacancies
      sprout 2
      [ set breed vacancies
        set color white ] ]
    end

Ask patches is introduced, and then one-of:

    to Insert-Vacancies
      ask patches
      [ sprout 2
        [ set breed vacancies
          set color white ] ]
    end

    to Insert-Vacancies
      ask one-of patches
      [ sprout 2
        [ set breed vacancies
          set color white ] ]
    end

Leen experiments with different numbers of patches (1, 5, 3):

    to Insert-Vacancies
      ask one-of patches
      [ sprout 1
        [ set breed vacancies
          set color white ] ]
    end

    to Insert-Vacancies
      ask 5 patches
      [ sprout 2
        [ set breed vacancies
          set color white ] ]
    end

Tries patches-from and then introduces a loop:

    to Insert-Vacancies
      ask patches-from
      [ sprout 2
        [ set breed vacancies
          set color white ] ]
    end

    to Insert-Vacancies
      loop
      [ ask one-of patches
        [ sprout 2
          [ set breed vacancies
            set color white ] ] ]
    end

Tries another loop approach, with a while command:

    to Insert-Vacancies
      while
      [ ask one-of patches
        [ sprout 2
          [ set breed vacancies
            set color white ] ] ]
    end

    to Insert-Vacancies
      while n < Number-of-Vacancies
      [ ask one-of patches
        [ sprout 2
          [ set breed vacancies
            set color white ] ] ]
    end

Gives up looping, tries a fixed number of patches:

    to Insert-Vacancies
      ask one-of patches
      [ sprout 2
        [ set breed vacancies
          set color white ] ]
    end

    to Insert-Vacancies
      ask 35 of patches
      [ sprout 2
        [ set breed vacancies
          set color white ] ]
    end

Gives up a fixed number, creates a slider, and introduces n-of:

    to Insert-Vacancies
      ask n-of Number-of-Vacancies patches
      [ sprout 2
        [ set breed vacancies
          set color white ] ]
    end
Leen’s trial-and-error method had an underlying pattern: he went
from simpler to more complex structures. For example, he first
attempts a fixed, “hardcoded” number of events (using the sprout
command), then introduces control structures (loop, while) to
generate a variable number of events, and finally introduces new
interface widgets to give the user control over the number of
events. Leen reported having a high familiarity with programming
languages (compared to Luca and Shana), which might explain his
different coding style. He seemed to be much more confident
generating code from scratch instead of opening other sample
programs to get inspiration or import code.
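Snapshot-to-snapshot “diffs” like those annotated in Table 2 can be recovered automatically by comparing consecutive code snapshots. A minimal sketch using Python’s difflib; the two snapshots are abbreviated, hypothetical versions of Leen’s procedure, not his actual logged code.

```python
# Sketch: recover the changed lines between two consecutive code snapshots.
import difflib

def snapshot_diff(old, new):
    """Return only the lines added or removed between two snapshots."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(),
                                lineterm="", n=0)
    return [line for line in diff
            if line[:1] in "+-" and not line.startswith(("+++", "---"))]

before = "to Insert-Vacancies\n  ask patches\n  [ sprout 2 ]\nend"
after = "to Insert-Vacancies\n  ask one-of patches\n  [ sprout 2 ]\nend"
changes = snapshot_diff(before, after)
# → ['-  ask patches', '+  ask one-of patches']
```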
Che, with few exceptions, did not open other models during
model building. Similar to Leen, he employed an incremental,
trial-and-error approach, but we can clearly detect many more
long plateaus in his graph. Based on these logfiles, seven
canonical coding strategies can be inferred:
1. Stripping down an existing program as a starting point.
2. Starting from a ready-made program and adding one’s own
procedures.
3. Long plateaus of no coding activity, during which students
browse other sample programs (or their own) for useful code.
4. Long plateaus of no coding activity, during which students
think of solutions without browsing other programs.
5. Period of linear growth in the code size, during which
students employ a trial-and-error strategy to get the code right.
6. Sudden jumps in character count, when students import code
from other programs, or copy and paste code from within their
working program.
7. A final phase in which students fix the formatting of the
code, indentation, variable names, etc.
Based on those strategies, and the previous programming
knowledge of students determined from questionnaires, the data
suggest three coding profiles:
“Copy and pasters:” more frequent use of strategies 1, 2, 3, 6, and 7.
Mixed-mode: a combination of strategies 3, 4, 5, and 7.
“Self-sufficients:” more frequent use of strategies 4 and 5.
The empirical verification of these canonical coding strategies and
coding profiles has important implications for the design of, in
particular, learning environments in which students engage in
project-based learning. Each coding strategy and profile might
demand different support strategies. For example, students with
more advanced programming skills (many of whom exhibited the
“self-sufficient” behavior) might require detailed and easy-to-find
language documentation, whereas “copy and pasters” need more
working examples with transportable code. In fact, it could be that
more expert programmers find it enjoyable to figure out solutions
themselves, and would dislike being helped when they are
problem-solving. Novices, on the other hand, might welcome
some help, since they exhibited a much more active help-seeking
behavior. The data suggest that students are in fact relatively
autonomous in developing strategies apt for their own expertise
level, and remained consistent. Therefore, echoing previous work
on epistemological pluralism, the data suggest that it would be
beneficial for designers to create multiple forms of support to
cater to each style (see, for example, [14]).
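Several of the canonical strategies leave distinct signatures in the character-count time series: plateaus, sudden jumps, and stretches of incremental growth. A sketch of how such segments might be labeled automatically; the thresholds (300 and 5 characters) are illustrative choices, not values from the study.

```python
# Sketch: label each step between snapshots as a jump (import/copy-paste),
# a plateau (no coding), or growth (incremental trial-and-error).
def label_segments(char_counts, jump=300, flat=5):
    """char_counts: code size (in characters) at each snapshot."""
    labels = []
    for prev, curr in zip(char_counts, char_counts[1:]):
        delta = abs(curr - prev)
        if delta >= jump:
            labels.append("jump")
        elif delta <= flat:
            labels.append("plateau")
        else:
            labels.append("growth")
    return labels

# A Luca-like trace: slow growth, a plateau, then a sudden copy-paste jump
labels = label_segments([200, 260, 330, 332, 331, 900])
# → ['growth', 'growth', 'plateau', 'plateau', 'jump']
```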
4.1.3 Code compilation
Despite these differences, one behavior seemed to be rather
similar across students: the frequency of code compilation. Figure
4 shows the moving average of unsuccessful compilations (the
error rate) versus time, i.e., the number of unsuccessful
compilations within one period (the moving average period was
10% of the overall duration of the logfile—if there were 600
compilations, the moving average period would be 60). The
higher the value, the higher the number of unsuccessful
compilation attempts (thus, the higher the error rate).

Figure 4. Error rate versus compilation attempts (time)

For all four students, after we eliminate the somewhat noisy first
instants, the error rate curve follows an inverse parabolic shape. It
starts very low, reaches a peak halfway through the project, and
then decreases, reaching values close to zero. Also, the (blue) dots
on top of y=0 (correct compilations) and y=1 (incorrect
compilations) indicate the actual compilation attempts. Most of
them are concentrated in the first half of the activity—
approximately 2/3 in the first half to 1/3 in the second half. This
further confirms the data from the previous logfile analysis, in
which we observed that the process of learning to program and
generating code is not homogenous and simple, but complex and
comprised of several different phases. In the case of code
compilations, we can distinguish three distinct segments: an initial
exploration characterized by few unsuccessful compilations,
followed by a phase with intense compilation attempts and many
unsuccessful compilations, and a final phase of final touches and
smaller fixes, with a lower error rate.

5. CONCLUSION
This paper is an initial step towards developing metrics
(compilation frequency, code size, code evolution pattern,
frequency of correct/incorrect compilations, etc.) that could serve
both as formative assessment tools and as pattern-finding lenses
into students’ free-form explorations in technology-rich learning
environments.

The frequency of code compilations, together with the code size
plots previously analyzed, enables us to trace a reasonable
approximation of each prototypical coding profile and style. Such
an analysis has important implications for the design of project-
based learning environments.

First, to design and allocate support resources, moments of greater
difficulty in the programming process should be identified. The
data indicate that those moments happen mid-way through the
project, and not towards the end, as I initially suspected (given the
deadline crunch anecdotally reported by many students). The
proposed metrics can be calculated during the programming
assignment and not only at the end, so instructors and facilitators
could monitor students in real time and offer help only when the
system indicates that students are in a critical zone. These zones
might be detected when, for example, several incorrect
compilations occur with few changes in character count, an
atypical error rate curve is identified, or the code is not changing
in size too much for a long period of time.

Second, support materials and strategies need to be designed to
cater to diverse coding styles and profiles. A “self-sufficient”
coder might not need too many examples, but will appreciate
good command reference. Similarly, novices might benefit more
from well-documented, easy to find examples with easy-to-adapt
code.
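The moving-average error rate described in Section 4.1.3 can be computed directly from the ordered list of compilation outcomes. A minimal sketch, assuming boolean success flags as input; the window of 10% of the overall number of attempts follows the description in the text, but the function itself is an illustrative reconstruction, not the paper’s actual script.

```python
# Sketch: moving fraction of unsuccessful compilations (the "error rate"
# curve of Figure 4), with the window set to 10% of the series length.
def error_rate_curve(compile_ok):
    """compile_ok: one boolean per compilation attempt (True = successful).
    Returns the moving fraction of failures per window."""
    window = max(1, len(compile_ok) // 10)  # 10% of the overall duration
    failures = [0 if ok else 1 for ok in compile_ok]
    return [sum(failures[i:i + window]) / window
            for i in range(len(failures) - window + 1)]

# Low error rate early and late, a burst of failures mid-way:
curve = error_rate_curve([True] * 10 + [False] * 10 + [True] * 10)
# the curve starts near 0, peaks mid-way, and returns to 0
```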
By better understanding each student’s coding style and behavior,
we also gain an additional window into students’ cognition.
Paired with other data sources (interviews, tests, surveys), the data
could offer a rich portrait of the programming process and how it
affects students’ understanding of the programming language and
more sophisticated skills such as problem solving.
However, the implications of this class of technique are not
limited to programming. Granted, programming offers a relatively
reliable way to collect ‘project snapshots,’ even several times per
hour. But such approaches could be employed with educational
software, or even with tangible interfaces, with the right computer
vision toolkit.
6. LIMITATIONS AND FUTURE WORK
Due to the low number of participants, the current study does not
make any claims of statistical significance. Also, because of
the length of the assignment (3 weeks), some students lost part of
their log files and their data could not be considered. For future
studies, we will be using a centralized repository that avoids
local storage of the log files, increasing their reliability and
reducing data loss. Another limitation is that I do not log what
students do outside of the programming environment, so I might
mistake a long thinking period for a pause.
7. REFERENCES
[1] Amershi, S., & Conati, C. (2009). Combining Unsupervised
and Supervised Classification to Build User Models for
Exploratory Learning Environments. Journal of Educational
Data Mining, 1(1), 18-71.
[2] Baker, R. & Yacef, K. (2009). The State of Educational Data
Mining in 2009: A Review and Future Visions. Journal of
Educational Data Mining, 1(1).
[3] Baker, R. S., Corbett, A. T., Koedinger, K. R., & Wagner, A.
Z. (2004). Off-task behavior in the cognitive tutor classroom:
when students “game the system”. Paper presented at the
Proceedings of the SIGCHI conference on Human factors in
computing systems.
[4] Beck, J., & Sison, J. (2006). Using knowledge tracing in a
noisy environment to measure student reading proficiencies.
International Journal of Artificial Intelligence in Education,
16(2), 129-143.
[5] Berland, M. & Martin, T. (2011). Clusters and Patterns of
Novice Programmers. The meeting of the American
Educational Research Association. New Orleans, LA.
[6] Bernardini, A., & Conati, C. (2010). Discovering and
Recognizing Student Interaction Patterns in Exploratory
Learning Environments. In V. Aleven, J. Kay & J. Mostow
(Eds.), Intelligent Tutoring Systems (Vol. 6094, pp. 125-
134): Springer Berlin / Heidelberg.
[7] Blikstein, P. (2009). An Atom is Known by the Company it
Keeps: Content, Representation and Pedagogy Within the
Epistemic Revolution of the Complexity Sciences. Ph.D.
PhD. dissertation, Northwestern University, Evanston, IL.
[8] Blikstein, P. (2010). Data Mining Automated Logs of
Students' Interactions with a Programming Environment: A
New Methodological Tool for the Assessment of
Constructionist Learning. American Educational Research
Association Annual Conference (AERA 2010), Denver, CO.
[9] Conati, C., & Merten, C. (2007). Eye-tracking for user
modeling in exploratory learning environments: An empirical
evaluation. Knowledge-Based Systems, 20(6), 557-574.
[10] Craig, S. D., D'Mello, S., Witherspoon, A. and Graesser, A.
(2008). 'Emote aloud during learning with AutoTutor:
Applying the Facial Action Coding System to cognitive-
affective states during learning', Cognition & Emotion, 22: 5,
777 — 788.
[11] Montalvo, O., Baker, R., Sao Pedro, M., Nakama, A., &
Gobert, J. (2010) Identifying Students’ Inquiry Planning
Using Machine Learning. Educational Data Mining
Conference, Pittsburgh, PA.
[12] Papert, S. (1980). Mindstorms : children, computers, and
powerful ideas. New York: Basic Books.
[13] Rus, V., Lintean, M. and Azevedo, R. (2009). Automatic
Detection of Student Mental Models During Prior
Knowledge Activation in MetaTutor. In Proceedings of the
2nd International Conference on Educational Data Mining
(Jul. 1-3, 2009). Pages 161-170.
[14] Turkle, S., & Papert, S. (1990). Epistemological Pluralism.
Signs, 16, 128-157.
[15] Weinland, D., Ronfard, R., and Boyer, E. (2006). Free
viewpoint action recognition using motion history volumes.
Comput. Vis. Image Underst. 104, 2 (Nov. 2006), 249-257
[16] Wilensky, U. (1999, updated 2006). NetLogo [Computer
software]. Evanston, IL: Center for Connected Learning and
Computer-Based Modeling.
[17] Yilmaz, A. and Shah, M. (2005). Recognizing Human
Actions in Videos Acquired by Uncalibrated Moving
Cameras. In Proceedings of the Tenth IEEE international
Conference on Computer Vision (ICCV ‘05) Volume 1 -
Volume 01 (October 17 - 20, 2005). ICCV. IEEE Computer
Society, Washington, DC, 150-157.