Smart Like a Fox: How Clever Students Trick Dumb Automated
Programming Assignment Assessment Systems
Nane Kratzke
Lübeck University of Applied Sciences, Mönkhofer Weg 239, 23562 Lübeck, Germany
https://orcid.org/0000-0001-5130-4969
nane.kratzke@th-luebeck.de
Keywords: Automated, Programming, Assignment, Assessment, Education, MOOC, Code injection, Moodle, VPL.
Abstract: This case study reports on two first-semester programming courses with more than 190 students. Both courses
made use of automated assessments. We observed how students trick these systems by analysing the version
history of suspect submissions. By analysing more than 3300 submissions, we revealed four astonishingly
simple tricks (overfitting, evasion) and cheat-patterns (redirection and injection) that students used to trick
automated programming assignment assessment systems (APAAS). Although not the main focus of this study,
the paper also discusses and proposes corresponding countermeasures where appropriate. Nevertheless, the primary intent
of this paper is to raise problem awareness and to identify and systematise observable problem patterns in a
more formal approach. The identified immaturity of existing APAAS solutions might have implications for
courses that rely deeply on automation, like MOOCs. Therefore, we conclude that APAAS solutions should be
considered much more from a security point of view (code injection). Moreover, we identify the need to evolve existing
unit testing frameworks into more evaluation-oriented teaching solutions that provide better trick and cheat
detection capabilities and differentiated grading support.
1 INTRODUCTION
We are at a transition point between the industrialisa-
tion age and the digitisation age. Computer science
related skills are a vital asset in this context.
One of these basic skills is practical programming.
Consequently, the course sizes of university and col-
lege programming courses are steadily increasing.
Even massive open online courses (Pomerol et al.,
2015) – or MOOCs – are used more and more sys-
tematically to convey necessary programming capa-
bilities to students of different disciplines (Staubitz
et al., 2015). The coursework consists of program-
ming assignments that need to be assessed. Since the
submitted assignments are executable programs with
a formal structure, they are highly suited to be as-
sessed automatically. In consequence, plenty of auto-
mated programming assignment assessment systems
(APAAS) evolved and showed that “automatic machine assessment better prepares students for situations
where they have to write code by themselves by eliminating reliance on external sources of help”
(Maguire et al., 2017). We refer to (Romli et al., 2017), (Caiza and Alamo Ramiro, 2013), (Ihantola
et al., 2010), (Douce et al., 2005), and (Ala-Mutka, 2005) for an overview of such tools.
In plain terms, APAAS solutions are systems that
execute injected code (student submissions). The
problem is that code injection is known as a severe
threat from a security point of view. Code injection
is the exploitation of a runtime environment that pro-
cesses data from potentially untrusted sources (Su and
Wassermann, 2006). Injection is used by an attacker
to introduce code into a vulnerable runtime environ-
ment to change the course of intended execution. We
refer to (Halfond and Orso, 2005), (Ray and Ligatti,
2012), and (Gupta and Gupta, 2017) for an overview of such attacks.
Of course, such code injection vulnerabilities are
considered by APAAS solutions. Let us take the
Virtual Programming Lab (VPL) (Rodríguez et al.,
2011b) as an example. VPL makes use of a jail system
that is built around a Linux daemon to execute sub-
missions in a controlled environment. Student sub-
missions are executed in a sandbox that limits the
available resources and protects the host. This sand-
box strategy is common for almost all APAAS solu-
tions.
However, it is astonishing that APAAS solutions like VPL overlook the cheating cleverness of students.
On the one hand, APAAS solutions protect the host system via sandbox mechanisms, and APAAS systems
put much effort into sophisticated plagiarism detection and authorship control of student submissions
(Rodríguez et al., 2011a), (del Pino et al.).
On the other hand, the grading component can be cheated by much more straightforward means in various
ways, which makes these solutions highly suspect for (semi-)automated programming examinations that are
meant to certify a certain level of programming expertise, as must be done in MOOCs. Although this paper
provides some solutions to curtail the problem, its primary intent is descriptive and analytical. Many
evaluation solutions in this domain are often just “hand made”. This paper intends to raise problem
awareness and to identify and systematise observable problem patterns in a more formal approach. According
to a literature review and to the best of the author's knowledge, no similar studies and comparative
evaluations exist. Nevertheless, cheat and trick patterns are (unconsciously) known by many programming
educators but have rarely been systematised – especially not in the context of automatic evaluations.
Let us take Moodle as an example. Moodle is
an Open Source E-Learning system adopted by many
universities and colleges. If an institution runs Moo-
dle, a reasonable choice for an APAAS solution is the
Virtual Programming Lab (VPL) because of its con-
venient Moodle integration. However, one thing we learned throughout this study is how astonishingly
simple it is to trick VPL. Listing 1 is a very illustrative example.
Listing 1: Point injection attack.
System.out.println("Grade :=>> 100");
System.exit(0);
If placed correctly in a student submission, these two lines will grade every VPL submission with full
points. This “attack” will be classified in this paper
as an “injection attack” and can be easily adapted to
every other programming language and APAAS so-
lution. Three more patterns are explained that have
been observed throughout two programming courses.
Therefore, the remainder of this paper is outlined
as follows. We will present our methodology to iden-
tify student cheat-patterns in Section 2. Section 3 will present the observed cheat-patterns and derive
several insights on what automated evaluation tools could consider to prevent this kind of cheating.
Section 4 will address threats to the internal and external validity of this study and provide some
guidelines on the generalisability and limitations of this paper. We conclude
our findings in Section 5 and present some fruitful re-
search opportunities.
2 METHODOLOGY
We evaluated two first semester programming Java
courses in the winter semester 2018/19:
• A regular computer science study programme
(CS)
• An information technology and design focused
study programme (ITD)
Both courses were used to search for student sub-
missions that intentionally trick the grading compo-
nent. However, plagiarism is excluded from this study because a lot of other studies have already analysed
plagiarism detection (Liu et al., 2006), (Burrows et al., 2007), (Rodríguez et al., 2011a), (del Pino et al.).
Table 1 provides a summarised overview of the course design, and Figure 1 and Figure 2 present the
corresponding student activities and results.
Table 1: Course overview.

                              CS    ITD
Students                      113   79
Assignments (total)           29    20
Number of bunches             11    6
Assignments per bunch (avg)   3     3
Time for a bunch (weeks)      1     2
Groups                        7     6
Students per group (avg)      18    12
Student/advisor ratio (avg)   6     6
All assignments were automatically evaluated by
the VPL Moodle plugin (version 3.3.3). We followed
the general recommendations described by (Thiébaut,
2015). Additionally, we developed a VPL Java tem-
plate to minimise repetitive programming of the eval-
uation logic. The template was used to check the
conformance with assignment-intended programming
styles like recursive or functional (lambda-based) pro-
gramming, avoidance of global variables and so on.
To minimise Hawthorne and Experimenter effects we
only applied minor changes to this template through-
out the study (see Section 4).
The assignments were organised as weekly bunches but were all workable from the very beginning. To foster
continuous working, weekly deadlines were set for the bunches. In the case of ITD, the “base speed” was a
bit slower (two weeks per bunch). Except for this “base speed”, all students could work on their
submissions at their own pace in the classroom, at home, on their own, in groups, or in any other form they
found personally appropriate.
Figure 1: Students interactions with VPL. Circle size is rel-
ative to the observed maximum of interactions. Times of
weekly programming laboratories are marked.
All students attended a weekly 90-minute programming laboratory session with the opportunity to ask
advisors and student tutors for help and support. Solving more than half of all assignments qualified
students to take part in a special VPL test to get a 10% bonus for the written exam. However, this study
covers neither the “bonus” test nor the written exam.
2.1 Illustrating Assignment
Some basic Java programming knowledge must be assumed throughout this paper. The continuous example
assignment for this paper shall be the following task.¹ A method countChar() has to be programmed that
counts the occurrences of a specific character c in a given String s (not case-sensitive).
The following example calls are provided for a better understanding of the intended functionality.

countChar('a', "Abc") → 1
countChar('A', "abc") → 1
countChar('x', "ABC") → 0
countChar('!', "!!!") → 3
¹ A catalogue of all assignments (German) used for this study can be found here: https://bit.ly/2WSbtEm

Figure 2: Observed submission and workload trends (last CS submission in calendar week 03 was not mandatory).

A reference solution for our “count chars in a string” problem might be the following implementation of
countChar().
Listing 2: Reference solution (continuous example).
int countChar(char c, String s) {
    s = s.toLowerCase();
    c = Character.toLowerCase(c);
    int i = 0;
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
}
Figure 3 shows an exemplifying VPL screenshot from a student's perspective.
2.2 Searching for Cheats
Our problem awareness emerged with the (acciden-
tal) observation of Listing 3 in an intermediate student
submission.
Listing 3: (Accidentally) observed student inspection code.
System.out.println(
    System.getProperties()
);
In Java, the getProperties() method reveals the current set of underlying system properties. It contains
information like the Java installation directory, the Java version, the Java classpath, the operating
system, and the user account. This line of code contributed absolutely nothing to solving the respective
assignment. It did not even occur in the final submission. It was just an intermediate outcome to inspect
the underlying configuration of the electronic evaluation component. Although the revealed data is harmless
for the integrity of the jail server, this one line of code shows that students understand how to make use
of such “code injection attacks” to figure out how the electronic evaluation component could be
compromised.

Figure 3: VPL screenshot (on the right: the evaluation report presented to students).
To minimise Hawthorne and Experimenter effects (Campbell and Stanley, 2003), neither the students nor the
advisers in the practical programming courses were aware that student submissions were analysed to deduce
cheating patterns. Even if cheating was detected, this had no consequences for the students. It was not
even communicated to the students or the advisers.
Furthermore, students were not aware that the ver-
sion history of their submissions and therefore even
intermediate cheating experiments (that did not make
it to the final submission) were logged.
However, not every submission was inspected, for understandable effort reasons. Therefore, the following
submission samples were investigated to search systematically for cheat-patterns in every calendar week.

S1: TOP 10 of submissions with a great many triggered evaluations (parameter optimization?)

S2: TOP 10 of submissions with a great many versions (cheating experiments?)

S3: TOP 10 of submissions with astonishingly low average points across all triggered evaluations but full
points in the final submission (what causes such “boosts”?)

S4: Submissions with an unusually high number (above the 95% percentile) of condition-related terms like
if, return, switch, case, &&, and || (parameter optimization?)

S5: Submissions with unusual terms like System.exit, throw, System.getProperties, or :=>> that would stop
program execution or have a special meaning in VPL or Java but are unlikely to contribute to a problem
solution (APAAS attack?)

S6: Ten further random submissions to cover unintended observation aspects.
Table 2 summarises the results quantitatively. Within these six samples, cheat and trick patterns were
identified mainly by manual but script-supported observation. VPL submissions were downloaded from Moodle
and analysed weekly. We developed a Jupyter-based (Kluyver et al., 2016) quantitative analysis and
submission data model for this dataset. Each student submission was represented as an object containing its
version and grading history that references its student submitter and its corresponding study programme.
The analytical script and data model made use of the well-known Python libraries statistics, NumPy
(Oliphant, 2006), and matplotlib (Hunter, 2007), and the Javaparser library (Smith et al., 2018). It was
used to identify the number of submissions and evaluations, points per submission version, timestamps per
submission version, occurrences of unusual terms, and so on. Based on this quantitative data, the mentioned
samples (S1 - S5) were selected automatically, and randomly in the case of S6. The script was additionally
used to generate Figures 1, 2, and 4.
Table 2: Detected cheats. Column order per row: Week, Assignments (CS), Assignments (ITD), Submissions,
Sample (Σ S1,...,S6), Overfitting, Redirection, Evasion, Injection.

41     3   4   629   72    4  4
42     3   -   298   53    15 2
43     3   3   486   65    11 3
44     -   -   -     -     -  -  -  -
45     3   3   446   55    5
46     3   -   231   54    3  3  2
47     2   3   315   55    6
48     1   -   66    44    8  2
49     3   3   363   66    6  1
50     3   -   192   47    5  7
XMas   -   -   -     -     -  -  -  -
02     3   4   280   57    2  3  1
03     2   -   38    38    1  11
Σ      31  20  3344  570   66 6  31 2
Additionally, the source codes of the sample submissions were exported weekly as an archived PDF document.
However, the scanning for cheat-patterns was done manually within these documents. Sadly, this dataset
cannot be made publicly available because it contains non-anonymous student-related data.
It turned out that sample S4 was a very effective way to detect a special kind of cheat that we called
overfitting (see Section 3.1). Other kinds of cheats seem to occur equally distributed across all samples.
However, the reader should be aware that the search for cheat-patterns was qualitative and not
quantitative. So, the study may provide some hints but should not be taken to draw any conclusions on
quantitative cheat-pattern distributions (see Section 4).
3 OBSERVED CHEAT-PATTERNS
The observed student cheat and trick patterns will be explained by the small and continuing example already
introduced in Section 2.1.
Most students strove to find a solution that fits the scope and intent of the assignment (see Figure 4).
So, their solutions showed similarities to the reference solution – although different approaches exist to
solve this problem (see Figure 3 for a non-loop-based approach). However, a minority of students
(approximately 15%) made use of the fact that a “dumb automaton” does the grading. Accordingly, we observed
the following cheating patterns that differ significantly from the intended reference solution above (see
Figure 4):
• Overfitting solutions (63%)
• Redirection to reference solutions (6%)
• Problem evasion (30%)
• Injection (1%)
Especially overfitting and evasion tricks are “poor man's weapons”, often used by novice programmers as a
last resort to solve a problem. The much more alarming redirection and injection cheats occurred only in
rare cases (less than 10%). However, what do these tricks and cheats look like? How severe are they?
Moreover, what can be done against them? We will investigate these questions in the following paragraphs.
3.1 Overfitting Tricks
Overfitted solutions strive to get a maximum of points
for grading but do not strive to solve the given prob-
lem in a general way. A notable example of an over-
fitted solution would be Listing 4.
Listing 4: Overfitting solution.
int countChar(char c, String s) {
    if (c == 'a' && s.equals("Abc"))
        return 1;
    if (c == 'A' && s.equals("abc"))
        return 1;
    if (c == 'x' && s.equals("ABC"))
        return 0;
    if (c == '!' && s.equals("!!!"))
        return 3;
    // [...]
    if (c == 'x' && s.equals("X"))
        return 1;
    return 42;
}
This solution merely maps the example input parameters to the expected output values. The solution is
completely useless outside the scope of the test cases.
What Seems to be Ineffective to Prevent Overfitting? The problem could seemingly be solved by merely hiding
the calling parameters and expected results in the evaluation report. However, this would result in
intransparent grading situations from a student's perspective. A student should always know the reason why
points are refused.
Another option is to detect and penalise such kinds of overfitting. A straightforward – but not perfect –
solution would be to restrict the number of allowed return statements. That makes overfitting more
complicated for students but still possible, as Listing 5 shows. This submission reduced the number of
return statements simply by replacing if statements with more complex logic expressions. However, the
return avoidance does not change the overfitting at all.

Figure 4: Observed cheat-pattern frequency (sums may not add up exactly to 100% due to rounding).
Listing 5: Return avoidance.
int countChar(char c, String s) {
    if ((c == 'a' && s.equals("Abc")) ||
        (c == 'A' && s.equals("abc")) ||
        (c == 'x' && s.equals("X")))
        return 1;
    // [...]
    if (c == 'x' && s.equals("ABC"))
        return 0;
}
What Can be Done to Prevent Overfitting? Randomised tests make such overfitted submissions ineffective.
Therefore, our general recommendation is to give a substantial fraction of points for randomised test
cases. However, to provide some control over randomised tests, these tests must be pattern-based to
generate random test cases targeting expected problems (e.g., off-by-one errors, boundary cases) in student
submissions. We refer to (Romli et al., 2017) for further details. For string-based data, for example, we
gained promising results by generating random strings merely by applying regular expressions (Thompson,
1968) inversely. However, the reader should be aware that current unit testing frameworks like JUnit do not
provide much support for randomised test cases.
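The following minimal sketch illustrates such a randomised check for the continuous example. It uses plain
random strings over a fixed alphabet instead of a full inverse-regex generator, and all class and method
names (RandomisedCheck, referenceCountChar, evaluate) are illustrative assumptions rather than the actual
VPL template.

import java.util.Random;
import java.util.function.BiFunction;

public class RandomisedCheck {

    static final String ALPHABET = "aAbBcC!?xyz ";
    static final Random RANDOM = new Random();

    // Reference implementation (cf. Listing 2).
    static int referenceCountChar(char c, String s) {
        int count = 0;
        for (char x : s.toLowerCase().toCharArray())
            if (x == Character.toLowerCase(c)) count++;
        return count;
    }

    // Generates a random string over ALPHABET (a simplification of
    // pattern-based / inverse-regex test data generation).
    static String randomString(int maxLength) {
        StringBuilder sb = new StringBuilder();
        int length = RANDOM.nextInt(maxLength + 1);
        for (int i = 0; i < length; i++)
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        return sb.toString();
    }

    // Compares a submitted countChar against the reference on n random inputs
    // and reports the result in VPL notation.
    static void evaluate(BiFunction<Character, String, Integer> submission, int n) {
        int passed = 0;
        for (int i = 0; i < n; i++) {
            char c = ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length()));
            String s = randomString(20);
            if (submission.apply(c, s) == referenceCountChar(c, s)) passed++;
        }
        System.out.println("Comment :=>> Randomised checks passed: " + passed + "/" + n);
        System.out.println("Grade :=>> " + (100 * passed / n));
    }

    public static void main(String[] args) {
        // An overfitted submission (cf. Listing 4) fails most random inputs.
        evaluate((c, s) -> c == 'a' && s.equals("Abc") ? 1 : 42, 100);
    }
}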
3.2 Redirection Cheats
Another shortcoming of APAAS solutions can be
compiler error messages that reveal details of the eval-
uation logic. In the case of VPL, an evaluation is pro-
cessed according to the following steps.
1. The submission is compiled and linked to the
evaluation logic.
2. The compiled result is executed to run the checks.
3. The check results are printed in an APAAS spe-
cific notation on the console (standard-out).
4. This output is interpreted by the APAAS solution
to run the automatic grading and present a kind of
feedback to the submitter.
This process is straightforward and provides the
benefit that evaluation components can handle almost
all programming languages. If one of the steps fails,
an error message is generated and returned to the sub-
mitter as feedback. Typically, this includes returning the compiler error message. That can be problematic
because these compiler error messages may provide unexpected cheating opportunities.
Let us remember: the assignment was to program a method countChar(). Let us further assume that a student
makes a small spelling error and names the method countChars() instead of countChar() – so just a trailing
s is added. That is a common programming mistake that happens quickly (see Listing 6).
Listing 6: A slightly misspelled submission.
int countCharS(char c, String s) {
    int i = 0;
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
}
If this submission were submitted and evaluated by an APAAS solution, it would likely not pass the first
compile step due to the simple naming error. What is returned is a list of compiler error messages like the
following one, which shows the problem:
Checks.java:40: error: cannot find symbol
Submission.>>countChar<<('a', "Abc") ==
    Solution.countChar('a', "Abc")
The compiler message provides useful hints to
correct a misspelt method name, but it also reveals
that a method (Solution.countChar()) exists to
check the student submission. The reference solu-
tion can be assumed to be correct. So, compiler error
messages can reveal a reference solution method that
could be called directly. A student can correct the naming and redirect the submitted method directly to
the method that is used for grading. In doing so, the student lets the evaluation component evaluate
itself, which will very likely yield full points. A notable example would be Listing 7.
Listing 7: Redirection submission.
int countChar(char c, String s) {
    return Solution.countChar(c, s);
}
We categorise this as a redirection cheat. Students can systematically gain insights into the evaluation
logic by intentionally submitting non-compilable submissions.
What Seems to be Ineffective to Prevent Redirection? APAAS solutions could avoid returning compiler error
messages completely. However, this would withhold necessary debugging information from students.
One could return blanked compiler error messages that do not reveal sensitive information. However,
blanking out relevant parts of compiler error messages is language-specific and can be very tricky to solve
in general. What is more, with every update or new release of a compiler, the corresponding error messages
can change. All this would have to be re-checked with each compiler update, which does not seem to be a
pragmatic approach.
One could avoid generating the expected results with a reference solution and provide them as hard-coded
values in the evaluation logic instead. However, this would limit opportunities to generate test cases in a
pattern-based approach, and we have already identified randomised test cases as an appropriate
countermeasure against overfitting cheats. So, this is not the best solution.
What are the Problems to Prevent Redirection? The submission should be executed in a context that, by
design, cannot access the grading logic. In a perfect world, the student logic would be wrapped in code
that deserialises input parameters from stdin, passes them to the submitted function, and serialises the
output to stdout. The grading logic would serialise parameters, pipe them into the wrapped student code,
deserialise its stdout, and compare it with the reference function's output (a minimal sketch of such a
wrapper follows below). However, this approach would deny making use of common unit testing frameworks for
evaluation, although it would effectively separate the submission logic and the evaluation logic into two
different processes (which would make most of the attack vectors in this setting ineffective). However, to
the best of the author's knowledge, no unit testing frameworks exist that separate the test logic from the
logic under test into different processes.
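A minimal sketch of the student-side wrapper in such a hypothetical two-process design could look as
follows; the wrapper class, the line-based "<char> <string>" protocol, and the embedded countChar
placeholder are illustrative assumptions, not an existing APAAS mechanism.

import java.util.Scanner;

public class SubmissionWrapper {

    // Placeholder for the student's submitted method.
    static int countChar(char c, String s) {
        int count = 0;
        for (char x : s.toLowerCase().toCharArray())
            if (x == Character.toLowerCase(c)) count++;
        return count;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        while (in.hasNextLine()) {
            String line = in.nextLine();                           // one test case per line: "<char> <string>"
            if (line.isEmpty()) continue;
            char c = line.charAt(0);                               // deserialise the parameters ...
            String s = line.length() > 2 ? line.substring(2) : "";
            System.out.println(countChar(c, s));                   // ... and serialise the result to stdout
        }
    }
}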
In the case of a class-based object-oriented language, one can overcome this problem using abstract
classes. The submission is required to be derived from a given abstract class with abstract methods to be
overridden by the student's methods. The evaluation logic calls only the abstract class, and the evaluation
code can thus be compiled in advance. Compilation errors can now only refer to the submitted code, not to
the evaluation code (the sketch below illustrates the idea). However, our experience showed that this
approach tends to explode in complexity in more elaborate object-oriented contexts with plenty of classes
and dependencies.
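A minimal sketch of this abstract-class approach, with illustrative names (Assignment, Submission) that are
not the actual template:

// Compiled in advance together with the evaluation logic:
public abstract class Assignment {
    public abstract int countChar(char c, String s);
}

// The student's submission must derive from it, e.g.:
//   public class Submission extends Assignment {
//       @Override public int countChar(char c, String s) { ... }
//   }
//
// The evaluation logic only refers to the abstract type and can, for example,
// instantiate the concrete submission via reflection:
//   Assignment submission = (Assignment) Class.forName("Submission")
//                                             .getDeclaredConstructor()
//                                             .newInstance();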
Furthermore, we have to consider the pedagogical
aspect here. For the abstract class-based solution, we
need the concept of an abstract class from day one of
the course. According to our experience, especially novice (freshman) programmers have plenty of problems
understanding the concept of an abstract class at this level.
Another approach is to scan the submission for questionable calls like Solution.x() calls. If such calls
are found, the submission is downgraded to zero points. That is our approach since we identified this
“attack vector”. Additionally, we deny the use of getClass() calls and the import of the reflection
package. Both would enable arbitrary indirections. However, this makes it necessary to apply parsers and
makes the assignment-specific evaluation logic a bit more complicated and time-intensive to program (a
minimal parser-based sketch follows below). We show at the end of the following Section 3.3 how to minimise
these efforts.
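A minimal sketch of such a parser-based scan, written against the JavaParser library already used for the
submission analysis (Smith et al., 2018); the class RedirectionScan and its concrete checks are
illustrative assumptions, not the production evaluation logic.

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.expr.MethodCallExpr;

public class RedirectionScan {

    // Returns true if the submission calls anything on the Solution class,
    // calls getClass(), or imports the reflection package.
    public static boolean isSuspicious(String submissionSource) {
        CompilationUnit cu = StaticJavaParser.parse(submissionSource);
        boolean callsSolutionOrGetClass = cu.findAll(MethodCallExpr.class).stream().anyMatch(call ->
            call.getNameAsString().equals("getClass") ||
            call.getScope().map(scope -> scope.toString().equals("Solution")).orElse(false));
        boolean importsReflection = cu.getImports().stream()
            .anyMatch(imp -> imp.getNameAsString().startsWith("java.lang.reflect"));
        return callsSolutionOrGetClass || importsReflection;
    }

    public static void main(String[] args) {
        String cheat = "class Submission { int countChar(char c, String s) {"
                     + " return Solution.countChar(c, s); } }";
        System.out.println(isSuspicious(cheat)); // true -> downgrade to zero points
    }
}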
3.3 Problem Evasion Tricks
Another trick pattern is to evade a given problem statement. According to our experience, this pattern
occurs mainly in the context of more sophisticated and formal programming techniques like recursive
programming or functional programming styles with lambda functions.
So, let us now assume that the assignment is still to implement a countChar() method, but this method
should be implemented recursively. A reference solution might look like Listing 8 (we do not consider
performance aspects such as tail recursion):
Listing 8: Recursive reference solution.
int countChar(char c, String s) {
    s = s.toLowerCase();
    c = Character.toLowerCase(c);
    if (s.isEmpty()) return 0;
    char head = s.charAt(0);
    String rest = s.substring(1);
    int n = head == c ? 1 : 0;
    return n + countChar(c, rest);
}
However, sometimes student submissions only pretend to be recursive without actually being so. Listing 9 is
a notable example.
Listing 9: Problem evading solution.
int countChar(char c, String s) {
    if (s.isEmpty()) return 0;
    return countChar(c, s, 0);
}
int countChar(char c, String s, int i) {
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
}
Although countChar() calls (an overloaded version of) countChar(), which looks recursive, the overloaded
version of countChar() makes use of a for loop and is therefore implemented in a fully imperative style.
The same pattern can be observed if an assignment requests functional programming with lambda functions. A
lambda-based solution could look like Listing 10.
Listing 10: Lambda reference solution.
(c, s) -> s.chars()
          .filter(x -> x == c)
          .count();
However, students take refuge in familiar programming concepts like loops. Very often, submissions like the
one in Listing 11 are observable:
Listing 11: Lambda problem evasion.
(c, s) -> {
    int i = 0;
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
};
The (c, s) -> { [...] }; seems functional at first glance. But if we look at the implementation, it is only
an imperative for loop embedded in a functional-looking context.
The problem here is that evaluation components that just look at input-output correctness will not detect
these kinds of programming style evasions. The merely recursive- or functional-looking solutions will
generate the correct results. Nevertheless, the intent of such assignments is not just to foster correct
solutions but also to train specific styles of programming.
What Can be Done to Avoid Problem Evasion? Similar to redirection cheats, submissions can be scanned for
unintended usage of language concepts like for and while loops. In these cases, the submission gets no
points or is downgraded. However, this makes it necessary to apply parsers and makes the assignment-specific
evaluation logic a bit more complicated and time-intensive to program. To simplify this work, we are
currently working on a selector model that selects nodes from an abstract syntax tree (AST) of a
compilation unit to detect and annotate such kinds of violations in a pragmatic way. The approach works
similarly to CSS selectors selecting parts of a DOM tree in a web context. Listing 12 is an illustrative
example. It effectively detects and annotates every loop outside the main() method.
Listing 12: Selector based inspection.
inspect("Main.java", ast ->
    ast.select(METHOD, "[name!=main]")
       .select(FOR, FOREACH, WHILE)
       .annotate("no loop")
       .exists()
);
3.4 Point Injection Cheats
All previous cheat-patterns focused on the compile or the execution step and tried to formulate a clever
submission that tricks the evaluation component and its checks. In contrast, injection cheats intentionally
target the grading component. Injection cheats require in-depth knowledge about which specific APAAS
solution (e.g., VPL) is used, and about the internals and details of how the APAAS solution generates
intermediate outcomes to calculate a final grade.
We explain this “attack” by the example of VPL. However, the attack can be easily adapted to other APAAS
tools. VPL relies on an evaluation script that triggers the evaluation logic. The evaluation logic has to
write its results directly to the console (standard out). The grading component parses this output and
searches for lines that start with specific prefixes like
• Grade :=>> to give points,
• Comment :=>> for hints and remarks that should be presented to the submitter as feedback.
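For illustration, the console output of a regular evaluation run in this notation could look like the
following (the comment texts are made-up examples for the continuous assignment):

Comment :=>> countChar('a', "Abc") == 1 ... passed
Comment :=>> countChar('!', "!!!") == 3 ... passed
Grade :=>> 100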
VPL assumes that students are not aware of this. It is furthermore (somehow inherently) assumed that
student submissions do not write to the console (only the evaluation logic should do that) – but it is
possible for submissions to place arbitrary output on the console, and this is not prohibited by the jail
server. So, these assumptions are a fragile defence. A quick internet search with the search terms
"grade VPL" will turn up documentation of VPL explaining how the internals of the grading component work
under the hood. So, submissions like Listing 13 are possible and executable.
Listing 13: Injection submission.
int countChar(char c, String s) {
    System.out.print("Grade :=>> 100");
    System.exit(0);
    return 0; // for compiler silence
}
The intent of such a submission is merely to inject
a line like this
Grade :=>> 100
into the output stream to let the grading component
evaluate the submission with full points.
What Can be Done to Avoid Injection Attacks? In a perfect world, the student code should not have access to
the streams that are sent to the evaluation component. However, in the case of VPL, exactly this cannot be
prevented.
Nevertheless, there are several options to handle this problem. In our case, we developed a basic
evaluation logic that relies on a workflow assuring that points are always written after a method has been
tested. Additionally, we prohibited the use of the System.exit() call to assure that submissions can never
stop the execution on their own. So, situations with injected grading statements might still occur on the
output stream:
Grade :=>> 100 (injected)
Grade :=>> 0 (regular)
However, the regular statements always follow the injected ones due to the design of the exception-aware
workflow. Because VPL only evaluates the last Grade :=>> line, the injection attack has no effect. However,
this is a VPL-specific solution. For other systems, the following options could be considered (if stdout
access of the student submission cannot be prevented):
• The APAAS could inject a filter for System.out to avoid or detect tainted outputs.
• System.out statements could be handled like unintended language concepts. That would be very similar to
handling problem evasion cheats. However, this could deny System.out.println() statements even for
debugging, which could interfere with a pragmatic workflow for students. Therefore, these checks should be
somehow limited, making it necessary to apply parsers running complex call-chain analysis. All this makes
the assignment-specific evaluation logic much more complex and time-intensive to program.
• The APAAS could redirect the console streams (standard out and standard error) to new streams for the
submission evaluation. That would effectively separate the submission logic streams from the evaluation
logic streams, and no stream injections could occur (see the sketch following this list).
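The following minimal sketch illustrates this last, stream-separation option; the nested Submission class
and the single hard-coded check are illustrative assumptions, not an existing APAAS implementation.

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class StreamSeparation {

    // A cheating submission that tries to inject a grade (cf. Listing 13).
    static class Submission {
        int countChar(char c, String s) {
            System.out.println("Grade :=>> 100"); // ends up in the buffer, not at the grader
            return 0;
        }
    }

    public static void main(String[] args) {
        PrintStream graderOut = System.out;                  // the stream parsed by the APAAS
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        System.setOut(new PrintStream(captured));            // the submission's output goes here instead
        int result;
        try {
            result = new Submission().countChar('a', "Abc");
        } finally {
            System.setOut(graderOut);                        // restore the grading stream
        }
        graderOut.println("Comment :=>> submission printed: " + captured.toString().trim());
        graderOut.println("Grade :=>> " + (result == 1 ? 100 : 0));
    }
}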
In all cases, APAAS solutions must prevent student submissions from stopping their execution via
System.exit() or similar calls to bypass the control flow of the evaluation logic. In our case, we solved
this by using a Java SecurityManager (a minimal sketch follows below) – it is likely to be more complicated
for other languages that do not provide a built-in virtual machine security concept. For these systems,
parser-based solutions (see Section 3.3) would be a viable option.
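A minimal sketch of such a SecurityManager-based exit prevention is shown below; the actual evaluation
template is more elaborate, and newer Java versions deprecate the SecurityManager API, but the mechanism
was standard at the time of the study.

public class NoExitPolicy {

    static class NoExitSecurityManager extends SecurityManager {
        @Override
        public void checkPermission(java.security.Permission perm) {
            // permit everything else; resource limits are handled by the jail/sandbox
        }
        @Override
        public void checkExit(int status) {
            throw new SecurityException("Submissions must not call System.exit()");
        }
    }

    public static void main(String[] args) {
        System.setSecurityManager(new NoExitSecurityManager());
        try {
            System.exit(0); // what an injection cheat would attempt (cf. Listing 13)
        } catch (SecurityException e) {
            System.out.println("Comment :=>> " + e.getMessage());
            System.out.println("Grade :=>> 0");
        }
    }
}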
4 THREATS TO VALIDITY
We consider and discuss the following threats to internal and external validity (Campbell and Stanley,
2003) that apply to our study and that might limit its conclusions.
4.1 Internal Validity
Internal validity refers to whether an experimental
condition makes a difference to the outcome or not.
Selection Bias: We should consider that the target audiences of both analysed study programmes differ
regarding previous programming knowledge. A noteworthy fraction of CS students gained basic programming
experience before they started their computer science studies. In the case of the ITD programme, a
noteworthy fraction of students treat programming courses as a “necessary evil”. Their intrinsic motivation
focuses mainly on design-related aspects. Accordingly, only a tiny fraction has some pre-study programming
experience. Because we only searched qualitatively and not quantitatively for cheat-patterns, this is not a
problem. However, it is likely that more cheating can be found in the CS group than in the ITD group,
mainly due to more mature programming skills. So be aware that the study design is not appropriate to draw
any conclusions on which groups prefer or can apply which kind of cheat-patterns.
Attrition: This threat is due to different kinds of participants dropping out of the study groups. Each
programming course has a drop-out rate. The effect can be observed in both groups as a decrease of
submissions over time (see Figure 2). However, this study searched only qualitatively for cheat-patterns.
For this study, it is of minor interest whether observed cheat-patterns occur in a phase with a high,
medium, or low drop-out rate. However, in phases with a maximum of submissions, some cheats could have been
overlooked. So, the study does not claim to have identified all kinds of cheat-patterns.
Maturation: A development of participants occurred in both groups (e.g., we see a decrease of the
“poor man's cheat” overfitting in Table 2). All students developed a better understanding of the underlying
grading component and improved their programming skills during the course. Therefore, it is likely that
more sophisticated forms of cheating (like redirection and injection) occurred with later submissions. That
is what Table 2 might indicate. However, we searched only qualitatively for cheat-patterns. For this study,
it was of minor interest whether observed cheat-patterns occur in the first, second, or third phase of a
programming course. The study design is not appropriate to draw any conclusions on which kinds of
cheat-patterns occur at which level of programming expertise.
4.2 External Validity
External validity refers to the generalisability of the
study.
Contextual Factors: This threat occurs due to spe-
cific conditions under which research is conducted
that might limit its generalisability. In this case, we
were bound to a Moodle-based APAAS solution. The
study would not have been possible outside this tech-
nical scope. We decided to work with VPL because
it is the only mature-enough open source solution for
Moodle. Therefore, the study should not be taken to
conclude on all existing APAAS systems. However, it seems worthwhile to check whether existing APAAS
solutions are aware of the four identified cheat-patterns (or attack vectors, from a system security
perspective).
Hawthorne Effects: This threat occurs due to par-
ticipants’ reactions to being studied. It alters their
behaviour and therefore the study results. There-
fore, students were unaware that their submissions
were analysed to identify cheat-patterns. However, if previously unknown cheat-patterns were identified,
subsequent checks may have been added to the grading component. So, it is likely that repetitive cheating
hardly occurred. Because of the qualitative nature of the study,
we do not see a problem here. However, because of
this internal feedback loop, the study should not be
taken to draw any conclusions on the quantitative as-
pects of cheating.
5 CONCLUSION
We have to be aware that (even first-year) students are clever enough to intentionally apply basic cheat
and injection “attacks” against APAAS solutions. We identified highly overfitted solutions, redirection to
reference solutions, problem evasion, and even APAAS-specific grading statement injections as
cheat-patterns. This list is likely not complete.
For the VPL and Java programming courses, a template-based solution was used that had been intentionally
developed to handle such kinds of cheats. We evaluated this approach in two programming courses with more
than 3300 submissions from more than 190 first-year students. Although this template-based approach is
pragmatic and working, several further needs can be identified:
• Better overfitting prevention mechanisms, as already mentioned by (Romli et al., 2017).
• Better mechanisms to detect the undesired usage of programming language concepts like loops, global
variables, specific datatypes, return types, and more.
• Better mechanisms to detect, handle, and prevent possibly tainted outputs of submissions that could
potentially be used for APAAS-specific grading injections.
To handle these needs, it seems necessary
1. to randomise test cases,
2. to provide additional practical code inspection techniques based on parsers,
3. and to isolate the submission and the evaluation logic consistently in separate processes (at least the
console output should be separated).
As far as the author can survey the APAAS landscape, exactly these features are only incompletely provided.
APAAS solutions are a valuable asset to support practical programming courses, to minimise routine work for
advisers, and to provide immediate feedback for students. However, as was shown, these systems can be
cheated quite easily. If we were to use them – for instance, in highly hyped MOOC formats – for automatic
certification of programming expertise, the question arises whether we would certify the expertise to
program or the expertise to cheat, and what this question would mean for the reputation of these courses
(Alraimi et al., 2015).
In consequence, we should look at APAAS so-
lutions much more from a security point of view –
in particular from a code injection point of view.
We identified the need to evolve unit testing frame-
works into more evaluation-oriented teaching solu-
tions. Based on the insights of this study we are
currently working on a Java-based unit testing frame-
work intentionally focusing educational contexts. Its
working state can be inspected on GitHub (https:
//github.com/nkratzke/JEdUnit).
ACKNOWLEDGEMENTS
The author expresses his gratitude to Vreda Pieterse,
Philipp Angerer, and all the anonymous reviewers for
their valuable feedback and paper improvement pro-
posals. What is more, posting a preprint of this pa-
per to the Reddit programming community had some
unique and unexpected impact. This posting gener-
ated hundreds of valuable comments, a 95% upvoting,
and resulted in more than 40k views on ResearchGate
(in a single day). To answer all this constructive and
exciting feedback was impossible. I am sorry for that
and thank this community so very much.
To teach almost 200 students programming is
much work, involves much support, and is not a one-
person show. Especially the practical courses needed
excellent supervision and mentoring. Although all of
the following persons were (intentionally) not aware
of being part of this study, they all did a tremendous
job. Therefore, I have to thank the advisers of the
practical courses David Engelhardt, Thomas Hamer,
Clemens Stauner, Volker Völz, Patrick Willnow and
our student tutors Franz Bretterbauer, Francisco Car-
doso, Jannik Gramann, Till Hahn, Thorleif Harder,
Jan Steffen Krohn, Diana Meier, Jana Schwieger, Jake
Stradling, and Janos Vinz. As a team, they managed
13 groups with almost 200 students, and explained,
gave tips, and revealed plenty of their programming
tricks to a new generation of programmers.
REFERENCES
Ala-Mutka, K. M. (2005). A survey of automated assess-
ment approaches for programming assignments. Com-
puter Science Education, 15(2):83–102.
Alraimi, K. M., Zo, H., and Ciganek, A. P. (2015). Under-
standing the moocs continuance: The role of openness
and reputation. Computers & Education, 80:28 – 38.
Burrows, S., Tahaghoghi, S. M. M., and Zobel, J. (2007).
Efficient plagiarism detection for large code reposito-
ries. Softw., Pract. Exper., 37:151–175.
Caiza, J. C. and Alamo Ramiro, J. M. d. (2013). Automatic
Grading: Review of Tools and Implementations. In
Proc. of 7th Int. Technology, Education and Develop-
ment Conference (INTED2013).
Campbell, D. T. and Stanley, J. C. (2003). Experimental and
Quasi-experimental Designs for Research. Houghton
Mifflin Company. reprint.
del Pino, J. C. R., Rubio-Royo, E., and Hernández-Figueroa, Z. J. A Virtual Programming Lab for Moo-
dle with automatic assessment and anti-plagiarism
features. In Proc. of the 2012 Int. Conf. on e-Learning,
e-Business, Enterprise Information Systems, and e-
Government.
Douce, C., Livingstone, D., and Orwell, J. (2005). Auto-
matic test-based assessment of programming: A re-
view. J. Educ. Resour. Comput., 5(3).
Gupta, S. and Gupta, B. B. (2017). Cross-site scripting
(xss) attacks and defense mechanisms: classification
and state-of-the-art. International Journal of System
Assurance Engineering and Management, 8(1):512–
530.
Halfond, W. G. J. and Orso, A. (2005). Amnesia: Analysis
and monitoring for neutralizing sql-injection attacks.
In Proceedings of the 20th IEEE/ACM International
Conference on Automated Software Engineering, ASE
’05, pages 174–183, New York, NY, USA. ACM.
Hunter, J. D. (2007). Matplotlib: A 2d graphics environ-
ment. Computing In Science & Engineering, 9(3):90–
95.
Ihantola, P., Ahoniemi, T., Karavirta, V., and Seppälä, O.
(2010). Review of recent systems for automatic as-
sessment of programming assignments. In Proceed-
ings of the 10th Koli Calling International Conference
on Computing Education Research, Koli Calling ’10,
pages 86–93, New York, NY, USA. ACM.
Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bus-
sonnier, M., Frederic, J., Kelley, K., Hamrick, J.,
Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S.,
and Willing, C. (2016). Jupyter Notebooks – a pub-
lishing format for reproducible computational work-
flows. In Loizides, F. and Schmidt, B., editors, Posi-
tioning and Power in Academic Publishing: Players,
Agents and Agendas, pages 87 – 90. IOS Press.
Liu, C., Chen, C., Han, J., and Yu, P. S. (2006). Gplag: de-
tection of software plagiarism by program dependence
graph analysis. In KDD.
Maguire, P., Maguire, R., and Kelly, R. (2017). Us-
ing automatic machine assessment to teach computer
programming. Computer Science Education, 27(3-
4):197–214.
Oliphant, T. (2006). A Guide to NumPy. Trelgol Publishing.
Pomerol, J.-C., Epelboin, Y., and Thoury, C. (2015).
What is a MOOC?, chapter 1, pages 1–17. Wiley-
Blackwell.
Ray, D. and Ligatti, J. (2012). Defining code-injection at-
tacks. SIGPLAN Not., 47(1):179–190.
Rodríguez, J., Rubio-Royo, E., and Hernández, Z. (2011a).
Fighting plagiarism: Metrics and methods to measure
and find similarities among source code of computer
programs in vpl. In EDULEARN11 Proceedings, 3rd
Int. Conf. on Education and New Learning Technolo-
gies, pages 4339–4346. IATED.
Rodríguez, J., Rubio Royo, E., and Hernández, Z. (2011b).
Scalable architecture for secure execution and test of
students’ assignments in a virtual programming lab.
In EDULEARN11 Proceedings, 3rd Int. Conf. on Ed-
ucation and New Learning Technologies, pages 4315–
4322. IATED.
Romli, R., Mahzan, N., Mahmod, M., and Omar, M. (2017).
Test data generation approaches for structural test-
ing and automatic programming assessment: A sys-
tematic literature review. Advanced Science Letters,
23(5):3984–3989.
Smith, N., van Bruggen, D., and Tomassetti, F. (2018).
JavaParser: visited. Leanpub.
Staubitz, T., Klement, H., Renz, J., Teusner, R., and Meinel,
C. (2015). Towards practical programming exercises
and automated assessment in massive open online
courses. In 2015 IEEE International Conference on
Teaching, Assessment, and Learning for Engineering
(TALE), pages 23–30.
Su, Z. and Wassermann, G. (2006). The essence of com-
mand injection attacks in web applications. SIGPLAN
Not., 41(1):372–382.
Thiébaut, D. (2015). Automatic evaluation of computer pro-
grams using moodle’s virtual programming lab (vpl)
plug-in. J. Comput. Sci. Coll., 30(6):145–151.
Thompson, K. (1968). Programming techniques: Regular
expression search algorithm. Communications of the
ACM, 11(6):419–422.
... Previous research [11] showed how astonishingly simple it is for students to trick automated programming assignment assessment systems. It is often overlooked that APAAS solutions are systems that execute injected code (student submissions) and code injection is known as a severe threat from a security point of view [20]. ...
... Of course, such code injection vulnerabilities are considered by current solutions. However, in previous research [11], it was astonishing to see that current APAAS solutions sometimes overlook the cheating cleverness of students. On the one hand, APAAS solutions protect the host system via sandbox mechanisms, and APAAS systems put much effort in sophisticated plagiarism detection and authorship control of student submissions [16], [13]. ...
... On the other, the grading component can be cheated in various -sometimes ridiculously simple -ways making these solutions highly suspect for (semi-)automated and unattended programming examinations that contribute to certificate a certain level of programming expertise. Previous research [11] identified at least four simple cheat patterns: ...
Preprint
Full-text available
According to our data, about 15% of programming students trick if they are aware that only a "dumb" robot evaluates their programming assignments unattended by programming experts. Especially in large-scale formats like MOOCs, this might become a question because to trick current automated assignment assessment systems (APAAS) is astonishingly easy and the question arises whether unattended grading components grade the capability to program or to trick. This study analyzed what kind of tricks students apply beyond the well-known "copy-paste" code plagiarism to derive possible mitigation options. Therefore, this study analyzed student cheat patterns that occurred in two programming courses and developed a unit testing framework JEdUnit as a solution proposal that intentionally targets such tricky educational aspects of programming. The validation phase validated JEdUnit in another programming course. This study identified and analyzed four recurring cheat patterns (overfitting, evasion, redirection, and injection) that hardly occur in "normal" software development and are not aware to normal unit testing frameworks that are frequently used to test the correctness of student submissions. Therefore, the concept of well-known unit testing frameworks was extended by adding three "countermeasures": randomization, code inspection, separation. The validation showed that JEdUnit detected these patterns and in consequence, reduced cheating entirely to zero. From a students perspective, JEdUnit makes the grading component more intelligent, and cheating does not pay-off anymore. This Chapter explains the cheat patterns and what features of JEdUnit mitigate them by a continuous example.
... Ein weiteres, aus der Programmierausbildung bekanntes Problem ist das Umgehen der automatisierten Prüfung, indem die Lösung konstruiert wird [36]. Übertragen auf Abfragesprachen würden dann die erwarteten Tupel über ihren Schlüssel direkt selektiert. ...
Article
Full-text available
Zusammenfassung In diesem Artikel berichten wir über den erstmaligen Einsatz der Software Praktomat im Rahmen einer einführenden Datenbankvorlesung an der Universität Passau. Unser Ziel war es, den ca. 300 Studierenden aus Informatik und Wirtschaftsinformatik während des „Corona-Semesters“ die Möglichkeit zu bieten, Anfragen in relationaler Algebra und SQL im Selbststudium zu üben und dabei automatisierte Rückmeldungen zu bekommen. Insbesondere wollten wir kein neues E‑Learning-System von Grund auf implementieren, sondern auf einem bestehenden, quelloffenen Werkzeug aufsetzen. Wir beschreiben in diesem Artikel die notwendigen Anpassungen der Praktomat -Software, ursprünglich für die grundständige Programmierausbildung konzipiert. Wir evaluieren den Zufriedenheitserfolg unserer Studierenden und berichten weiter anekdotisch über die Erfahrungen der Studierenden und Lehrenden.
... Maven executes in its own JVM, properly sandboxed by a custom Security Manager that acts at the Java API level (e.g., checking unauthorized file access or system calls). This sandbox has the advantage of isolating student's code from the teacher's code, preventing security problems that could arise if the former could invoke the latter [21]. ...
Article
Full-text available
Automated assessment tools (AATs) are software systems used in teaching environments to automate the evaluation of computer programs implemented by students. These tools can be used to stimulate the interest of computer science students in programming courses by providing quick feedback on their work and highlighting their mistakes. Despite the abundance of such tools, most of them are developed for a specific course and are not production-ready. Others lack advanced features that are required for certain pedagogical goals (e.g. Git integration) and/or are not flexible enough to be used with students having different computer literacy levels, such as first year and second year students. In this paper we present Drop Project (DP), an automated assessment tool built on top of the Maven build automation software. We have been using DP in our teaching activity since 2018, having received more than fifty thousand submissions between projects, classroom exercises, tests and homework assignments. The tool’s automated feedback has allowed us to raise the difficulty level of the course’s projects, while the grading process has become more efficient and consistent between different teachers. DP is an extensively tested, production-ready tool. The software’s code and documentation are available in GitHub under an open-source software license.
... In wenigen Ausnahmefällen zeigte sich auch studentischer Ehrgeiz, das System mit den bereits oben beschriebenen Ansätzen auszutricksen. Dieses Beobachtung deckt sich mit den Erfahrungen anderer E-Assessment-Systeme zu automatischen Auswertung von Programm-Code [10]. ...
Article
Full-text available
Zusammenfassung Die Bearbeitung von Übungsaufgaben ist ein wichtiges Element in der Datenbank-Lehre. Lösungen der Studierenden lassen sich dabei häufig in strukturierten Ergebnisformaten festhalten, wie z. B. SQL-Anfragen oder die Spezifikation von Schemata und Relationen. Dieser Beitrag stellt das E‑Assessment-Tool DMT (Data Management Tester) vor, das sowohl eine automatische Bewertung als auch eine automatische Feedback-Generierung solcher strukturierter Lösungen ermöglicht. Es soll dabei insbesondere Studierende unterstützen, nicht ganz korrekte Lösungen zielgerichtet überarbeiten zu können. Dieser Betrag skizziert Konzept und Architektur von DMT und erläutert den Einsatz an der HTWK Leipzig und der Hochschule Mittweida.
Article
Full-text available
Cloud-native software systems often have a much more decentralized structure and many independently deployable and (horizontally) scalable components, making it more complicated to create a shared and consolidated picture of the overall decentralized system state. Today, observability is often understood as a triad of collecting and processing metrics, distributed tracing data, and logging. The result is often a complex observability system composed of three stovepipes whose data are difficult to correlate. Objective: This study analyzes whether these three historically emerged observability stovepipes of logs, metrics and distributed traces could be handled in a more integrated way and with a more straightforward instrumentation approach. Method: This study applied an action research methodology used mainly in industry-academia collaboration and common in software engineering. The research design utilized iterative action research cycles, including one long-term use case. Results: This study presents a unified logging library for Python and a unified logging architecture that uses the structured logging approach. The evaluation shows that several thousand events per minute are easily processable. Conclusions: The results indicate that a unification of the current observability triad is possible without the necessity to develop utterly new toolchains.
Conference Paper
Full-text available
In recent years, Massive Open Online Courses (MOOCs) have become a phenomenon presenting the prospect of free high class education to everybody. They bear a tremendous potential for teaching programming to a large and diverse audience. The typical MOOC components, such as video lectures, reading material, and easily assessable quizzes, however, are not sufficient for proper programming education. To learn programming, participants need an option to work on practical programming exercises and to solve actual programming tasks. It is crucial that the participants receive proper feedback on their work in a timely manner. Without a tool for automated assessment of programming assignments, the teaching teams would be restricted to offer optional ungraded exercises only. The paper at hand sketches scenarios how practical programming exercises could be provided and examines the landscape of potentially helpful tools in this context. Automated assessment has a long record in the history of computer science education. We give an overview of existing tools in this field and also explore the question what can and/or should be assessed. https://hpi.de/nc/meinel/publikationen/conference-papers/web-university/year/2015/102939/2015_Staubitz_TALE.html
Article
Full-text available
Nowadays, web applications are becoming one of the standard platforms for representing data and service releases over the World Wide Web. Since web applications are progressively more utilized for security-critical services, therefore they have turned out to be a well-liked and precious target for the web-related vulnerabilities. Even though several defensive mechanisms have been building up to reinforce the modern web applications and alleviate the attacks instigated against them. We have analyzed the major concerns for web applications and Internet-based services which are persistent in several web applications of diverse organizations like banking, health care, financial service, retail and so on by the referring the Website Security Statistics Report of White Hat Security. In this paper, we highlight some of the serious vulnerabilities found in the modern web applications and revealed various serious vulnerabilities. Cross-Site Scripting (XSS) attack is the top most vulnerability found in the today’s web applications which to be a plague for the modern web applications. XSS attacks permit an attacker to execute the malicious scripts on the victim’s web browser resulting in various side-effects such as data compromise, stealing of cookies, passwords, credit card numbers etc. We have also discussed a high level of taxonomy of XSS attacks and detailed incidences of these attacks on web applications. A detailed comprehensive analysis of the exploitation, detection and prevention mechanisms of XSS attacks has also been discussed. Based on explored strength and flaws of these mechanisms, we have discussed some further work.
Article
We report on an intervention in which informal programming labs were switched to a weekly machine-evaluated test for a second year Data Structures and Algorithms module. Using the online HackerRank system, we investigated whether greater constructive alignment between course content and the exam would result in lower failure rates. After controlling for known associates, a hierarchical regression model revealed that HackerRank performance was the best predictor of exam performance, accounting for 18% of the variance in scores. Extent of practice and confidence in programming ability emerged as additional significant predictors. Although students expressed negativity towards the automated system, the overall failure rate was halved, and the number of students gaining first class honours tripled. We infer that automatic machine assessment better prepares students for situations where they have to write code by themselves by eliminating reliance on external sources of help and motivating the development of self-sufficiency.
Automatic Programming Assessment (APA) has commonly been applied in computer science education to assist educators in effectively marking and grading students' programming assignments, thereby significantly reducing their workload. APA is one of the methods that requires test data generation as an integral part. This study attempts to reveal gaps between software testing and computer science education research in terms of test data generation for structural testing. It utilizes a Systematic Literature Review (SLR) to provide a comprehensive review of the existing evidence on the topic of interest. The SLR aims to identify possible approaches to improving current APA research, particularly in generating test data for structural testing. The results reveal a clear gap between the two areas. In software testing, the findings show that meta-heuristic techniques are among the most popular techniques applied to realize Automated Test Data Generation (ATDG) for structural testing. Furthermore, path coverage, branch coverage, and statement coverage have been identified as the most widely used structural coverage criteria in previous studies.
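The following minimal Python sketch illustrates random test data generation aiming at branch coverage, one of the structural criteria named above. The function under test and the coverage bookkeeping are illustrative assumptions, not an ATDG technique from the reviewed literature.

# Minimal illustrative sketch of random test data generation aiming at branch coverage.
import random

covered = set()

def classify(x: int) -> str:
    """Toy function under test with three branches."""
    if x < 0:
        covered.add("negative")
        return "negative"
    elif x == 0:
        covered.add("zero")
        return "zero"
    covered.add("positive")
    return "positive"

def generate_test_data(target_branches=frozenset({"negative", "zero", "positive"}), budget=1000):
    """Draw random inputs until every branch has been exercised or the budget is spent."""
    inputs = []
    for _ in range(budget):
        if covered >= target_branches:
            break
        x = random.randint(-10, 10)
        classify(x)
        inputs.append(x)
    return inputs

data = generate_test_data()
print(f"branches covered: {covered}, with {len(data)} generated inputs")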
Conference Paper
It is increasingly necessary for researchers in all fields to write computer code, and in order to reproduce research results, it is important that this code is published. We present Jupyter notebooks, a document format for publishing code, results and explanations in a form that is both readable and executable. We discuss various tools and use cases for notebook documents.
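As a minimal illustration of the notebook document format, the following Python sketch creates a notebook programmatically. It assumes the third-party nbformat package is installed; it is not an example taken from the cited paper.

# Minimal illustrative sketch: build a notebook that mixes explanation and executable code.
import nbformat

nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_markdown_cell("# Reproducible example\nExplanation next to the code."))
nb.cells.append(nbformat.v4.new_code_cell("result = 6 * 7\nprint(result)"))

# The resulting .ipynb file combines readable explanation, executable code, and (after running) results.
nbformat.write(nb, "example.ipynb")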
Chapter
Whilst Massive Open Online Courses (MOOCs) undeniably represent a change of scale, they are nevertheless part of the age-old concept of distance learning. MOOCs are defined by their audience and their format. These courses are open to one and all and have no physical limitations because they are completely digitized and accessible over the Internet without barriers. MOOCs fall into two main types: the "xMOOC" or "transmissive MOOC", which guides students along a clearly defined path like a teacher, and the connectivist MOOC, in which learners construct their own course themselves. Small Private Online Courses (SPOCs) are designed for an audience of students enrolled on a particular course. Small Open Online Courses (SOOCs) are those in which all students wishing to take the course must take a test beforehand to determine their level.
Conference Paper
Automatic grading of programming assignments is an important topic in academic research. It aims at improving the level of feedback given to students and optimizing the professor's time. Its importance grows as the number and complexity of assignments increase. Several studies have reported the development of software tools to support this process. They usually consider particular deployment scenarios and specific requirements of the interested institution. However, the quantity and diversity of these tools make it difficult to get a quick and accurate idea of their features. This paper reviews an ample set of tools for automatic grading of programming assignments. The review includes a description of every selected tool and its key features. Among others, the key features analyzed include the programming language used to build the tool, the programming languages supported for grading, the criteria applied in the evaluation process, the work mode (as a plugin, as an independent tool, etc.), the logical and deployment architectures, and the communications technology used. Implementations and operation results are then described with quantitative and qualitative indicators to understand how successful the tools were. Quantitative indicators include the number of courses, students, tasks, and submissions considered for tests, and the acceptance percentage after tests. Qualitative indicators include motivation, support, and skills improvement. A comparative analysis of the tools is presented, and as a result a set of commonly detected gaps is provided. The lack of normalized evaluation criteria for assignments is identified as a key gap in the reviewed tools; thus, an evaluation metrics frame for grading programming assignments is proposed. The results indicate that many of the analyzed features depend heavily on the technology infrastructure that currently supports the teaching process; they are therefore a limiting factor in reusing the tools in new implementation cases. Another issue is the inability to support new programming languages, which is constrained by tool updates. Regarding metrics for the evaluation process, the set of analyzed tools showed much diversity and inflexibility. Knowing which implementation features are always specific to a particular project and which could be common is helpful before implementing and operating a tool. Considering how much flexibility can be attained in the evaluation process helps in designing a new tool that is not restricted to particular cases and in defining the automation level of the evaluation process.
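The proposed idea of an evaluation metrics frame can be illustrated with a minimal Python sketch that combines several grading criteria into one weighted score. The criteria and weights below are illustrative assumptions, not the frame proposed in the cited review.

# Minimal illustrative sketch of a weighted evaluation metric for grading a submission.
WEIGHTS = {"functionality": 0.6, "style": 0.2, "efficiency": 0.2}

def grade(scores: dict) -> float:
    """Combine per-criterion scores (each in [0, 1]) into a single grade between 0 and 100."""
    total = sum(WEIGHTS[criterion] * scores.get(criterion, 0.0) for criterion in WEIGHTS)
    return round(100 * total, 1)

print(grade({"functionality": 0.9, "style": 0.5, "efficiency": 0.75}))  # -> 79.0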
Article
A method for locating specific character strings embedded in character text is described and an implementation of this method in the form of a compiler is discussed. The compiler accepts a regular expression as source language and produces an IBM 7094 program as object language. The object program then accepts the text to be searched as input and produces a signal every time an embedded string in the text matches the given regular expression. Examples, problems, and solutions are also presented.
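A minimal Python sketch using the re module illustrates the same behaviour in a modern setting: it produces a signal for every embedded string in a text that matches a given regular expression. This is only an illustration; the cited article instead compiles the expression into an IBM 7094 object program.

# Minimal illustrative sketch: report every embedded match of a regular expression in a text.
import re

pattern = re.compile(r"ab*c")          # the regular expression to search for
text = "xxac yy abbbc zz abc"

# Produce a "signal" (here: a printed message) for every embedded match in the text.
for match in pattern.finditer(text):
    print(f"match '{match.group(0)}' at position {match.start()}")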