How programming students trick and what
JEdUnit can do against it
Nane Kratzke [0000-0001-5130-4969]
Lübeck University of Applied Sciences, Mönkhofer Weg 239, 23562 Lübeck, Germany
nane.kratzke@th-luebeck.de
Abstract. According to our data, about 15% of programming students trick if they are aware that only a "dumb" robot evaluates their programming assignments unattended by programming experts. Especially in large-scale formats like MOOCs, this becomes a problem because tricking current automated programming assignment assessment systems (APAAS) is astonishingly easy, and the question arises whether unattended grading components grade the capability to program or the capability to trick. This study analyzed what kind of tricks students apply beyond the well-known "copy-paste" code plagiarism in order to derive possible mitigation options. Therefore, it analyzed student cheat patterns that occurred in two programming courses and developed a unit testing framework, JEdUnit, as a solution proposal that intentionally targets such educational aspects of programming. JEdUnit was then validated in a further programming course. The study identified and analyzed four recurring cheat patterns (overfitting, evasion, redirection, and injection) that hardly occur in "normal" software development and are not addressed by the standard unit testing frameworks that are frequently used to test the correctness of student submissions. Therefore, the concept of well-known unit testing frameworks was extended by three "countermeasures": randomization, code inspection, and separation. The validation showed that JEdUnit detected these patterns and, in consequence, reduced cheating to zero. From a student's perspective, JEdUnit makes the grading component more intelligent, and cheating does not pay off anymore. This Chapter explains the cheat patterns and which features of JEdUnit mitigate them, using a continuous example.
Keywords: automatic · assessment · programming · course · APAAS · MOOC · trick · cheat · pattern · mitigation
1 Introduction
In a digitized world, more and more experts are needed with at least some ba-
sic programming skills. Programming might even evolve into a foundational skill
similar to reading, writing, and calculating. Therefore, the course sizes of univer-
sity and college programming courses are steadily increasing. Even massive open
online courses [14] are used more and more systematically to convey necessary
programming capabilities to students of different disciplines [19]. The course-
work consists of programming assignments that need to be assessed. Since the
submitted assignments are executable programs with a formal structure, they
are highly suited for automated assessment. In consequence, plenty of automated programming assignment assessment systems (APAAS) have evolved. We refer
to [17], [3], [9], [5], and [1] for an overview of such tools.
Previous research [11] showed how astonishingly simple it is for students to
trick automated programming assignment assessment systems. It is often over-
looked that APAAS solutions are systems that execute injected code (student
submissions) and code injection is known as a severe threat from a security point
of view [20]. We refer to [7], [15], and [6] for an overview of such kinds of attacks.
Of course, such code injection vulnerabilities are considered by current solu-
tions. However, in previous research [11], it was astonishing to see that current
APAAS solutions sometimes overlook the cheating cleverness of students. On
the one hand, APAAS solutions protect the host system via sandbox mechanisms and put much effort into sophisticated plagiarism detection and authorship control of student submissions [16], [13]. On the other hand, the grading component can be cheated in various – sometimes ridiculously simple – ways, making these solutions highly questionable for (semi-)automated and unattended programming examinations that are meant to certify a certain level of programming expertise. Previous research [11] identified at least four simple
cheat patterns:
– Overfitting
– Evasion
– Redirection
– Injection
Moreover, that work strove to raise general problem awareness but did not focus on solutions to mitigate these patterns. Proposing solutions on how to mitigate these identified cheat patterns is the primary intent and contribution of this Chapter. We propose to use the following three techniques:
– Randomization of test cases
– Pragmatic code inspections
– Separation of student and evaluation logic
These techniques mitigate the presented patterns; we demonstrate their suitability for the APAAS solution Moodle/VPL and the programming language Java. Nevertheless, the principles are transferable to other APAAS solutions and programming languages and are therefore of broader interest beyond Moodle/VPL and Java.
Consequently, the remainder of this paper is outlined as follows. Section 2
presents the methodology that has been used to identify and categorize student
cheat-patterns and to validate appropriate “countermeasures”. Section 3 will ex-
plain the identified cheat-patterns. Section 4 presents a mitigation analysis and
shows that the identified cheat patterns can be addressed by three “countermea-
sures” that should be considered by every APAAS solution. Section 5 explains
how these insights have been considered in the development of a unit testing
framework (JEdUnit) that focuses intentionally on educational aspects and con-
siders the aspects mentioned above consequently. We will discuss our results of
JEdUnit in Section 6 and provide some guidelines on the generalizability and
limitations of this research. Finally, we conclude our findings in Section 7.
Fig. 1: Research methodology
2 Methodology
Figure 1 presents the overall research approach, which comprised two phases: problem systematization and solution proposal validation.
2.1 Problem systematization
For the initial problem systematization, two first-semester Java programming courses in the winter semester 2018/19 (see Table 1) have been systematically evaluated. Both courses were used to search for student submissions that intentionally trick the grading component of APAAS solutions. Table 1 provides a summarized overview of the course design. All assignments were automatically evaluated by the VPL Moodle plugin (version 3.3.3) following the general recommendations described by [21]. For more details, the reader is referred to [11].

Table 1: Courses used for problem systematization and solution validation

                                Systematization              Validation
                                Prog I (CS)   Prog I (ITD)   Prog II (CS)
  Students                      113           79             73
  Assignments (total)           29            20             8
  Number of bunches             11            6              7
  Assignments per bunch (avg)   3             3              1
  Time for a bunch (weeks)      1             2              2
  Groups                        7             6              4
  Students per group (avg)      18            12             18
  Student/advisor ratio (avg)   6             6              6
Figure 2 shows an exemplifying VPL screenshot from a student's perspective.
Fig. 2: VPL screenshot (on the right: the evaluation report presented to students) [11]
To minimize Hawthorne and experimenter effects [4], neither the students nor the advisers in the practical programming courses were aware that student submissions were analyzed to deduce cheating patterns. Even if cheating was detected, this had no consequences for the students. It was not even communicated to the students or the advisers.
Furthermore, students were not aware that the version history of their submissions, and therefore even intermediate cheating experiments (that did not make it to the final submission), was logged. However, for understandable effort reasons, not every submission was inspected. Therefore, only significant submission samples (see Table 2) were systematically investigated for cheat patterns.
Table 3 summarizes the results quantitatively. Within these six samples, cheat and trick patterns were identified mainly by manual but script-supported observation. VPL submissions were downloaded from Moodle and analyzed weekly. We developed a Jupyter-based [10] quantitative analysis and submission data model for this dataset. Each student submission was represented as an object containing its version and grading history that references its student submitter and the corresponding study programme. The analytical script and data model made use of the well-known Python libraries statistics, NumPy [12], and matplotlib [8], as well as the JavaParser library [18]. It was used to identify the number of submissions and evaluations, points per submission version, timestamps per submission version, occurrences of unusual terms, and more. Based on this quantitative data, the mentioned samples (S1-S5) were selected automatically (or randomly in the case of S6). Additionally, the source code of the sample submissions was exported weekly as an archived PDF document. However, the scanning for cheat patterns was done manually within these documents.
Table 2: Weekly analyzed samples (systematization phase)

– S1: TOP 10 of submissions with many triggered evaluations. Rationale: Parameter optimization could cause plenty of evaluations.
– S2: TOP 10 of submissions with many versions. Rationale: Cheating experiments could cause plenty of versions.
– S3: TOP 10 of submissions with astonishingly low average points across all triggered evaluations but full points in the final submission. Rationale: Cheating could cause such point boosts.
– S4: Submissions with unusually many (above the 95% percentile) condition-related terms like if, return, switch, case, and so on. Rationale: Parameter optimization could cause unusually many condition-related terms.
– S5: Submissions with unusual terms like System.exit, System.getProperties, or :=>> that would stop program execution or have a special meaning in VPL or Java but are unlikely to contribute to a problem solution. Rationale: APAAS attacks could cause such unusual terms.
– S6: Ten random submissions. Rationale: To cover unintended observation aspects.
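The term scan behind samples S4 and S5 can be illustrated by the following minimal sketch. It is an illustration only: the actual analysis was implemented as a Python/Jupyter script, and the class and method names used here (TermScan, SUSPICIOUS) are ours, not part of that script.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: counts occurrences of suspicious terms in a submitted
// source file, similar in spirit to the scan used to select samples S4/S5.
public class TermScan {

    private static final List<String> SUSPICIOUS =
        List.of("System.exit", "System.getProperties", ":=>>");

    public static void main(String[] args) throws IOException {
        String code = Files.readString(Path.of(args[0])); // path to a submitted Main.java
        for (String term : SUSPICIOUS) {
            // Count non-overlapping literal occurrences of the term.
            int count = code.split(Pattern.quote(term), -1).length - 1;
            if (count > 0) {
                System.out.println(term + " occurs " + count + " time(s)");
            }
        }
    }
}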
2.2 Solution proposal validation
Based on these elicited cheat patterns, corresponding mitigation options have been derived. Three of them (randomization, code inspection, and submission/evaluation logic separation) have been implemented in a unit testing framework called JEdUnit as a solution proposal to mitigate the identified problems. JEdUnit has been validated using a Programming II course for computer science students in the summer semester 2019. The course has been given analogously to the systematization phase, except that JEdUnit has been applied. The search for cheats has been conducted similarly, except that we inspected every submission because of the smaller course size and the fewer (but more extensive and more complex) assignments. The mitigation options and JEdUnit, as well as the validation results, are presented in the following Sections.
3 Cheat-patterns to consider
Some basic Java programming knowledge must be assumed throughout this paper. The continuous example assignment for this Chapter shall be the following task: a method countChar() has to be programmed that counts the occurrences of a specific character c in a given String s (not case-sensitive).
The following example calls are provided for a better understanding of the
intended functionality.
– countChar('a', "Abc") → 1
– countChar('A', "abc") → 1
– countChar('x', "ABC") → 0
– countChar('!', "!!!") → 3
A reference solution for our “count chars in a string” problem might be the
following implementation of countChar().
Listing 1.1: Reference solution (continuous example)
int countChar(char c, String s) {
    s = s.toLowerCase();
    c = Character.toLowerCase(c);
    int i = 0;
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
}
According to our data, most students strive to find a solution that fits the scope and intent of the assignment (see Table 3 and Figure 3). However, in the systematization phase, a minority of students (approximately 15%) make use of the fact that a "dumb automat" does the grading. Accordingly, we observed the following cheating patterns that differ significantly from the intended reference solution above (see Figure 3):
Table 3: Detected cheats

Systematization phase (winter semester 2018); columns: Week, Assignments (CS), Assignments (ITD), Submissions, Sample (S1, ..., S6), Overfitting, Redirection, Evasion, Injection:

  41     3   4   629   72   4   4
  42     3   -   298   53  15   2
  43     3   3   486   65  11   3
  44     -   -     -    -   -   -   -   -
  45     3   3   446   55   5
  46     3   -   231   54   3   3   2
  47     2   3   315   55   6
  48     1   -    66   44   8   2
  49     3   3   363   66   6   1
  50     3   -   192   47   5   7
  XMas   -   -     -    -   -   -   -   -
  02     3   4   280   57   2   3   1
  03     2   -    38   38   1  11
  Σ     31  20  3344  570  66   6  31   2

Validation phase (summer semester 2019); columns: Week, Submissions, Overfitting, Redirection, Evasion, Injection:

  13    68   5   4
  15    62   3   1   1
  17    54   1
  19    51
  21    45
  23    34
  24    33
  Σ    347   8   0   6   1
– Overfitting solutions (63%)
– Redirection to reference solutions (6%)
– Problem evasion (30%)
– Injection (1%)
Especially overfitting and evasion tricks are "poor man's weapons", often used by novice programmers as a last resort to solve a problem. The much more alarming redirection and injection cheats occurred only in rare cases (less than 10%). However, what do these tricks and cheats look like? How severe are they? Moreover, what can be done against them? We will investigate these questions in the following paragraphs.
3.1 Overfitting tricks
Overfitted solutions strive to get a maximum of points for grading but do not
strive to solve the given problem in a general way. A notable example of an
overfitted solution would be Listing 1.2.
Fig. 3: Observed cheat-pattern frequency (without application of JEdUnit) [11]
Listing 1.2: Overfitting solution
int countChar(char c, String s) {
    if (c == 'a' && s.equals("Abc")) return 1;
    if (c == 'A' && s.equals("abc")) return 1;
    if (c == 'x' && s.equals("ABC")) return 0;
    if (c == '!' && s.equals("!!!")) return 3;
    // [...]
    if (c == 'x' && s.equals("X")) return 1;
    return 42;
}
This solution merely maps the example input parameters to the expected output values. It is completely useless outside the scope of the test cases.
3.2 Problem evasion tricks
Another trick pattern is to evade a given problem statement. According to our experience, this pattern occurs mainly in the context of more sophisticated and formal programming techniques like recursive programming or functional programming styles with lambda functions.
So, let us now assume that the assignment is still to implement a countChar() method, but this method should be implemented recursively. A reference solution might look like Listing 1.3 (we do not consider performance aspects such as tail recursion):
Listing 1.3: Recursive reference solution
int countChar(char c, String s) {
    s = s.toLowerCase();
    c = Character.toLowerCase(c);
    if (s.isEmpty()) return 0;
    char head = s.charAt(0);
    String rest = s.substring(1);
    int n = head == c ? 1 : 0;
    return n + countChar(c, rest);
}
However, student submissions sometimes only pretend to be recursive without actually being recursive. Listing 1.4 is a notable example.
Listing 1.4: Problem evading solution
int countChar(char c, String s) {
    if (s.isEmpty()) return 0;
    return countChar(c, s, 0);
}

int countChar(char c, String s, int i) {
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
}
Although countChar() calls (an overloaded version of) countChar(), which looks recursive, the overloaded version makes use of a for loop and is therefore implemented in a fully imperative style.
The same pattern can be observed if an assignment requests functional programming with lambda functions. A lambda-based solution could look like Listing 1.5.
Listing 1.5: Lambda reference solution
(c, s) -> s.chars()
           .filter(x -> x == c)
           .count();
However, students take refuge in familiar programming concepts like loops. Very often, submissions like Listing 1.6 are observable:
Listing 1.6: Lambda problem evasion
(c, s) -> {
    int i = 0;
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
};
The (c, s) -> { [...] }; construct seems functional at first glance. However, looking at the implementation, it is only an imperative for loop embedded in a functional-looking context.
The problem here is that evaluation components that only check input-output correctness will not detect these kinds of programming style evasions. The merely recursive- or functional-looking solutions will generate correct results. Nevertheless, the intent of such assignments is not just to foster correct solutions but also to train specific styles of programming.
3.3 Redirection cheats
Another shortcoming of APAAS solutions can be compiler error messages that
can reveal details of the evaluation logic. In the case of VPL, an evaluation is
processed according to the following steps.
1. The submission is compiled and linked to the evaluation logic.
2. The compiled result is executed to run the checks.
3. The check results are printed in an APAAS-specific notation on the console (standard-out).
4. This output is interpreted by the APAAS solution to run the automatic grading and to present feedback to the submitter.
This process is straightforward and provides the benefit that evaluation components can handle almost all programming languages. If one of the steps fails, an error message is generated and returned to the submitter as feedback. This typically involves returning the compiler error message, which can be problematic because compiler error messages may provide unexpected cheating opportunities.
Remember that the assignment was to program a method countChar(). Let us further assume that a student makes a small spelling error and names the method countChars() instead of countChar() – so just a trailing s is added. That is a common programming error that happens quickly (see Listing 1.7).
Listing 1.7: A slightly misspelled submission
int countChars(char c, String s) {
    int i = 0;
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
}
If this submission were evaluated by an APAAS solution, it would likely not pass the first compile step due to the simple spelling error. What is returned is a list of compiler error messages like the following one, which shows the problem:
Checks.java:40: error: cannot find symbol
Submission.>>countChar<<('a', "Abc") ==
Solution.countChar('a', "Abc")
The compiler message provides useful hints to correct a misspelt method name, but it also reveals that a method (Solution.countChar()) exists to check the student submission. The reference solution can be assumed to be correct. So, compiler error messages can reveal a reference solution method that could be called directly. A student can correct the naming and redirect the submitted method directly to the method that is used for grading. Doing that, the student lets the evaluation component evaluate itself, which will very likely yield full points. A notable example would be Listing 1.8.
Listing 1.8: Redirection submission
int countChar(char c, String s) {
    return Solution.countChar(c, s);
}
This is categorized as a redirection cheat. Students can systematically gain insights into the evaluation logic by intentionally submitting non-compilable submissions.
3.4 Point injection cheats
All previous cheat patterns focus on the compile or the execution step and try to formulate a smart submission that tricks the evaluation component and its checks. In contrast, injection cheats intentionally target the grading component. Injection cheats require in-depth knowledge about which specific APAAS solution (e.g., VPL) is used and about the internals and details of how the APAAS solution generates intermediate outcomes to calculate a final grade.
We explain this "attack" using the example of VPL; however, the attack can easily be adapted to other APAAS tools. VPL relies on an evaluation script that triggers the evaluation logic. The evaluation logic has to write its results directly to the console (standard-out). The grading component parses this output and searches for
lines that start with specific prefixes like
– Grade :=>> to give points,
– Comment :=>> for hints and remarks that should be presented to the submitter as feedback.
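For illustration, the console output of an evaluation run could therefore look roughly as follows; the concrete comment texts and the grade value are merely illustrative and not prescribed by VPL:

Comment :=>> [OK] countChar('a', "Abc") should return 1
Comment :=>> [FAILED] countChar('!', "!!!") should return 3 but returned 42
Grade :=>> 50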
VPL assumes that students are not aware of this knowledge. It is furthermore (somehow inherently) assumed that student submissions do not write to the console (just the evaluation logic should do that) – but it is possible for submissions to place arbitrary output on the console, and this is not prohibited by the Jail server. So, these assumptions are a fragile defence. A quick internet search for the terms "grade VPL" will turn up the VPL documentation explaining how the grading component works under the hood. So, submissions like Listing 1.9 are possible and executable.
Listing 1.9: Injection submission
int countChar(char c, String s) {
    System.out.print("Grade :=>> 100");
    System.exit(0);
    return 0; // for compiler silence
}
The intent of such a submission is merely to inject a line like "Grade :=>>
100" into the output stream to let the grading component evaluate the submis-
sion with full points.
4 Mitigation analysis of cheat patterns
So, in our problem systematization phase, we identified four patterns of tricking or cheating that can be observed in regular programming classes. These tricks work because students know that a dumb automat checks their submissions. In the following Sections 4.1, 4.2, 4.3, and 4.4, we ask what can be done to make APAAS solutions more "intelligent" to prevent this kind of cheating.
4.1 What can be done to prevent overfitting?
Randomized test data makes overfitted submissions ineffective. Therefore, our general recommendation is to give a substantial fraction of points for randomized test cases. However, to provide some control over randomized tests, these tests must be pattern-based to trigger expectable problems (e.g., off-by-one errors, boundary cases) in student submissions. We refer to [17] for further details. For string-based data, for example, we obtained promising results by generating random strings through inverse application of regular expressions [22]. Section 5.1 explains how randomization is used by JEdUnit to tackle the overfitting problem.
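The idea of pattern-based random string generation can be sketched for simple character-class patterns such as "[a-zA-Z]{5,17}". The following minimal sketch is an illustration only and not the generator shipped with JEdUnit; the class and method names (RandomStrings, fromCharClass) are ours.

import java.util.Random;

// Illustrative only: generates a random string of length [min, max] drawn
// from a fixed character set, mimicking the effect of inversely applying a
// simple regular expression like "[a-zA-Z]{min,max}".
class RandomStrings {

    private static final Random RND = new Random();

    static String fromCharClass(String chars, int min, int max) {
        int length = min + RND.nextInt(max - min + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append(chars.charAt(RND.nextInt(chars.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        // Roughly corresponds to the pattern [a-zA-Z]{5,17}
        System.out.println(fromCharClass(alphabet, 5, 17));
    }
}

Because a fresh string is drawn on every evaluation, an overfitted submission that merely maps the published example inputs to outputs cannot pass such a test reliably.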
4.2 What can be done to avoid problem evasion?
Problem evasion cannot be detected by comparing the equality of input-output results (black-box testing). To mitigate problem evasion, we need automated code inspection approaches (white-box inspections). Submissions must be scanned for unintended usage of language concepts like for and while loops. However, this makes it necessary to apply parsers and makes the assignment-specific evaluation logic much more complicated and time-intensive to program. To simplify this work, we propose a selector-based model that selects nodes from the abstract syntax tree (AST) of a compilation unit to detect and annotate such violations in a practical way. The approach works similarly to CSS selectors selecting parts of a DOM tree in a web context (see Section 5.2).
4.3 What can be done to prevent redirection?
Interestingly, problem evasion and redirection can be mitigated by the same approach. Similar to evasion cheats, submissions can be scanned for unintended usage of language concepts, e.g., calls to classes containing the reference logic that is used for testing. This white-box inspection approach makes it possible to scan the submission for questionable calls like Solution.x(). Additionally, we deny the use of getClass() calls and the import of the reflection package; both would enable the formulation of arbitrary indirections. The techniques necessary to deny specific method calls and package imports are explained in Section 5.2.
4.4 What can be done to avoid injection attacks?
In a perfect world, the submission should be executed in a context that by design
cannot access the grading logic. The student logic should be code that deserializes
input parameters from stdin, passes them to the submitted function, and seri-
alizes the output to stdout. The grading logic should serialize parameters, pipe
them into the wrapped student code, deserialize the stdout, and compare it with
the reference function’s output. However, this approach would deny making use
of common unit testing frameworks for evaluation although it would effectively
separate the submission logic and the evaluation logic in two different processes
(which would make most of the attack vectors in this setting ineffective). How-
ever, to the best of the author’s knowledge, no unit testing framework exists
that separates the test logic from the to be tested logic in different processes.
In the case of VPL, the stdout (System.out) stream is shared between submission and evaluation logic. APAAS systems that separate the submission logic's stdout stream from the evaluation logic's stdout stream might be less or not at all prone to injection attacks. However, even for VPL, there are several options to handle this problem. E.g., we can prohibit the use of the System.exit() call to assure that submissions can never stop the evaluation execution on their own. This prohibition can be realized using a Java SecurityManager – it is likely to be more complicated for other languages that do not provide a built-in virtual machine security concept. For these languages, parser-based solutions (see Section 3.2) are a viable option (see Section 5.2).
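The SecurityManager option mentioned above can be sketched as follows. This is an illustration only, not the mechanism JEdUnit itself uses (JEdUnit relies on code inspection for this purpose, see Section 5.3); also note that the SecurityManager API is deprecated in recent Java releases.

import java.security.Permission;

// Minimal sketch: block System.exit() calls from submitted code so that a
// submission cannot terminate the evaluation run on its own.
public class NoExitPolicy extends SecurityManager {

    @Override
    public void checkPermission(Permission perm) {
        // Allow everything else in this sketch.
    }

    @Override
    public void checkExit(int status) {
        throw new SecurityException("System.exit(" + status + ") is not allowed");
    }

    public static void main(String[] args) {
        System.setSecurityManager(new NoExitPolicy());
        try {
            System.exit(0); // would normally terminate the JVM
        } catch (SecurityException e) {
            System.out.println("Blocked: " + e.getMessage());
        }
    }
}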
A very effective way to separate the stdout/stderr streams is to redirect these console streams to new streams that the submission logic is unaware of. This redirection is an astonishingly simple solution to the most severe identified problem. It is explained in Section 5.3.
5 JEdUnit
Sections 4.1, 4.2, 4.3, and 4.4 showed that it is possible to mitigate the identified cheat patterns using the strategies listed in Table 4. These insights flowed into a unit testing framework called JEdUnit. JEdUnit has a specific focus on educational aspects and strives to simplify the automatic evaluation of (small) Java programming assignments using Moodle and VPL. The framework has been mainly developed for our purposes in programming classes at the Lübeck University of Applied Sciences. However, this framework might be helpful for other programming instructors. Therefore, it is provided as open source.

Table 4: Mapping of presented JEdUnit features to cheat patterns

                      Randomization   Code Inspection   Separation
  Overfitting (63%)        ✓
  Evasion (30%)                             ✓
  Redirection (6%)                          ✓
  Injection (1%)                                             ✓
Every JEdUnit evaluation is expressed in a Checks.java compilation unit
and usually relies on a reference implementation (which is by convention pro-
vided in a file called Solution.java) and a submission (which is by convention
provided in a file called Main.java). However, the conventions can be adapted
to assignment-specific testing requirements.
Listing 1.10: Reference Solution expressed in JEdUnit
public class Solution {

    public int countChars(char c, String s) {
        int n = 0;
        for (char x : s.toLowerCase().toCharArray()) {
            if (Character.toLowerCase(c) == x) n++;
        }
        return n;
    }
}
Similar to JUnit, each test case is annotated with a @Test annotation (see
Listing 1.11). However, a JEdUnit @Test annotation takes two additional pa-
rameters:
– weight is a value between 0 and 1.0 that indicates how much this test case contributes to the final grade of the assignment.
– description is a string briefly explaining the intent of the test case.
Listing 1.11: Test template expressed in JEdUnit
public class Checks extends Constraints {

    @Test(weight=0.25, description="Example calls")
    public void test_01_exampleCalls() { [...] }

    @Test(weight=0.25, description="Boundary tests")
    public void test_02_boundary_cases() { [...] }

    @Test(weight=0.5, description="Randomized tests")
    public void test_03_randomized_cases() { [...] }
}
A test case usually runs several test data tuples against the submitted solu-
tion and compares the result with the reference solution. A test data tuple can
be created using the t() method. Listing 1.12 shows this for the example calls
of our continuous example.
Listing 1.12: Example test case expressed in JEdUnit
@Test(weight=0.25, description="Example calls")
public void test_01_exampleCalls() {
    test(
        t('a', "ABC"), t('A', "abc"),
        t('X', "abc"), t('x', "XxY")
    ).each(
        // check
        d -> assertEquals(
            Solution.countChars(d._1, d._2),
            Main.countChars(d._1, d._2)
        ),
        // explain
        d -> f("countChar(%s, %s) should return %s",
            d._1, d._2, Solution.countChars(d._1, d._2)
        ),
        // on error
        d -> f("but returned %s",
            Main.countChars(d._1, d._2)
        )
    );
}
The each() method takes three parameters to run the evaluation on the test
data provided as tuples in the test() method.
– A Predicate that checks whether the submitted solution returns the same result as the reference solution (indicated as the correctness check // check in Listing 1.12).
– A Function that explains the method call and reports the expected result (indicated as the expected behavior // explain in Listing 1.12).
– A Function that reports the actual result if the check predicate evaluates to false (indicated as // on error in Listing 1.12).
The functions are used to provide meaningful feedback for students. To make this straightforward, JEdUnit provides the format method f(). f() is merely a convenience wrapper for the String.format() method that provides additional formatting of feedback outputs that often confuse students. E.g., f() makes non-printable characters like spaces, tabulators, and carriage returns visible by dedicated representation symbols. Additionally, f() provides readable map and list representations (and more).

Table 5: Random generators provided by JEdUnit

  Method                                      Description
  c()                                         Random character ([a-zA-Z]).
  c(String regexp)                            Random character from a regular expression (first char).
  s(String... regexps)                        Random string from a sequence of regular expressions.
  s(int min, int max)                         Random string between a minimum and maximum length.
  b()                                         Random boolean value.
  i()                                         Random integer value.
  i(int m)                                    Random integer value [0, m].
  i(int l, int u)                             Random integer value [l, u].
  d()                                         Random double value.
  d(double m)                                 Random double value [0, m[.
  d(double l, double u)                       Random double value [l, u[.
  <T> List<T> l(int l, Supplier<T> g)         Random list with length l generated by g.
  <T> List<T> l(int l, int u, Supplier<T> g)  Random list with length in the range of [l, u] generated by g.
The reader is referred to the Wiki (https://github.com/nkratzke/JEdUnit/wiki) for a more detailed introduction to JEdUnit and further features like:
– Initializing assignments.
– Configuration of checkstyle (coding conventions).
– Making use of predefined code inspections.
– Checking complex object-oriented class structures automatically.
However, in the following Sections 5.1, 5.2, and 5.3 we focus mainly on how
JEdUnit makes use of randomization, code inspection, and stream separation to
mitigate observed overfitting, problem evasion, redirection, and injection cheat-
ing problems (see Table 4).
5.1 How does JEdUnit support randomization?
JEdUnit provides a set of random generators to mitigate overfitting problems. These random generators enable the generation of randomized test data in specified ranges and according to patterns, in order to explicitly test common problem cases like boundary cases, off-by-one errors, empty data structures, and more. Because these generators are frequently used, they intentionally have short names (see Table 5).
Using this set of random generators, we can quickly create test data – for instance, a random list of predefined terms.
Listing 1.13: Demonstration of a randomized list generator creating a list of
strings from a regular expression
List<String> words = l(2, 5, () ->
    s("This|is|just|a|silly|example")
);
The randomly generated lists have a length between two and five entries.
Possible resulting lists would be:
–["This", "a"]
–["silly", "is", "This"]
–["example", "example", "a", "a"]
–["This", "is", "just", "a", "example"]
The generators shown in Table 5 are designed to work seamlessly with the test().each() pattern introduced in Listing 1.12. Listing 1.14 exemplifies a randomized test case for our continuous example. It works with random strings but intentionally generates test data for cases where the character to be counted is placed at the front, in the middle, or at the end of a string, to cover frequent programming errors like off-by-one errors.
Listing 1.14: Example for a randomized test case expressed in JEdUnit
@Test(weight=0.5, description="Randomized tests")
public void test_03_randomized_cases() {
    // Regexp to generate random strings
    String r = "[a-zA-Z]{5,17}";
    // Pick a random character to search
    char c = c();
    test(
        t(c, s(c + "{1,7}", r, r)),  // First position
        t(c, s(r, r, c + "{1,7}")),  // Last position
        t(c, s(r, c + "{1,7}", r))   // Middle position
    ).each(check, explain, onError);
    // check, explain, onError defined as in Listing 1.12
}
Such test cases are not prone to overfitting because the test data is randomly generated for every evaluation. Possible generated feedback for the test case shown in Listing 1.14 could look as follows:
– [OK] countChars('j', "jUcEzCzODWWN") should return 1
– [FAILED] countChars('j', "zOdAqavJJkxjvrjj") should return 5 but returned 3
– [OK] countChars('j', "SPAqlORwxjjjjRHIKCCWS") should return 4

Each evaluation will be run with random data but according to comparable patterns. So, JEdUnit provides both: comparable evaluations and random test cases that are not prone to overfitting.

Table 6: Predefined code inspections provided by JEdUnit

– CHECK_IMPORTS (default: true): Checks that only cleared libraries are imported. java.util.* is allowed per default.
– CHECK_COLLECTION_INTERFACES (default: true): Checks that collections are accessed via their interfaces only (e.g., List instead of LinkedList is used in method parameters, method return types, variable and datafield declarators). Using concrete collection classes like ArrayList instead of their interface List is penalized by default; the same applies to Map. This check can be deactivated.
– ALLOW_LOOPS (default: true): for, while, do-while, and forEach() loops are allowed per default. This can be deactivated and penalized, e.g., in case methods must be implemented recursively.
– ALLOW_METHODS (default: true): Methods are allowed per default. Can be deactivated and penalized in special cases, e.g., to enforce the usage of lambda functions.
– ALLOW_LAMBDAS (default: true): Lambdas are allowed per default. Can be deactivated and penalized in special cases, e.g., to enforce the usage of methods.
– ALLOW_INNER_CLASSES (default: false): Inner classes are penalized by default. This check can be deactivated if inner classes shall be allowed.
– ALLOW_DATAFIELDS (default: false): Checks if datafields (global variables) are used. This is penalized by default. However, this check must be deactivated for object-oriented contexts.
– ALLOW_CONSOLE_OUTPUT (default: false): By default, System.out.print() statements outside the scope of the main() method are not allowed and penalized. This check can be deactivated.
5.2 How does JEdUnit support code inspection?
JEdUnit integrates the JavaParser [18] library to parse Java source code into an abstract syntax tree. JEdUnit tests have full access to the JavaParser library and can do arbitrary checks with this parser. However, this can quickly become very complex. Therefore, JEdUnit tests can make use of a selector-based model that selects nodes from the abstract syntax tree (AST) of a compilation unit to detect and annotate violations in a practical way. The approach works similarly to CSS selectors selecting parts of a DOM tree in a web context.
The following examples demonstrate how to use this selector model pragmatically on submission ASTs for code inspections. The reader may revisit the two evasion examples shown in Listings 1.4 and 1.6, which show typical evasion patterns to avoid recursive or lambda/stream-based programming in Java.
To detect lambda evasion (see Listing 1.6), we can add the following inspection to a test case. It scans for lambda functions that make use of block statements in their definition. The use of blocks in lambda functions might indicate a kind of "problem evasion" in a submission – a student may try to evade from lambda programming into simpler statement-based programming (which is likely not the intent of the assignment).
Listing 1.15: Example for a lambda evasion check in JEdUnit
@Inspection(description="Lambda evasion inspection")
public void assignmentInspections() {
    penalize(25, "No blocks in lambdas.", () ->
        inspect("Main.java", ast -> ast.select(LAMBDA)
            .select(BLOCK)
            .annotate("No blocks in lambdas.")
            .exists()
        )
    );
}
So, this inspection would effectively detect submissions like the one already presented in Listing 1.6. To detect recursion evasion (see Listing 1.4), we simply have to search for methods outside the main() method that make use of loop statements.
Listing 1.16: Example to detect recursion evasion
inspect(cu, ast -> ast.select(METHOD, "[name!=main]")
    .select(FOR, FOREACH, WHILE, DOWHILE)
    .annotate("no loops allowed")
    .exists()
);
The same technique can be used to detect submissions that make use of
System.out.println() or System.exit() calls that might indicate injection
attacks.
Listing 1.17: Example to detect suspect method calls (used JEdUnit internally)
inspect(cu, ast -> ast.select(METHODCALL)
    .filter(c -> c.toString().startsWith("System.exit"))
    .annotate(c -> "[CHEAT] Forbidden call: " + c)
    .exists()
);
The reader may notice that these selector-based code inspections are quite powerful and flexible for formalizing and detecting arbitrary violation patterns in student code submissions. JEdUnit makes intensive use of this code inspection technique and provides several predefined code inspections that can be activated via config options. Table 6 lists some of these predefined inspections.
5.3 How does JEdUnit support separation of evaluation and submission logic?
The main problem with injection attacks is that the submission logic and the evaluation logic make use of the same console streams (stdout/stderr). The grading component interprets this console output, and this output could be compromised by the student submission's output (intentionally or unintentionally). JEdUnit solves this problem simply by redirecting the stdout/stderr console streams to other streams. Simple methods like the one in Listing 1.18 perform this redirection.
Listing 1.18: Redirection of stdout console stream
/**
 * Creates a file called console.log that stores all
 * console output generated by the submitted logic.
 * Used to isolate the evaluation output from the
 * submitted logic output to prevent injection attacks.
 * @since 0.2.1
 */
public void redirectStdOut() throws Exception {
    this.redirected = new PrintStream(
        Config.STD_OUT_REDIRECTION
    );
    System.setOut(this.redirected);
}
The stdout/stderr streams are switched whenever the submission logic is called and switched back when the submission logic returns. Because the submission logic has no access to the evaluation library logic of JEdUnit, it cannot figure out the current state of the redirection and is therefore not capable of reversing it. In consequence, JEdUnit maintains two pairs of stdout/stderr streams: one for the evaluation logic that is passed to the VPL grading component, and one for the submission logic that could be presented as feedback to the user for debugging purposes. Currently, however, JEdUnit simply ignores the submission output streams. Figure 4 shows how JEdUnit changes the control flow (right-hand side) in comparison to the standard VPL approach (left-hand side).
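The switching idea can be illustrated by the following self-contained sketch; the names evaluationOut, submissionOut, and callSubmission() are purely illustrative and not part of the JEdUnit API.

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Minimal sketch: the evaluation logic keeps a reference to the "real"
// stdout, swaps in a throwaway stream before calling submitted code, and
// swaps the real stream back afterwards.
public class StreamSwitchSketch {

    public static void main(String[] args) {
        PrintStream evaluationOut = System.out;               // stream seen by the grader
        ByteArrayOutputStream submissionBuffer = new ByteArrayOutputStream();
        PrintStream submissionOut = new PrintStream(submissionBuffer);

        System.setOut(submissionOut);   // switch before calling submitted code
        callSubmission();               // anything printed here is captured
        System.setOut(evaluationOut);   // switch back for the evaluation output

        evaluationOut.println("Grade :=>> 0"); // only the evaluation logic writes here
    }

    // Stand-in for a submitted method that tries to write to the console.
    static void callSubmission() {
        System.out.println("Grade :=>> 100");  // injection attempt ends up in the buffer
    }
}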
This stream redirection effectively separates the submission logic streams
from the evaluation logic streams, and no stream injections can occur.
Additionally, JEdUnit prevents student submissions from stopping their execution via System.exit() or similar calls to bypass the control flow of the evaluation logic. This is done by prohibiting the reflection API (which would enable arbitrary calling indirections that could not be identified by code inspections) and calls to System.exit() by the code inspection means already explained in Section 5.2.
Fig. 4: Isolation of console streams in JEdUnit to prevent injection
6 Discussion of results and threats to validity
If we compare the number of detected cheats in the systematization phase with the validation phase (see Table 3 in Section 3), we see the impact of applying solutions like JEdUnit. In the systematization phase, cheat detection was done manually and without notice to the students. In the validation phase, cheat detection was done automatically by JEdUnit and reported immediately as evaluation feedback to the students. The consequence was that cheating occurred only in the first weeks. What is more, the cheating detected in the validation phase was only observed in intermediate outcomes and not in the final submissions (while the cheating reported in the systematization phase is cheating that made it to the final submissions).
So, we can conclude that JEdUnit effectively detects the four identified patterns of cheating (overfitting, evasion, redirection, and injection). Cheating was not even attempted in intermediate results anymore and came entirely to an end in the second half of the semester. From a student's perspective, it was no longer an effective means.
However, we should consider and discuss the following threats to internal and external validity [4] that apply to our study and that might limit its conclusions.
Selection Bias and Maturation: We should consider that the target audience
of the systematization and validation phase differed a bit. In the systematization phase, we worked with first-semester novice programmers of a computer science and an information technology and design study programme. In the validation phase, we worked with second-semester computer science students that had made some programming progress during their first semester. This increase in expertise might be one reason why a general decrease in cheating was observable in the second semester. However, it does not explain the fact that cheating came entirely to an end in the second half of the semester. This can only be explained by the fact that students learned that the grading automat "was not so dumb" anymore.
Contextual Factors: This threat occurs due to specific conditions under which
research is conducted that might limit its generalisability. In this case, we were
bound to a Moodle-based APAAS solution. The study would not have been
possible outside this technical scope. We decided to work with VPL because it is
the only mature-enough open source solution for Moodle. Therefore, the study should not be taken to draw conclusions about all existing APAAS systems. In particular, JEdUnit is a Moodle/VPL-specific outcome. However, it seems worthwhile to check whether existing APAAS solutions are aware of the four identified cheat patterns (or attack vectors from a system security perspective) and what can be transferred from JEdUnit into further APAAS solutions.
Hawthorne Effects: This threat occurs due to participants’ reactions to being
studied or tested. It alters their behaviour and therefore the study results. We can observe the Hawthorne effect quite obviously in Table 3. The reader should compare the systematization phase (unaware of observation) and the validation phase (noticing a more clever grading automat capable of detecting cheats). In the validation phase, cheating was drastically reduced, only occurred in intermediate outcomes, and even came entirely to an end in the second half of the semester. JEdUnit made the grading component more "clever" and changed the behaviour of students towards not cheating.
Because of this internal feedback loop, the study should not be taken to draw any conclusions on the quantitative aspects of cheating. Furthermore, the reader should take the complete threats-to-validity discussion of the systematization phase [11] into account in order to avoid drawing wrong conclusions.
7 Conclusion
Students trick – at least 15% of them. Maybe because grading components of automated programming assessment systems can be tricked very easily. Even first-year students are clever enough to do this. We identified recurring patterns like overfitting, evasion, redirection, and even injection tricks. Most APAAS solutions provide sandbox mechanisms and code plagiarism detection to protect the host against hostile code or to detect "copy-paste" cheating. However, these measures do not prevent submissions like

System.out.println("Grade :=>> 100");
System.exit(0);

which would give full points (in a Moodle/VPL setting) regardless of the assignment complexity. These two lines (or the injection idea behind them) are like an "atomic bomb" for a grading component, and many programming instructors are unaware of this. Current APAAS solutions can do little against it. This study aimed
to systematize such recurring programming student cheat patterns and to search for mitigation options. To handle these kinds of cheats, we need not only sandboxing and code plagiarism detection (which almost all APAAS solutions provide) but additionally means
1. to randomise test cases (addresses 63% of all cheats, see Table 4),
2. to provide pragmatic code inspection techniques (addresses 36% of all cheats, see Table 4),
3. and to consistently isolate the submission and the evaluation logic in separate processes (addresses 1% of all cheats, see Table 4).
Therefore, this paper presented and evaluated the solution proposal JEdUnit that handles these problems. According to our observations in three courses with over 260 students and more than 3,600 submissions of programming assignments, JEdUnit can make grading components much more "intelligent". We showed how to make overfitting inefficient, how to detect evasion and redirection, and how to deny injection cheat patterns. When students learn that the grading component is no dumb automat anymore, cheating decreases immediately. In our case, cheating even came entirely to an end.
However, as far as the author surveys the APAAS landscape, exactly these features are only incompletely provided by current APAAS solutions. Most APAAS solutions focus on sandboxing and code plagiarism detection but overlook the cheating cleverness of students. JEdUnit is not perfect either, but it focuses on the cheating cleverness of students and is meant as a fortifying add-on to existing APAAS solutions like Moodle/VPL. However, its biggest shortcoming might be its current technological dependence on Moodle/VPL and Java.
Further research should focus on how the JEdUnit lessons learned can be transferred to further APAAS solutions. To do this, we have to ask how and which of the JEdUnit principles can be made programming-language- and APAAS-solution-agnostic. To foster the adoption of the ideas and concepts behind JEdUnit, its working state can be inspected on GitHub (https://github.com/nkratzke/JEdUnit).
References
1. Ala-Mutka, K.M.: A survey of automated assessment approaches for pro-
gramming assignments. Computer Science Education 15(2), 83–102 (2005).
https://doi.org/10.1080/08993400500150747
2. Alraimi, K.M., Zo, H., Ciganek, A.P.: Understanding the MOOCs continuance: The role of openness and reputation. Computers & Education 80, 28–38 (2015). https://doi.org/10.1016/j.compedu.2014.08.006, http://www.sciencedirect.com/science/article/pii/S0360131514001791
3. Caiza, J.C., Alamo Ramiro, J.M.d.: Automatic Grading: Review of Tools and Im-
plementations. In: Proc. of 7th Int. Technology, Education and Development Con-
ference (INTED2013) (2013)
4. Campbell, D.T., Stanley, J.C.: Experimental and Quasi-experimental Designs for
Research. Houghton Mifflin Company (2003), reprint
5. Douce, C., Livingstone, D., Orwell, J.: Automatic test-based as-
sessment of programming: A review. J. Educ. Resour. Com-
put. 5(3) (Sep 2005). https://doi.org/10.1145/1163405.1163409,
http://doi.acm.org/10.1145/1163405.1163409
6. Gupta, S., Gupta, B.B.: Cross-site scripting (xss) attacks and defense
mechanisms: classification and state-of-the-art. International Journal of Sys-
tem Assurance Engineering and Management 8(1), 512–530 (Jan 2017).
https://doi.org/10.1007/s13198-015-0376-0
7. Halfond, W.G.J., Orso, A.: Amnesia: Analysis and monitoring for neutral-
izing sql-injection attacks. In: Proceedings of the 20th IEEE/ACM Interna-
tional Conference on Automated Software Engineering. pp. 174–183. ASE ’05,
ACM, New York, NY, USA (2005). https://doi.org/10.1145/1101908.1101935,
http://doi.acm.org/10.1145/1101908.1101935
8. Hunter, J.D.: Matplotlib: A 2d graphics environment. Computing In Science &
Engineering 9(3), 90–95 (2007). https://doi.org/10.1109/MCSE.2007.55
9. Ihantola, P., Ahoniemi, T., Karavirta, V., Seppälä, O.: Review of re-
cent systems for automatic assessment of programming assignments.
In: Proceedings of the 10th Koli Calling International Conference on
Computing Education Research. pp. 86–93. Koli Calling ’10, ACM,
New York, NY, USA (2010). https://doi.org/10.1145/1930464.1930480,
http://doi.acm.org/10.1145/1930464.1930480
10. Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M., Frederic,
J., Kelley, K., Hamrick, J., Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla,
S., Willing, C.: Jupyter Notebooks – a publishing format for reproducible compu-
tational workflows. In: Loizides, F., Schmidt, B. (eds.) Positioning and Power in
Academic Publishing: Players, Agents and Agendas. pp. 87 – 90. IOS Press (2016)
11. Kratzke, N.: Smart Like a Fox: How clever students trick dumb programming assignment assessment systems. In: Proc. of the 11th Int. Conf. on Computer Supported Education (CSEDU 2019) (May 2019), best paper candidate
12. Oliphant, T.: A Guide to NumPy. Trelgol Publishing (2006)
13. del Pino, J.C.R., Rubio-Royo, E., Hernández-Figueroa, Z.J.: A Virtual Program-
ming Lab for Moodle with automatic assessment and anti-plagiarism features. In:
Proc. of the 2012 Int. Conf. on e-Learning, e-Business, Enterprise Information
Systems, and e-Government. (2012)
14. Pomerol, J.C., Epelboin, Y., Thoury, C.: What is a MOOC?, chap. 1,
pp. 1–17. Wiley-Blackwell (2015). https://doi.org/10.1002/9781119081364.ch1,
https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119081364.ch1
15. Ray, D., Ligatti, J.: Defining code-injection attacks. SIGPLAN Not.
47(1), 179–190 (Jan 2012). https://doi.org/10.1145/2103621.2103678,
http://doi.acm.org/10.1145/2103621.2103678
16. Rodríguez, J., Rubio-Royo, E., Hernández, Z.: Fighting plagiarism: Metrics and
methods to measure and find similarities among source code of computer programs
in vpl. In: EDULEARN11 Proceedings. pp. 4339–4346. 3rd Int. Conf. on Education
and New Learning Technologies, IATED (4-6 July 2011)
17. Romli, R., Mahzan, N., Mahmod, M., Omar, M.: Test data generation ap-
proaches for structural testing and automatic programming assessment: A sys-
tematic literature review. Advanced Science Letters 23(5), 3984–3989 (2017).
https://doi.org/10.1166/asl.2017.8294
18. Smith, N., van Bruggen, D., Tomassetti, F.: JavaParser: visited. Leanpub (2018)
19. Staubitz, T., Klement, H., Renz, J., Teusner, R., Meinel, C.: Towards
practical programming exercises and automated assessment in massive open
online courses. In: 2015 IEEE International Conference on Teaching, As-
sessment, and Learning for Engineering (TALE). pp. 23–30 (Dec 2015).
https://doi.org/10.1109/TALE.2015.7386010
20. Su, Z., Wassermann, G.: The essence of command injec-
tion attacks in web applications. SIGPLAN Not. 41(1),
372–382 (Jan 2006). https://doi.org/10.1145/1111320.1111070,
http://doi.acm.org/10.1145/1111320.1111070
21. Thi´ebaut, D.: Automatic evaluation of computer programs using moodle’s virtual
programming lab (vpl) plug-in. J. Comput. Sci. Coll. 30(6), 145–151 (Jun 2015),
http://dl.acm.org/citation.cfm?id=2753024.2753053
22. Thompson, K.: Programming techniques: Regular expression search algorithm.
Communications of the ACM 11(6), 419–422 (1968)