How Programming Students Trick
and What JEdUnit Can Do Against It
Nane Kratzke
Lübeck University of Applied Sciences, Mönkhofer Weg 239, 23562 Lübeck, Germany
nane.kratzke@th-luebeck.de
Abstract. According to our data, about 15% of programming students
trick if they are aware that only a "dumb" robot evaluates their pro-
gramming assignments unattended by programming experts. Especially
in large-scale formats like MOOCs, this might become a problem because
tricking current automated programming assignment assessment systems
(APAAS) is astonishingly easy, and the question arises whether unattended
grading components grade the capability to program or the capability to
trick. This study analyzed what kind of tricks students apply beyond the
well-known "copy-paste" code plagiarism in order to derive possible
mitigation options. Therefore, this study analyzed student cheat patterns
that occurred in two programming courses and developed a unit testing
framework, JEdUnit, as a solution proposal that intentionally targets such
tricky educational aspects of programming. The validation phase validated
JEdUnit in another programming course. This study identified and analyzed
four recurring cheat patterns (overfitting, evasion, redirection, and
injection) that hardly occur in "normal" software development and are not
addressed by the standard unit testing frameworks that are frequently used
to test the correctness of student submissions. Therefore, the concept of
well-known unit testing frameworks was extended by adding three
"countermeasures": randomization, code inspection, and separation. The
validation showed that JEdUnit detected these patterns and, in consequence,
reduced cheating entirely to zero. From a student's perspective, JEdUnit
makes the grading component more intelligent, and cheating does not pay
off anymore. This Chapter explains the cheat patterns and what features of
JEdUnit mitigate them by means of a continuous example.
Keywords: Automatic · Assessment · Programming · Course ·
Education · APAAS · MOOC · Moodle · VPL · Trick · Cheat ·
Pattern · Mitigation
1 Introduction
In a digitized world, more and more experts are needed with at least some
basic programming skills. Programming might even evolve into a foundational
skill similar to reading, writing, and calculating. Therefore, the course sizes of
university and college programming courses are steadily increasing. Even mas-
sive open online courses [14] are used more and more systematically to convey
necessary programming capabilities to students of different disciplines [19]. The
coursework consists of programming assignments that need to be assessed. Since
the submitted assignments are executable programs with a formal structure,
they are highly suited to be assessed automatically. In consequence, plenty of
automated programming assignment assessment systems (APAAS) evolved. We
refer to [1–3,5,9,17] for an overview of such tools.
Previous research [11] showed how astonishingly simple it is for students to
trick automated programming assignment assessment systems. It is often
overlooked that APAAS solutions are systems that execute injected code (student
submissions), and code injection is known as a severe threat from a security
point of view [20]. We refer to [7,15], and [6] for an overview of such kinds of
attacks. Of course, such code injection vulnerabilities are considered by current
solutions. However, in previous research [11], it was astonishing to see that
current APAAS solutions sometimes overlook the cheating cleverness of students.
On the one hand, APAAS solutions protect the host system via sandbox
mechanisms, and APAAS systems put much effort into sophisticated plagiarism
detection and authorship control of student submissions [13,16]. On the other
hand, the grading component can be cheated in various – sometimes ridiculously
simple – ways, making these solutions highly questionable for (semi-)automated
and unattended programming examinations that are meant to certify a certain
level of programming expertise. Previous research [11] identified at least four
simple cheat patterns:
– Overfitting
– Evasion
– Redirection
– Injection
Moreover, it strived to raise a general problem awareness but did not focus
on solutions to mitigate these patterns. To propose solutions on how to mitigate
these identified cheat patterns is the primary intent and contribution of this
Chapter. We propose to use the following three techniques:
– Randomization of test cases
– Pragmatic code inspections
– Separation of student and evaluation logic
These techniques mitigate the presented patterns; we demonstrate their
suitability for the APAAS solution Moodle/VPL and the programming language
Java. Nevertheless, the principles are transferable to other APAAS solutions
and programming languages and are therefore of broader interest, not limited to
Moodle/VPL and Java.
Consequently, the remainder of this paper is outlined as follows. Section 2
presents the methodology that has been used to identify and categorize student
cheat patterns and to validate appropriate "countermeasures". Section 3
explains the identified cheat patterns. Section 4 presents a mitigation analysis
and shows that the identified cheat patterns can be addressed by three
"countermeasures" that should be considered by every APAAS solution. Section 5
explains how these insights have been considered in the development of a unit
testing framework (JEdUnit) that focuses intentionally on educational aspects
and considers the aspects mentioned above. We discuss our results of JEdUnit in
Sect. 6 and provide some guidelines on the generalizability and limitations of
this research. Finally, we conclude our findings in Sect. 7.
Fig. 1. Research methodology.
2 Methodology
Figure 1 presents the overall research approach that comprised two phases:
problem systematization and solution proposal validation.
2.1 Problem Systematization
For the initial problem systematization, two first-semester Java programming
courses in the winter semester 2018/19 (see Table 1) have been systematically
evaluated. Both courses were used to search for student submissions that
intentionally trick the grading component of APAAS solutions. Table 1 provides
a summarized overview of the course design. All assignments were automatically
evaluated by the VPL Moodle plugin (version 3.3.3) following the general
recommendations described by [21]. For more details, the reader is referred to [11].
Figure 2 shows an exemplifying VPL screenshot from a student's perspective.
Fig. 2. VPL screenshot (on the right evaluation report presented to students) [11].
Table 1. Courses used for problem systematization and solution validation.
                              Systematization                Validation
                              Prog I (CS)    Prog I (ITD)    Prog II (CS)
Students                      113            79              73
Assignments (total)           29             20              8
Number of bunches             11             6               7
Assignments per bunch (avg)   3              3               1
Time for a bunch (weeks)      1              2               2
Groups                        7              6               4
Students per group (avg)      18             12              18
Student/advisor ratio (avg)   6              6               6
To minimize Hawthorne and experimenter effects [4], neither the students
nor the advisers in the practical programming courses were aware that student
submissions were analyzed to deduce cheating patterns. Even if cheating was
detected, this had no consequences for the students. It was not even
communicated to the students or the advisers.
Furthermore, students were not aware that the version history of their
submissions, and therefore even intermediate cheating experiments (that did not
make it into the final submission), was logged. However, not every submission
was inspected, for understandable effort reasons. Therefore, only significant
submission samples (see Table 2) were investigated to search systematically for
cheat patterns.
Table 3 summarizes the results quantitatively. Within these six samples,
cheat and trick patterns were identified mainly by manual but script-supported
observation. VPL submissions were downloaded from Moodle and analyzed weekly.
We developed a Jupyter-based [10] quantitative analysis and submission data
model for this dataset. Each student submission was represented as an object
containing its version and grading history that references its student
submitter and its corresponding study programme. The analytical script and data
model made use of the well-known Python libraries statistics, NumPy [12],
matplotlib [8], and the JavaParser library [18]. It was used to identify the
number of submissions and evaluations, points per submission version, timestamps
per submission version, occurrences of unusual terms, and more. Based on this
quantitative data, the mentioned samples (S1–S5) were selected automatically
(or randomly in the case of S6). Additionally, the source codes of the sample
submissions were exported weekly as an archived PDF document. However, the
scanning for cheat patterns was done manually within these documents.
Table 2. Weekly analyzed samples (systematization phase).
S1 – Description: TOP 10 of submissions with many triggered evaluations.
     Rationale: Parameter optimization could cause plenty of evaluations.
S2 – Description: TOP 10 of submissions with many versions.
     Rationale: Cheating experiments could cause plenty of versions.
S3 – Description: TOP 10 of submissions with astonishingly low average points
     across all triggered evaluations but full points in the final submission.
     Rationale: Cheating could cause such point boosts.
S4 – Description: Submissions with unusually many (above the 95% percentile)
     condition-related terms like if, return, switch, case, and so on.
     Rationale: Parameter optimization could cause unusually many
     condition-related terms.
S5 – Description: Submissions with unusual terms like System.exit,
     System.getProperties, :=>> that would stop program execution or have a
     special meaning in VPL or Java but are unlikely to contribute to a
     problem solution.
     Rationale: APAAS attacks could cause such unusual terms.
S6 – Description: Ten random submissions.
     Rationale: To cover unintended observation aspects.
2.2 Solution Proposal Validation
Based on these elicited cheat patterns, corresponding mitigation options have
been derived. Three of them (randomization, code inspection, and
submission/evaluation logic separation) have been implemented in a unit testing
framework called JEdUnit as a solution proposal to mitigate the identified
problems. JEdUnit has been validated using a Programming II course for computer
science students in the summer semester 2019. The course has been given
analogously to the systematization phase, except that JEdUnit has been applied.
The search for cheats has been conducted similarly, except that we inspected
every submission because of the smaller course size and the fewer (but more
extensive and more complex) assignments. The mitigation options and JEdUnit, as
well as the validation results, are presented in the following Sections.
3 Cheat-Patterns to Consider
Some basic Java programming knowledge must be assumed throughout this
paper. The continuous example assignment for this Chapter shall be the
following task: a method countChar() has to be programmed that counts the
occurrences of a specific character c in a given String s (not case-sensitive).
The following example calls are provided for a better understanding of the
intended functionality.

– countChar('a', "Abc") → 1
– countChar('A', "abc") → 1
– countChar('x', "ABC") → 0
– countChar('!', "!!!") → 3
A reference solution for our “count chars in a string” problem might be the
following implementation of countChar().
Listing 1.1. Reference solution (continuous example).
int countChar(char c, String s) {
    s = s.toLowerCase();
    c = Character.toLowerCase(c);
    int i = 0;
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
}
According to our data, most students strive to find a solution that fits the
scope and intent of the assignment (see Table 3 and Fig. 3). However, in the
systematization phase, a minority of students (approximately 15%) made use of
the fact that a "dumb automat" grades. Accordingly, we observed the following
cheating patterns that differ significantly from the intended reference solution
above (see Fig. 3):

– Overfitting solutions (63%)
– Redirection to reference solutions (6%)
– Problem evasion (30%)
– Injection (1%)
Table 3. Detected cheats.
Fig. 3. Observed cheat-pattern frequency (without application of JEdUnit) [11].
Especially overfitting and evasion tricks are "poor man's weapons" often used
by novice programmers as a last resort to solve a problem. The much more
alarming redirection and injection cheats occurred only in rare cases (less than
10%). However, what do these tricks and cheats look like? How severe are they?
Moreover, what can be done against them? We will investigate these questions in
the following paragraphs.
3.1 Overfitting Tricks
Overfitted solutions strive to get a maximum of points for grading but do not
strive to solve the given problem in a general way. A notable example of an
overfitted solution would be Listing 1.2.
Listing 1.2. Overfitting solution.
int countChar(char c, String s) {
    if (c == 'a' && s.equals("Abc")) return 1;
    if (c == 'A' && s.equals("abc")) return 1;
    if (c == 'x' && s.equals("ABC")) return 0;
    if (c == '!' && s.equals("!!!")) return 3;
    // [...]
    if (c == 'x' && s.equals("X")) return 1;
    return 42;
}
This solution merely maps the example input parameters to the expected
output values. The solution is completely useless outside the scope of the test
cases.
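For illustration, any call outside the published example cases exposes the trick immediately; a minimal check based directly on Listing 1.2:

    countChar('b', "bb"); // returns 42 (the hard-coded default) instead of the expected 2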
3.2 Problem Evasion Tricks
Another trick pattern is to evade a given problem statement. According to our
experience, this pattern occurs mainly in the context of more sophisticated
and formal programming techniques like recursive programming or functional
programming styles with lambda functions.
So, let us now assume that the assignment is still to implement a
countChar() method, but this method should be implemented recursively. A
reference solution might look like Listing 1.3 (we do not consider performance
aspects due to tail recursion):
Listing 1.3. Recursive reference solution.
int countChar(char c, String s) {
    s = s.toLowerCase();
    c = Character.toLowerCase(c);
    if (s.isEmpty()) return 0;
    char head = s.charAt(0);
    String rest = s.substring(1);
    int n = (head == c) ? 1 : 0;
    return n + countChar(c, rest);
}
However, sometimes, student submissions only pretend to be recursive without
actually being so. Listing 1.4 is a notable example.
Listing 1.4. Problem evasing solution.
int countChar(char c, String s) {
    if (s.isEmpty()) return 0;
    return countChar(c, s, 0);
}

int countChar(char c, String s, int i) {
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
}
Although countChar() is calling (an overloaded version of) countChar(), which
looks recursive, the overloaded version of countChar() makes use of a for loop
and is therefore implemented in a fully imperative style.
The same pattern can be observed if an assignment requests functional
programming with lambda functions. A lambda-based solution could look like
Listing 1.5.
Listing 1.5. Lambda reference solution.
(c, s) -> s.chars()
          .filter(x -> x == c)
          .count();
However, students take refuge in familiar programming concepts like loops. Very
often, submissions like in Listing 1.6 are observable:
Listing 1.6. Lambda problem evasion.
(c, s) -> {
    int i = 0;
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
};
The (c, s) -> { [...] }; seems functional at first glance. However, if we look
at the implementation, it is only an imperative for loop embedded in a
functional-looking context.
The problem here is that evaluation components just looking at input-output
parameter correctness will not detect these kinds of programming style
evasions. The merely recursive- or functional-looking solutions will generate
the correct results. Nevertheless, the intent of such assignments is not just
to foster correct solutions but also to train specific styles of programming.
3.3 Redirection Cheats
Another shortcoming of APAAS solutions can be compiler error messages that
can reveal details of the evaluation logic. In the case of VPL, an evaluation is
processed according to the following steps.
1. The submission is compiled and linked to the evaluation logic.
2. The compiled result is executed to run the checks.
3. The check results are printed in an APAAS specific notation on the console
(standard-out).
4. This output is interpreted by the APAAS solution to run the automatic grad-
ing and present a kind of feedback to the submitter.
This process is straightforward and provides the benefit that evaluation
components can handle almost all programming languages. If one of the steps
fails, an error message is generated and returned to the submitter as feedback.
This typically involves returning the compiler error message, which can be
problematic because these compiler error messages may provide unexpected
cheating opportunities.
Let us recall: the assignment was to program a method countChar().
Let us further assume that a student makes a small spelling error and names
the method countCharS() instead of countChar() – so just a trailing s is
added. That is a typical programming error that happens quickly (see Listing 1.7).
Listing 1.7. A slightly misspelled submission.
int countCharS(char c, String s) {
    int i = 0;
    for (char x : s.toCharArray()) {
        if (x == c) i++;
    }
    return i;
}
If this submission were submitted and evaluated by an APAAS solution,
it would likely not pass the first compile step due to the simple
spelling error. What is returned is a list of compiler error messages like the
following one that shows the problem:
Checks.java:40: error: cannot find symbol
Submission.>>countChar<<(‘a’, "Abc") ==
Solution.countChar(‘a’, "Abc")
The compiler message provides useful hints to correct a misspelt method
name, but it also reveals that a method (Solution.countChar()) exists to check
the student submission. The reference solution can be assumed to be correct.
So, compiler error messages can reveal a reference solution method that could
be called directly. A student can correct the naming and redirect the submitted
method directly to the method that is used for grading. Doing so, the student
would let the evaluation component evaluate itself, which will very likely yield
full points. A notable example would be Listing 1.8.
Listing 1.8. Redirection submission.
int countChar(char c, String s) {
    return Solution.countChar(c, s);
}
This is categorized as a redirection cheat. Students can systematically gain
insights into the evaluation logic by intentionally submitting non-compilable
submissions.
3.4 Point Injection Cheats
All previous cheat patterns focused on the compile or the execution step and
try to formulate a smart submission that tricks the evaluation component and
its checks. Instead, injection cheats intentionally target the grading
component. Injection cheats require in-depth knowledge about which specific
APAAS solution (e.g., VPL) is used, and knowledge about the internals and
details of how the APAAS solution generates intermediate outcomes to calculate
a final grade.
We explain this "attack" by the example of VPL. However, the attack can
easily be adapted to other APAAS tools. VPL relies on an evaluation script that
triggers the evaluation logic. The evaluation logic has to write results directly
to the console (standard-out). The grading component parses this output and
searches for lines that start with specific prefixes like

– Grade :=>> to give points
– Comment :=>> for hints and remarks that should be presented to the submitter
  as feedback.

VPL assumes that students are not aware of this knowledge. It is furthermore
(somehow inherently) assumed that student submissions do not write to the
console (just the evaluation logic should do that) – but it is possible for
submissions to place arbitrary output on the console, and this is not prohibited
by the jail server. So, these assumptions are a fragile defence. A quick internet
search for the terms "grade VPL" will turn up the documentation of VPL
explaining how the grading component works under the hood. So, submissions like
Listing 1.9 are possible and executable.
Listing 1.9. Injection submission.
int countChar(char c, String s) {
    System.out.print("Grade :=>> 100");
    System.exit(0);
    return 0; // for compiler silence
}
The intent of such a submission is merely to inject a line like "Grade :=>> 100"
into the output stream to let the grading component evaluate the submission
with full points.
4 Mitigation Analysis of Cheat Patterns
So, in our problem systematization phase, we identified four patterns of tricking
or cheating that can be observed in regular programming classes. These tricks
work because students know that a dumb automat checks their submissions. In
the following Sects. 4.1, 4.2, 4.3, and 4.4, we will ask what can be done to
make APAAS solutions more "intelligent" and prevent this kind of cheating.
4.1 What Can Be Done to Prevent Overfitting?
Randomized test data make overfitted submissions ineffective. Therefore, our
general recommendation is to give a substantial fraction of points for randomized
test cases. However, to provide some control over randomized tests, these tests
must be pattern-based to trigger expectable problems (e.g., off-by-one errors,
boundary cases) in student submissions. We refer to [17] for further details.
E.g., for string-based data, we gained promising results by generating random
strings merely through applying regular expressions [22] inversely. Section 5.1
explains how randomization is used by JEdUnit to tackle the overfitting problem.
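The following is a minimal, framework-independent sketch of this idea for a restricted pattern like [a-z]{5,17}. It is a simplified stand-in, not JEdUnit's actual generator code; the class and method names are chosen freely here.

import java.util.Random;

// Simplified sketch (not JEdUnit's generator): produce a random string that
// matches a simple pattern such as [a-z]{5,17} by sampling the character
// class and a repetition count.
public class SimpleRegexInversion {

    private static final Random RND = new Random();

    static String randomString(String charClass, int minLen, int maxLen) {
        int len = minLen + RND.nextInt(maxLen - minLen + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            sb.append(charClass.charAt(RND.nextInt(charClass.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // corresponds roughly to the pattern [a-z]{5,17}
        System.out.println(randomString("abcdefghijklmnopqrstuvwxyz", 5, 17));
    }
}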
4.2 What Can Be Done to Avoid Problem Evasion?
Problem evasion cannot be detected by comparing the equality of input-output
results (black-box testing). To mitigate problem evasion, we need automated
code inspection approaches (white-box inspections). Submissions must be scanned
for unintended usage of language concepts like for and while loops. However,
this makes it necessary to apply parsers and makes the assignment-specific
evaluation logic much more complicated and time-intensive to program. To
simplify this work, we propose a selector-based model that selects nodes from
an abstract syntax tree (AST) of a compilation unit to detect and annotate such
kinds of violations in a practical way. The approach works similarly to CSS
selectors selecting parts of a DOM tree in a web context (see Sect. 5.2).
4.3 What Can Be Done to Prevent Redirection?
Interestingly, problem evasion and redirection can be solved by the same
mitigation approach. Similar to evasion cheats, submissions can be scanned for
unintended usage of language concepts, e.g. calls to classes containing the
reference logic that is used for testing. This white-box inspection approach
makes it possible to scan the submission for questionable calls like
Solution.x() calls. Additionally, we deny making use of getClass() calls and the
import of the reflection package. Both would enable arbitrary indirections.
The techniques necessary to deny specific method calls and to deny the import
of packages will be explained in Sect. 5.2.
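As an illustration of how such a scan could look with a plain parser (using the JavaParser library directly rather than JEdUnit's selector model described in Sect. 5.2; the file name Main.java follows the submission convention introduced later in this chapter):

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.expr.MethodCallExpr;

import java.io.File;

// Sketch: flag method calls whose scope is the reference class Solution,
// i.e. submissions that redirect to the evaluation's reference logic.
public class RedirectionScan {
    public static void main(String[] args) throws Exception {
        CompilationUnit ast = StaticJavaParser.parse(new File("Main.java"));
        ast.findAll(MethodCallExpr.class).stream()
           .filter(call -> call.getScope()
                               .map(scope -> scope.toString().equals("Solution"))
                               .orElse(false))
           .forEach(call -> System.out.println("[CHEAT] Suspicious call: " + call));
    }
}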
4.4 What Can Be Done to Avoid Injection Attacks?
In a perfect world, the submission should be executed in a context that by design
cannot access the grading logic. The student logic should be code that
deserializes input parameters from stdin, passes them to the submitted function,
and serializes the output to stdout. The grading logic should serialize
parameters, pipe them into the wrapped student code, deserialize the stdout,
and compare it with the reference function's output. However, this approach
would deny making use of common unit testing frameworks for evaluation, although
it would effectively separate the submission logic and the evaluation logic into
two different processes (which would make most of the attack vectors in this
setting ineffective). Moreover, to the best of the author's knowledge, no unit
testing framework exists that separates the test logic from the to-be-tested
logic into different processes.
In the case of VPL, the shared use of the stdout (System.out) stream is
given. APAAS systems that separate the submission logic's stdout stream from
the evaluation logic's stdout stream might be not or less prone to injection
attacks. However, even for VPL, there are several options to handle this
problem. E.g., we can prohibit making use of the System.exit() call to assure
that submissions can never stop the evaluation execution on their own. This
prohibition can be realized using a Java SecurityManager – it is likely to be
more complicated for other languages that do not provide a virtual-machine
built-in security concept. For these systems, parser-based solutions (see
Sect. 3.2) would be a viable option (see Sect. 5.2).
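A minimal sketch of this prohibition with a plain SecurityManager follows; it is an assumed illustration, not JEdUnit's exact mechanism, and note that the SecurityManager API is deprecated in recent Java releases.

// Minimal sketch: a SecurityManager that blocks System.exit() calls issued
// by submitted code, so submissions cannot terminate the evaluation run.
public class ExitBlocker {

    public static void install() {
        System.setSecurityManager(new SecurityManager() {
            @Override
            public void checkExit(int status) {
                throw new SecurityException("Submissions must not call System.exit()");
            }
            @Override
            public void checkPermission(java.security.Permission perm) {
                // intentionally permissive in this sketch; a real evaluation
                // sandbox would restrict far more permissions
            }
        });
    }

    public static void main(String[] args) {
        install();
        System.exit(0); // intercepted: throws a SecurityException instead of silently ending the run
    }
}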
A very effective way to separate the stdout/stderr streams is to redirect these
console streams to new streams that the submission logic is unaware of. This
redirection is an astonishingly simple solution to the most severe identified
problem. It will be explained in Sect. 5.3.
Table 4. Mapping of presented JEdUnit features to cheat patterns.
                    Randomization   Code inspection   Separation
Overfitting (63%)   X
Evasion (30%)                       X
Redirection (6%)                    X
Injection (1%)                                        X
5 JEdUnit
Sections 4.1, 4.2, 4.3, and 4.4 showed that it is possible to mitigate the
identified cheat patterns using the strategies listed in Table 4. These insights
flowed into a unit testing framework called JEdUnit. JEdUnit has a specific
focus on educational aspects and strives to simplify the automatic evaluation of
(small) Java programming assignments using Moodle and VPL. The framework has
been mainly developed for our purposes in programming classes at the Lübeck
University of Applied Sciences. However, this framework might be helpful for
other programming instructors. Therefore, it is provided as open source.
Every JEdUnit evaluation is expressed in a Checks.java compilation unit
and usually relies on a reference implementation (which is by convention
provided in a file called Solution.java) and a submission (which is by
convention provided in a file called Main.java). However, the conventions can
be adapted to assignment-specific testing requirements.
Listing 1.10. Reference Solution expressed in JEdUnit.
public class Solution {
    public int countChars(char c, String s) {
        int n = 0;
        for (char x : s.toLowerCase().toCharArray()) {
            if (Character.toLowerCase(c) == x) n++;
        }
        return n;
    }
}
Similar to JUnit, each test case is annotated with a @Test annotation (see
Listing 1.11). However, a JEdUnit @Test annotation takes two additional
parameters:

– weight is a value between 0 and 1.0 that indicates how much this test case
  contributes to the final grade of the assignment.
– description is a string briefly explaining the intent of the test case.
Listing 1.11. Test template expressed in JEdUnit.
public class Checks extends Constraints {

    @Test(weight=0.25, description="Example calls")
    public void test_01_exampleCalls() { [...] }

    @Test(weight=0.25, description="Boundary tests")
    public void test_02_boundary_cases() { [...] }

    @Test(weight=0.5, description="Randomized tests")
    public void test_03_randomized_cases() { [...] }
}
A test case usually runs several test data tuples against the submitted solu-
tion and compares the result with the reference solution. A test data tuple can
be created using the t() method. Listing 1.12 shows this for the example calls
of our continuous example.
Listing 1.12. Example test case expressed in JEdUnit.
@Test(weight=0.25, description="Example calls")
public void test_01_exampleCalls() {
    test(
        t('a', "ABC"), t('A', "abc"),
        t('X', "abc"), t('x', "XxY")
    ).each(
        // check
        d -> assertEquals(
            Solution.countChars(d._1, d._2),
            Main.countChars(d._1, d._2)
        ),
        // explain
        d -> f("countChars(%s, %s) should return %s",
            d._1, d._2, Solution.countChars(d._1, d._2)
        ),
        // on error
        d -> f("but returned %s",
            Main.countChars(d._1, d._2)
        )
    );
}
The each() method takes three parameters to run the evaluation on the test
data provided as tuples in the test() method.

– A Predicate that checks whether the submitted solution returns the same
  result as the reference solution (indicated by the // check comment in
  Listing 1.12).
– A Function that explains the method call and reports the expected result
  (indicated by the // explain comment in Listing 1.12).
– A Function that reports the actual result if the check predicate evaluates to
  false (indicated by the // on error comment in Listing 1.12).
These functions are used to provide meaningful feedback for students. To make
this straightforward, JEdUnit provides the format method f(). f() is merely
a convenience wrapper around the String.format() method that additionally
formats feedback outputs that often confuse students. E.g., f() indicates
non-printable characters like spaces, tabulators, and carriage returns by
visible placeholder characters. Additionally, f() realizes visible map and
list representations (and more).
Table 5. Random generators provided by JEdUnit.
Method                                        Description
c()                                           Random character ([a-zA-Z])
c(String regexp)                              Random character from a regular expression (first char)
s(String... regexps)                          Random string from a sequence of regular expressions
s(int min, int max)                           Random string between a minimum and maximum length
b()                                           Random boolean value
i()                                           Random integer value
i(int m)                                      Random integer value [0, m]
i(int l, int u)                               Random integer value [l, u]
d()                                           Random double value
d(double m)                                   Random double value [0, m[
d(double l, double u)                         Random double value [l, u[
<T> List<T> l(int l, Supplier<T> g)           Random list with length l generated by g
<T> List<T> l(int l, int u, Supplier<T> g)    Random list with length in the range [l, u] generated by g
The reader is referred to the Wiki (https://github.com/nkratzke/JEdUnit/wiki)
for a more detailed introduction to JEdUnit and further features like:

– Initializing assignments.
– Configuration of checkstyle (coding conventions).
– Making use of predefined code inspections.
– Checking complex object-oriented class structures automatically.

However, in the following Sects. 5.1, 5.2, and 5.3, we focus mainly on how
JEdUnit makes use of randomization, code inspection, and stream separation to
mitigate the observed overfitting, problem evasion, redirection, and injection
cheating problems (see Table 4).
5.1 How Does JEdUnit Support Randomization?
JEdUnit provides a set of random generators to mitigate overfitting problems.
These random generators enable generating randomized test data in specified
ranges and according to patterns, in order to test explicitly for common problem
cases like boundary cases, off-by-one errors, empty data structures, and more.
Because these generators are frequently used, they intentionally have short
names (see Table 5).
Using this set of random generators, we can quickly create test data – for
instance a random list of predefined terms.
Listing 1.13. Demonstration of a randomized list generator creating a list of strings
from a regular expression.
List<String> words = l(2, 5, () ->
    s("This|is|just|a|silly|example")
);
The randomly generated lists have a length between two and five entries.
Possible resulting lists would be:

– ["This", "a"]
– ["silly", "is", "This"]
– ["example", "example", "a", "a"]
– ["This", "is", "just", "a", "example"]
The generators shown in Table 5 are designed to work seamlessly with the
test().each() pattern introduced in Listing 1.12. Listing 1.14 exemplifies a
randomized test case for our continuous example. It works with random strings
but generates test data that intentionally checks for cases where the character
to be counted is placed at the front, in the middle, or at the end of a string,
to cover frequent programming errors like off-by-one errors.
Listing 1.14. Example for a randomized test case expressed in JEdUnit.
@Test(weight=0.5, description="Randomized tests")
public void test_03_randomized_cases() {
    // Regexp to generate random strings
    String r = "[a-zA-Z]{5,17}";
    // Pick a random character to search for
    char c = c();
    test(
        t(c, s(c + "{1,7}", r, r)), // First position
        t(c, s(r, r, c + "{1,7}")), // Last position
        t(c, s(r, c + "{1,7}", r))  // Middle position
    ).each(check, explain, onError);
    // check, explain, onError defined as in Listing 1.12
}
Such test cases are not prone to overfitting because the test data is
randomly generated for every evaluation. A possible generated feedback for the
test case shown in Listing 1.14 could look like this:
– [OK] countChars('j', "jUcEzCzODWWN") should return 1
– [FAILED] countChars('j', "zOdAqavJJkxjvrjj") should return 5 but returned 3
– [OK] countChars('j', "SPAqlORwxjjjjRHIKCCWS") should return 4

Each evaluation will be run with random data but according to comparable
patterns. So, JEdUnit provides both: comparable evaluations and randomized test
cases that are not prone to overfitting.
5.2 How Does JEdUnit Support Code Inspection?
JEdUnit integrates the JavaParser [18] library to parse Java source code into
an abstract syntax tree. JEdUnit tests have full access to the JavaParser
library and can do arbitrary checks with this parser. However, this can quickly
become very complex. Therefore, JEdUnit tests can make use of a selector-based
model that selects nodes from the abstract syntax tree (AST) of a compilation
unit to detect and annotate such kinds of violations in a practical way. The
approach works similarly to CSS selectors selecting parts of a DOM tree in a
web context.
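For comparison, a raw JavaParser check for block-bodied lambdas might look like the following sketch (plain JavaParser API, not JEdUnit's selector DSL; the file name Main.java follows the submission convention introduced above):

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.expr.LambdaExpr;
import com.github.javaparser.ast.stmt.BlockStmt;

import java.io.File;

// Sketch: scan a submission for lambda expressions whose body is a block
// statement, using the raw JavaParser API instead of JEdUnit's selectors.
public class RawLambdaInspection {
    public static void main(String[] args) throws Exception {
        CompilationUnit ast = StaticJavaParser.parse(new File("Main.java"));
        boolean evasion = ast.findAll(LambdaExpr.class).stream()
            .anyMatch(l -> l.getBody() instanceof BlockStmt);
        if (evasion) {
            System.out.println("Possible lambda evasion: block statement inside a lambda");
        }
    }
}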
The following examples demonstrate how to use this selector model on submission
ASTs pragmatically for code inspections. The reader should inspect the two
evasion examples shown in Listings 1.4 and 1.6 that show typical evasion
patterns to avoid recursive or lambda/stream-based programming in Java.
To detect lambda evasion (see Listing 1.6), we can add the following inspection
to a test case. It scans for lambda functions that make use of block statements
in their definition. Using blocks in lambda functions might indicate a kind of
"problem evasion" in a submission – a student may try to evade from lambda
programming into simpler statement-based programming (which is likely not the
intent of the assignment).
Listing 1.15. Example for a lambda evasion check in JEdUnit.
@Inspection(description="Lambda evasion inspection")
public void assignmentInspections() {
    penalize(25, "No blocks in lambdas.", () ->
        inspect("Main.java", ast -> ast.select(LAMBDA)
            .select(BLOCK)
            .annotate("No blocks in lambdas.")
            .exists()
        )
    );
}
This inspection would effectively detect submissions like the one already
presented in Listing 1.6. To detect recursion evasion (see Listing 1.4), we
simply have to search for methods outside the main() method that make use of
loop statements.
Table 6. Predefined code inspections provided by JEdUnit.
CHECK_IMPORTS (default: True): Checks if only cleared libraries are imported.
java.util.* is allowed per default.

CHECK_COLLECTION_INTERFACES (default: True): Checks if collections are accessed
via their interfaces only (e.g. List instead of LinkedList is used in method
parameters, method return types, variable and datafield declarators). Using
concrete collection classes like ArrayList instead of their interface List is
penalized by default; the same applies for Map. This check can be deactivated.

ALLOW_LOOPS (default: True): for, while, do-while, and forEach() loops are
allowed per default. This can be deactivated and penalized, e.g., in case
methods must be implemented recursively.

ALLOW_METHODS (default: True): Methods are allowed per default. Can be
deactivated and penalized in special cases, like to enforce the usage of lambda
functions.

ALLOW_LAMBDAS (default: True): Lambdas are allowed per default. Can be
deactivated and penalized in special cases, like to enforce the usage of methods.

ALLOW_INNER_CLASSES (default: False): Inner classes are penalized by default.
This check can be deactivated if inner classes shall be allowed.

ALLOW_DATAFIELDS (default: False): Checks if datafields (global variables) are
used. This is penalized by default. However, this check must be deactivated for
object-oriented contexts.

ALLOW_CONSOLE_OUTPUT (default: False): By default, System.out.print()
statements are not allowed and are penalized outside the scope of the main()
method. This check can be deactivated.
Listing 1.16. Example to detect recursion evasion.
inspect(cu, ast -> ast.select(METHOD, "[name!=main]")
    .select(FOR, FOREACH, WHILE, DOWHILE)
    .annotate("no loops allowed")
    .exists()
);
The same technique can be used to detect submissions that make use of
System.out.println() or System.exit() calls that might indicate injection
attacks.
Listing 1.17. Example to detect suspect method calls (used internally by JEdUnit).
inspect(cu, ast -> ast.select(METHODCALL)
    .filter(c -> c.toString().startsWith("System.exit"))
    .annotate(c -> "[CHEAT] Forbidden call: " + c)
    .exists()
);
The reader may notice that these selector-based code inspections are quite
powerful and flexible for formalizing and detecting arbitrary violation patterns
in student code submissions. JEdUnit makes intensive use of this code inspection
technique and provides several predefined code inspections that can be activated
by config options. Table 6 lists some of these predefined inspections.
5.3 How Does JEdUnit Support Separation of Evaluation
and Submission Logic?
The main problem of injection attacks is that the submission logic and the
evaluation logic make use of the same console streams (stdout/stderr). The
grading component interprets this console output, and this output could be
compromised by the student submission's output (intentionally or
unintentionally). JEdUnit solves this problem simply by redirecting the
stdout/stderr console streams to other streams. Simple methods like the
following one perform this redirection.
Listing 1.18. Redirection of stdout console stream.
/**
 * Creates a file called console.log that stores all
 * console output generated by the submitted logic.
 * Used to isolate the evaluation output from the
 * submitted logic output to prevent injection attacks.
 * @since 0.2.1
 */
public void redirectStdOut() throws Exception {
    this.redirected = new PrintStream(
        Config.STD_OUT_REDIRECTION
    );
    System.setOut(this.redirected);
}
The stdout/stderr streams are switched whenever the submission logic is called
and switched back when the submission logic returns. Because the submission
logic has no access to the evaluation library logic of JEdUnit, it cannot figure
out the current state of the redirection and is therefore not capable of
reversing it. In consequence, JEdUnit maintains two pairs of stdout/stderr
streams: one for the evaluation logic that is passed to the VPL grading
component, and one for the submission logic that could be presented as feedback
to the user for debugging purposes. However, currently, JEdUnit simply ignores
the submission output streams. Figure 4 shows how JEdUnit changes the control
flow (right-hand side) in comparison to the standard VPL approach (left-hand
side).
This stream redirection effectively separates the submission logic streams
from the evaluation logic streams, and no stream injections can occur.
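The following is a hypothetical, self-contained illustration of this switching pattern; the names and details are assumed here and are not JEdUnit's actual code.

import java.io.PrintStream;

// Hypothetical sketch of the stream-switching pattern described above.
public class StreamSwitchSketch {

    // Stand-in for a submitted method that (illegitimately) writes to the console.
    static int submittedCountChar(char c, String s) {
        System.out.println("Grade :=>> 100"); // ends up in console.log, not in the grading output
        int n = 0;
        for (char x : s.toLowerCase().toCharArray()) {
            if (x == Character.toLowerCase(c)) n++;
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        PrintStream evaluatorOut = System.out;                      // stream parsed by the VPL grading component
        PrintStream submissionOut = new PrintStream("console.log"); // sink for the submission's console output

        System.setOut(submissionOut);
        int result = submittedCountChar('a', "Abc");  // submission runs against the redirected stream
        System.setOut(evaluatorOut);                  // evaluation logic regains the grading stream

        System.out.println("Comment :=>> countChar('a', \"Abc\") returned " + result);
    }
}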
Fig. 4. Isolation of console streams in JEdUnit to prevent injection.
Additionally, JEdUnit prevents student submissions from stopping their
execution via System.exit() or similar calls to bypass the control flow of the
evaluation logic. This is done by prohibiting the reflection API (which would
enable arbitrary calling indirections that could not be identified by code
inspections) and calls to System.exit() by the code inspection means already
explained in Sect. 5.2.
6 Discussion of Results and Threats to Validity
If we compare the number of detected cheats in the systematization phase with
the validation phase (see Table 3 in Sect. 3), we see the impact of applying
solutions like JEdUnit. In the systematization phase, cheat detection has been
done manually and without notice to the students. In the validation phase,
cheat detection has been done automatically by JEdUnit and reported immediately
as JEdUnit evaluation feedback to the students. The consequence was that
cheating occurred only in the first weeks. What is more, the cheating detected
in the validation phase has been observed only in intermediate outcomes and not
in the final submissions (while the cheating reported in the systematization
phase is cheating that made it into the final submissions).
So, we can conclude that JEdUnit effectively detects the four identified
patterns of cheating (overfitting, evasion, redirection, and injection).
Cheating was not even applied in intermediate results and came entirely to an
end in the second half of the semester. From a student's perspective, it was no
longer an effective means.
However, we should consider and discuss the following threats to internal and
external validity [4] that apply to our study and that might limit its conclusions.
Selection Bias and Maturation. We should consider that the target audiences
of the systematization and validation phases differed a bit. In the
systematization phase, we worked with first-semester novice programmers of a
computer science and an information technology and design study programme. In
the validation phase, we worked with second-semester computer science students
that had made some programming progress during their first semester. This
increase in expertise might be one reason why a general decrease in cheating
was observable in the second semester. However, that does not explain the fact
that cheating came entirely to an end in the second half of the semester. This
decrease might only be explained by the fact that students learned that the
grading automat "was not so dumb" anymore.
Contextual Factors. This threat occurs due to specific conditions under which
research is conducted that might limit its generalisability. In this case, we
were bound to a Moodle-based APAAS solution. The study would not have been
possible outside this technical scope. We decided to work with VPL because it is
the only mature-enough open source solution for Moodle. Therefore, the study
should not be taken to draw conclusions on all existing APAAS systems.
Especially JEdUnit is a Moodle/VPL-specific outcome. However, it seems
worthwhile to check whether existing APAAS solutions are aware of the four
identified cheat patterns (or attack vectors from a system security perspective)
and what can be transferred from JEdUnit into further APAAS solutions.
Hawthorne Effects. This threat occurs due to participants' reactions to being
studied or tested. It alters their behaviour and therefore the study results.
We can observe the Hawthorne effect quite obviously in Table 3. The reader
should compare the systematization phase (unaware of observation) and the
validation phase (noticing a more clever grading automat capable of detecting
cheats). In the validation phase, cheating was drastically reduced, occurred
only in intermediate outcomes, and even came entirely to an end in the second
half of the semester. JEdUnit made the grading component more "clever" and
changed the behaviour of students not to cheat.
Because of this internal feedback loop, the study should not be taken to draw
any conclusions on the quantitative aspects of cheating. Furthermore, the reader
should additionally take the complete threats-to-validity discussion of the
systematization phase [11] into account in order to avoid drawing wrong
conclusions.
7 Conclusion
Students trick – at least 15% of them. Maybe because grading components of
automated programming assessment systems can be tricked very easily. Even
first-year students are clever enough to do this. We identified recurring
patterns like overfitting, evasion, redirection, and even injection tricks. Most
APAAS solutions provide sandbox mechanisms and code plagiarism detection to
protect the host against hostile code or to detect "copy-paste" cheating.
However, these measures do not prevent submissions like

System.out.println("Grade :=>> 100");
System.exit(0);

which would give full points (in a Moodle/VPL setting) regardless of the
assignment complexity. These two lines (or the injection idea behind them) are
like an "atomic bomb" for a grading component, and many programming instructors
are unaware of it. Current APAAS solutions can do little against it. This study
aimed to systematize such recurring programming student cheat patterns and to
search for mitigation options. To handle these kinds of cheats, we need not only
sandboxing and code plagiarism detection (which almost all APAAS solutions
provide) but additionally means

1. to randomise test cases (addresses 63% of all observed cheats, see Table 4),
2. to provide pragmatic code inspection techniques (addresses 36% of all
   observed cheats, see Table 4),
3. and to isolate the submission and the evaluation logic consequently in
   separate processes (addresses 1% of all observed cheats, see Table 4).
Therefore, this paper presented and evaluated the solution proposal JEdUnit
that handles these problems. According to our validation in three courses with
over 260 students and more than 3,600 submissions of programming assignments,
JEdUnit can make grading components much more "intelligent". We showed how to
make overfitting inefficient, how to detect evasion and redirection, and how to
deny injection cheat patterns. When students learn that the grading component is
no dumb automat anymore, cheating decreases immediately. In our case, cheating
even came entirely to an end.
However, as far as the author surveys the APAAS landscape, exactly these
mentioned features are only incompletely provided by current APAAS solutions.
Most APAAS solutions focus on sandboxing and code plagiarism detection but
overlook the cheating cleverness of students. JEdUnit is not perfect either, but
it focuses on the cheating cleverness of students and is meant as a fortifying
add-on to existing APAAS solutions like Moodle/VPL. However, its biggest
shortcoming might be its current technological dependence on Moodle/VPL and Java.
Further research should focus on how the JEdUnit lessons learned can be
transferred into further APAAS solutions. To do this, we have to ask how and
which of the JEdUnit principles can be made programming language and APAAS
solution agnostic. Therefore, JEdUnit's working state can be inspected on GitHub
(https://github.com/nkratzke/JEdUnit) to foster the adaption of the ideas and
concepts behind JEdUnit.
References

1. Ala-Mutka, K.M.: A survey of automated assessment approaches for programming assignments. Comput. Sci. Educ. 15(2), 83–102 (2005). https://doi.org/10.1080/08993400500150747
2. Alraimi, K.M., Zo, H., Ciganek, A.P.: Understanding the MOOCs continuance: the role of openness and reputation. Comput. Educ. 80, 28–38 (2015). https://doi.org/10.1016/j.compedu.2014.08.006
3. Caiza, J.C., del Alamo Ramiro, J.M.: Automatic grading: review of tools and implementations. In: Proceedings of the 7th International Technology, Education and Development Conference (INTED2013) (2013)
4. Campbell, D.T., Stanley, J.C.: Experimental and Quasi-experimental Designs for Research. Houghton Mifflin Company, Boston (2003). Reprint
5. Douce, C., Livingstone, D., Orwell, J.: Automatic test-based assessment of programming: a review. J. Educ. Resour. Comput. 5(3) (2005). https://doi.org/10.1145/1163405.1163409
6. Gupta, S., Gupta, B.B.: Cross-site scripting (XSS) attacks and defense mechanisms: classification and state-of-the-art. Int. J. Syst. Assur. Eng. Manage. 8(1), 512–530 (2017). https://doi.org/10.1007/s13198-015-0376-0
7. Halfond, W.G.J., Orso, A.: AMNESIA: analysis and monitoring for neutralizing SQL-injection attacks. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, ASE 2005, pp. 174–183. ACM, New York (2005). https://doi.org/10.1145/1101908.1101935
8. Hunter, J.D.: Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95 (2007). https://doi.org/10.1109/MCSE.2007.55
9. Ihantola, P., Ahoniemi, T., Karavirta, V., Seppälä, O.: Review of recent systems for automatic assessment of programming assignments. In: Proceedings of the 10th Koli Calling International Conference on Computing Education Research, Koli Calling 2010, pp. 86–93. ACM, New York (2010). https://doi.org/10.1145/1930464.1930480
10. Kluyver, T., et al.: Jupyter notebooks - a publishing format for reproducible computational workflows. In: Loizides, F., Schmidt, B. (eds.) Positioning and Power in Academic Publishing: Players, Agents and Agendas, pp. 87–90. IOS Press (2016)
11. McLaren, B.M., Reilly, R., Zvacek, S., Uhomoibhi, J. (eds.): CSEDU 2018. CCIS, vol. 1022. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21151-6
12. Oliphant, T.: A Guide to NumPy. Trelgol Publishing (2006)
13. del Pino, J.C.R., Rubio-Royo, E., Hernández-Figueroa, Z.J.: A virtual programming lab for Moodle with automatic assessment and anti-plagiarism features. In: Proceedings of the 2012 International Conference on e-Learning, e-Business, Enterprise Information Systems, and e-Government (2012)
14. Pomerol, J.C., Epelboin, Y., Thoury, C.: What is a MOOC?, Chapter 1, pp. 1–17. Wiley-Blackwell (2015). https://doi.org/10.1002/9781119081364.ch1
15. Ray, D., Ligatti, J.: Defining code-injection attacks. SIGPLAN Not. 47(1), 179–190 (2012). https://doi.org/10.1145/2103621.2103678
16. Rodríguez, J., Rubio-Royo, E., Hernández, Z.: Fighting plagiarism: metrics and methods to measure and find similarities among source code of computer programs in VPL. In: EDULEARN11 Proceedings, 3rd International Conference on Education and New Learning Technologies, IATED, pp. 4339–4346, 4–6 July 2011
17. Romli, R., Mahzan, N., Mahmod, M., Omar, M.: Test data generation approaches for structural testing and automatic programming assessment: a systematic literature review. Adv. Sci. Lett. 23(5), 3984–3989 (2017). https://doi.org/10.1166/asl.2017.8294
18. Smith, N., van Bruggen, D., Tomassetti, F.: JavaParser: Visited. Leanpub (2018)
19. Staubitz, T., Klement, H., Renz, J., Teusner, R., Meinel, C.: Towards practical programming exercises and automated assessment in massive open online courses. In: 2015 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), pp. 23–30, December 2015. https://doi.org/10.1109/TALE.2015.7386010
20. Su, Z., Wassermann, G.: The essence of command injection attacks in web applications. SIGPLAN Not. 41(1), 372–382 (2006). https://doi.org/10.1145/1111320.1111070
21. Thiébaut, D.: Automatic evaluation of computer programs using Moodle's Virtual Programming Lab (VPL) plug-in. J. Comput. Sci. Coll. 30(6), 145–151 (2015). http://dl.acm.org/citation.cfm?id=2753024.2753053
22. Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)