Common Logic Errors Made By Novice Programmers
Andrew Ettles
University of Auckland
Auckland, New Zealand
Andrew Luxton-Reilly
University of Auckland
Auckland, New Zealand
Paul Denny
University of Auckland
Auckland, New Zealand
ABSTRACT
Errors in the logic of a program (sometimes referred to as semantic errors) can be very frustrating for novice programmers to locate and resolve. Developing a better understanding of the kinds of logic error that are most common and problematic for students, and finding strategies for targeting them, may help to inform teaching practice and reduce student frustration.
In this paper we analyse 15,000 code fragments, created by novice programming students, that contain logic errors, and we classify the errors into algorithmic errors, misinterpretations of the problem, and fundamental misconceptions. We find that misconceptions are the most frequent source of logic errors, and lead to the most difficult errors for students to resolve. We list the most common errors of this type as a starting point for designing specific teaching interventions to address them.
CCS CONCEPTS
• Social and professional topics → Computing education.
KEYWORDS
CS1, logic errors, novice programmers
ACM Reference Format:
Andrew Ettles, Andrew Luxton-Reilly, and Paul Denny. 2018. Common
Logic Errors Made By Novice Programmers. In ACE 2018: 20th Australasian
Computing Education Conference, January 30-February 2, 2018, Brisbane,
QLD, Australia. ACM, New York, NY, USA, 7 pages.
Empirical research on identifying and classifying student errors has focused primarily on syntax errors that are detected at compile time, perhaps because static analysis can easily be applied to large data sets. In particular, recent work by Becker [3, 4], Denny, Pettit, and Prather examined common syntax errors and evaluated strategies, such as improving the messages generated by the compiler, for helping students to resolve them.
Logic errors, sometimes referred to as semantic errors, are situations where the programmer's code compiles successfully and executes but fails to generate the expected output for all possible
inputs. Ahmadzadeh et al. [1] state that such errors are generated when the intended meaning of the programmer's code is not consistent with the language. Debugging logic errors can be a very frustrating activity for programmers, as often there is no readily available feedback on the location or nature of the error. Novice programmers may find it particularly difficult to detect logic errors due to their relative lack of debugging experience combined with possible misconceptions in their programming knowledge [9, 12, 16].
In this paper we investigate the following two research questions:
RQ1: What are the most common logic errors students make?
RQ2: Which logic errors are most problematic for students to identify and fix?
To identify student errors we analysed data taken from a large first-year university programming course. Our goal in presenting the most common errors is to highlight some of the challenges that students encounter when learning to program, which may help to inform future teaching practice.
Hristova et al. found that improper casting, not storing the result from a call to a non-void method, and flow reaching the end of a non-void method were among the most commonly made logic errors in Java. This data was collected from a survey of college professors and teaching assistants within Computer Science, rather than through empirical analysis of code. McCall and Kölling categorised errors in student code through manual analysis instead of relying on compiler messages. Although they did not provide a detailed breakdown of the most common logic errors, numerous syntax errors were categorised.
Altadmri and Brown [2] provided a comprehensive empirical analysis of the frequency and time-to-fix of different errors in Java, including semantic errors, for over 250,000 novice programming students. They found that the average time students spent fixing semantic errors decreased over the course of a semester and that semantic errors are more challenging for students to deal with than syntax errors. Similar results were reported in the work of Fitzgerald et al., in which students were provided with fragments of bug-ridden code which they were required to correct. In this work, logic errors were among the hardest for students to find, but once found they were generally easy to fix.
The work we report here is analogous in design to that of Denny et al., which identified the most frequent and problematic syntax errors encountered by novice programmers. In this work, we analyse data taken from the automated assessment tool CodeWrite, which presents students with short programming problems in the form of individual functions. For each function, a description and signature are given, and students are required to provide a working implementation. Our study is novel in that it focuses specifically on
targeting and classifying logic errors. We also develop a framework
that allows us to aggregate submissions based upon the kind of
logic error made.
3.1 Context
The data we analyse was collected during a first-year programming course taught at the University of Auckland in 2016. This course is compulsory for all Engineering students and provides an introduction to engineering computation and programming. The first half of the course uses the MATLAB language and environment, and the second half of the course is an introduction to programming in C.
In the second half of the semester, during the C programming module, students were given a compulsory assignment involving 10 short programming problems designed by the course instructor. The questions were ordered in increasing complexity, and were to be completed online using the web-based CodeWrite tool. CodeWrite presents each problem in the form of a short textual description, an accompanying function header (which shows the input and output types for the function) and an example test case (showing one set of input values and the corresponding expected output). With this information, the task for the student is to create a working implementation of the function. An example of the presentation of the questions is provided in Figure 1.
Dene a function Product() which takes two integer inputs
. The function should return the product of these two values.
For example, the function call Product(10, 20) should return the
value 200. The function header is as shown:
int Product(int a, int b)
Figure 1: Example illustrating how the questions were pre-
sented to students
3.2 Questions
The programming exercises place emphasis on testing student knowledge of variables, control structures (if-else statements, for loops) and arrays. The textual description for each of the questions is as follows:
Dene a function called Product() which takes two integer
inputs: a and b. The function should return the product of
these two values (i.e. a * b)
Dene a function called Largest() which takes three integer
inputs: a, b and c. The function should return the largest of
these three values.
Dene a function called SumIsEven() which takes two integer
inputs and returns true (i.e. the integer value 1) if and only
if the sum of the two values is an even number. If the sum of
the input values is an odd number, then the function must
return false (i.e. 0).
Dene a function called SumBetween() which takes two
integer inputs, low and high. This function should calculate
the sum of all integers between low and high inclusive.
It’s apple season, and the apples have to be packed! Write a
function called BoxesNeeded() which takes one input. The
input value represents a number of apples that need to be
packed into boxes. Each box can store a maximum of 20
apples. Your function should return the minimum number
of boxes needed to store the given number of apples.
A bitter rivalry has developed between two lecture streams
(stream A and stream B) regarding which is smarter. The per-
formance of each stream can be measured as the percentage
of students who answer a question correctly in class. Dene
a function called SmarterStream() that takes four inputs. The
rst input is the number of students in stream A that an-
swer the question correctly, and the second input is the total
number of students in stream A. Similarly, the third input
is the number of students in stream B that answer the ques-
tion correctly, and the fourth input is the total number of
students in stream B. The SmarterStream() function should
return true (i.e. the value 1) if stream A performs better than
stream B and false (i.e. the value 0) otherwise.
You like to eat pizzas for dinner. A good dinner is a dinner
where you eat between 10 and 20 pizzas (inclusive). Unless
it is a weekend, in which case there is no upper bound on
the number of pizzas. Write a function called GoodDinner()
which takes two inputs: the number of pizzas you eat, and a
Boolean which is true if it is the weekend and false otherwise.
Your function should return true (i.e. the value 1) if you had
a good dinner, and false (i.e. the value 0) if you didn’t.
Write a function called CountOnes() which is passed two
inputs: an array of integers and the number of elements in
the array. This function should count how many elements
in the array are equal to 1, and it should return this number.
Write a function called LastIndexOf() which is passed three
inputs: a value to search for, an array of integers in which to
search and the number of elements in the array. This func-
tion should return the index position of the last occurrence
(i.e. the rightmost occurrence) of the value being searched
for. If the value being searched for does not exist in the array,
then the function should return -1.
Write a function called Range() which is passed two inputs:
an array of integers and the number of elements in the ar-
ray. This function should return the dierence between the
largest value in the array and the smallest value in the array.
3.3 Data Collection
Each time a student attempts a question, the code they write is
compiled and, if successful, executed against a set of functional
tests. If a compile-time error occurs, or code execution terminates
abnormally, then information about the error is displayed to the
student. If the submission runs without errors then the student is
shown a list of test cases (input values), the expected output for
each, the actual output for each, and whether the test was passed or
failed. If all test cases are passed, the student has solved the question
and they can progress to the next problem (although this ordering
Figure 2: Number of total logic errors made for each assignment question (x-axis: questions 1–10; y-axis: submissions with logic errors).
is not enforced, as students can attempt the questions out of order
if they wish). One or more hidden test cases are also present in each
question so that students cannot reverse engineer solutions from
the feedback provided. There is no limit on the number of attempts
that a student may make to solve a question.
Students had approximately one week to complete all 10 questions. Every submission logs the user id, timestamp, the actual code submitted, and a coded indication of the result; in the case of a syntax error, the compiler error message is also logged. The coded result types relevant here are ALL_PASS and NOT_ALL_PASS. For the purposes of this analysis we only look at submissions that are coded as NOT_ALL_PASS. These correspond to submissions that compile, and run without error, but that do not pass all the test cases.
A total of 809 distinct students participated and made at least one submission. Overall there were 51,981 submissions over the 10 questions. Of these, 15,168 (29%) were failed submissions generating a NOT_ALL_PASS code. These correspond to logic errors for the purpose of our analysis. Figure 2 shows the distribution of submissions containing logic errors for each of the 10 questions. There is a general (although weak) increasing trend of frequency of logic errors with question number, as would be expected, reflecting the increasing complexity of the questions.
3.4 Grouping Logic Errors
One challenge in classifying logic errors is that the erroneous code generates an incorrect output (for at least one test case) but otherwise provides no indication of the location or cause of the error. In comparison, categorising syntax errors is simpler due to the fact that the compiler generates a text-based error message (such as: Type mismatch: cannot convert from double to int) which can be used as a basis for classification.
In this work, we identify the most common logic errors for each question by analysing the output of all submissions to the question across the set of test cases. Given the ordered set of test cases ⟨T1, T2, ..., Tn⟩ for a question, we use the value t to denote that the code passes the corresponding test; otherwise we use the output value of the failed test to denote failure. As an example, consider the following common student submission containing a logic error for Question 2:
int Largest(int a, int b, int c) {
    if ((a > b) && (a > c)) {
        return a;
    } else if ((b > a) && (b > c)) {
        return b;
    } else {
        return c;
    }
}
Given four test cases, the corresponding output vector contains t for each test that the submission passes, and the incorrect output value for each test that it fails.
We make the assumption that submissions with identical output vectors contain the same logic error. An advantage of this approach is that it provides an automated means of categorising errors; however, a disadvantage is that the developer of the test cases needs to anticipate potential logic errors ahead of time. Ideally, the set of test cases used for classification should provide good coverage of the possible errors.
For example, in the question shown previously, tests would ideally include negative numbers, a representative from each permutation of numbers, all numbers equal, and so on. In this 'black-box' approach to testing, poor coverage would result in different logic errors generating the same output vector. Benac Earle et al. [6] adopt a similar methodology to rank student-submitted programs according to the number of bugs present.
This approach is not perfect, however, and there are situations in which it is not possible to distinguish between different kinds of logic error. This commonly occurs with functions that output Boolean values, where the total number of distinct test vectors is limited. For example, consider the two submissions to Question 3 shown below:
int SumIsEven(int a, int b) {
    return (a + a) % 2 == 0;
}

int SumIsEven(int a, int b) {
    if ((a + b) % 2 == 2) {
        return 0;
    }
    return 1;
}
Both submissions shown above produce the same output for any set of inputs, even though the logic errors are different. In the first case, the student has mistakenly written a+a instead of a+b. In the second case, the student appears unaware that (a+b)%2 can only evaluate to 0 or 1. These issues only impact our analysis where two or more frequent logic errors give the same output vector.
One further issue to acknowledge is that identical submitted code may give different output vectors on the same set of test cases. For instance, accessing an element in an uninitialised portion of an array will result in an unpredictable 'garbage value'. Such values will likely differ each time the code is run, resulting in different test case outputs. To help address this, post-processing can be used to identify submissions that generate unusual incorrect output. Output vectors that fail with different plausible values may be the result of distinct logic errors, but output vectors that fail with abnormally large, apparently random values are likely caused by the same error due to accessing an invalid memory location. This post-processing was specifically performed for Question 10, where it was common for an off-by-one error to generate an invalid array index and result in an abnormally large return value.
3.5 Classifying Logic Errors
The output vectors alone give no indication as to the nature of the corresponding logic errors. To classify the errors, we performed a manual examination of the submitted code. Individual questions generated a unique range of logical errors. Figure 3 plots the frequency of the different types of error observed across Question 6. The distribution is highly skewed, as a few types of error were very common and many different errors occurred rarely. This distribution is consistent with similar studies examining the frequency of syntax errors in novice code. We focus our manual examination on the most common errors, defined to be errors that occur at least 5% of the time for a given question or which have a total frequency greater than 50. This threshold of 5% was chosen arbitrarily. Reducing the threshold further would capture more of the errors that are rarely seen in the data.
We classify the logic errors into three broad categories:
(1) Algorithmic — the algorithm a student is trying to use to solve a problem is fundamentally flawed. This is an error in abstract terms (before the students have even started coding) and does not necessarily reflect a fundamental lack of programming knowledge. For example, in Question 2 students failed to account for the case where items a and b are equal and c is less than both a and b.
(2) Misinterpretation — the student makes an unintentional mistake as a result of misinterpreting the question. For example, in Question 9 students were instructed to return -1 if the specified item did not occur in the array. Many students returned 0 instead. Again, this error is unlikely to be due to a lack of programming knowledge.
(3) Misconception — a logical error that reflects a fundamental flaw in programming knowledge. For example, not realising that array indexing starts at 0 (in C), or not realising that an uninitialised variable does not have a default value.
Figure 3: Frequency of the 52 different classified logic errors for Question 6 (x-axis: logic error; y-axis: frequency).
4.1 RQ1: Most Common Logic Errors
To answer the first research question (i.e., What are the most common logic errors students make?), we analyse each of the ten questions in turn. For each question, we classify and list the most common logic errors and we include the median time taken for students to resolve them. In practice we classified any output vector that occurred at least 5% of the time for a particular question. To be concise, we list only the most common errors below (those that occurred at least 10% of the time for a given question), listed by question number.
Question 1: Fewer than 10% of the submissions contained errors. The question was designed to familiarise students with the environment and was not expected to be challenging.
Question 2:
31.2% (1 min): Failure to consider the case where a and b are equal and c is less than a and b (algorithmic).
17.4% (2 min): Failure to consider the case where all three inputs are equal (algorithmic).
Question 3:
12.7% (2 min): Returning the opposite value, taking 0 to be even instead of 1 (misinterpretation).
Question 4:
13.2% (2 min): For loop is not inclusive, so the sum excludes the last element (misconception).
12.5% (2 min): Not initialising an integer variable (misconception).
Question 5:
18.7% (8 min): Off-by-one error, outputting one less box than required (algorithmic).
12.3% (8 min): Boundary error; one more box than necessary for multiples of 20 (algorithmic).
Question 6:
72.2% (6 min): Not realising that division of two integers returns the floor of the division, not a double. Also failure to correctly cast integers into doubles (misconception).
10.9% (12 min): A different variation of the above error (misconception).
Common Logic Errors Made By Novice Programmers ACE 2018, January 30-February 2, 2018, Brisbane, QLD, Australia
Figure 4: Number of submissions containing printf statements for each question (x-axis: questions 1–10; y-axis: submissions with printf()).
Question 7:
18.2% (7 min): Not considering the case where it is the weekend and there are more than 20 pizzas (algorithmic).
11.7% (5 min): Using an if-if-else flow structure instead of if-else if-else (misconception).
Question 8:
34.5% (6 min): Accessing an element outside an array, usually a result of using <= instead of < (misconception).
Question 9:
22.3% (5 min): Returning 0 instead of -1 if an item is not found (misinterpretation).
12.8% (17 min): Always returning -1; a variety of different logic errors caused this.
11.6% (10 min): Only returning the value at the last index.
Question 10:
32.8% (20 min): Accessing an element outside an array, usually a result of using <= instead of < (misconception).
printf Statements. A special case of logic error was the use of printf statements (the conventional way to print to standard output in C) within student submissions. The automatic tester considered all output from the student program when evaluating results. Because of this, any submission with a printf statement would likely fail all tests. We identified all submissions containing printf statements, and removed them from the above analysis. Overall, approximately 5% of submissions had a logic error due to the presence of a printf statement. Figure 4 shows the distribution of printf statements across the 10 questions, which exhibits the opposite trend to the distribution of errors shown in Figure 2. Question 1 had the fewest total logic errors but the highest number of printf statements. Fully 82% of all logic errors for Question 1 were due to printf statements.
One explanation for this trend is that students attempt problems sequentially. Those that use printf statements early in the sequence will learn from this, and not repeat it in later questions. It is also possible that students initially misunderstood the conceptual difference between printing output and returning a value, or they may simply have paid little attention to the specific programming sense of the word return (and the syntax in the header given), and focused on solving the problem described. It is also possible that students used printf statements as a method of debugging their code when presented with other logic errors. Unfortunately, we are unable to determine why the printf statements were used, merely that they were present in the code.
4.2 RQ2: Measuring Error Difficulty
To investigate the second research question (i.e., Which logic errors are most problematic for students to identify and fix?), we considered two different metrics for how problematic a logic error was for students.
Error Correction Time — the time elapsed between the first submission with a logic error and when a successful submission is made. This is similar to the approach taken by Denny et al. in an analogous study of syntax errors. However, we do not account for the fact that solving one logic error may uncover other errors that also need solving. We also cannot assume that students will work on a submission continuously until it is solved. A student may step away from the task and come back to it at a later time. To account for this, we take the median over all times.
Repeat Errors — we also look at how many times students made multiple submissions with the same error. We can express this as the ratio of unique students that made a specific error to the number of submissions with that error. Students that make multiple submissions with the same error are probably struggling to identify the location of the error, or, if they think they have located it, to develop a solution to fix the error. A problem with this measure is that a single student making a large number of incorrect repeated submissions can heavily skew this statistic.
4.2.1 Comparing Different Logic Errors. As expected, algorithmic errors and misinterpretation errors were less frequent, were resolved more quickly, and tended not to be repeated as frequently compared with misconception errors. Since students can view the test cases on which their submission failed, they receive feedback that they can use to desk-check (i.e. perform a paper-based code trace on) their code, to discover where the flaw in their logic lies. The CodeWrite tool does not provide the functionality to step through the code, so students are forced to manually trace code execution. With a misconception error, this process may not help students to identify the problem: when tracing code, the students have a false assumption regarding the behaviour of some fragment of the code, so they are less likely to detect the error.
In our data, algorithmic errors and misinterpretation errors are relatively harmless and students tend to solve them fairly easily. On the other hand, when students have a fundamental lack of understanding of common programming concepts, logic errors are more problematic to resolve. In the next section we describe the common misconceptions that we identified as being most problematic.
4.3 Common Misconceptions
Here we provide a list of some of the most common and difficult misconception logic errors that we observed across the ten questions.
(1) Integer Division — many students failed to realise that dividing two integers evaluates to an integer, truncating any fractional component. Students also displayed a lack of knowledge regarding casting integers to doubles. Some examples of common erroneous statements:

double d = int1/int2;
double d = (double)(int1/int2);
(2) Uninitialised Variables — many students assumed the value of an uninitialised integer variable to be 0. In reality, uninitialised variables have an indeterminate 'garbage value'. An example of this type of error:

int a;
(3) Indexing/Iterating Arrays — some students would use <= instead of < in the stopping condition of a for loop, when iterating over an array:

int sum = 0;
int values[3] = {1,2,3};
for (int i = 0; i <= 3; i++) {
    sum += values[i];
}

This off-by-one error results in accessing a memory element outside the bounds of the array, either returning a garbage value or leading to a program crash. A related error is caused by students not realising that array indexing begins at 0, resulting in iteration from index 1 to the index equal to the size of the array.
One plausible reason that this type of error was prevalent on the array-based questions in our data set was that students had learned MATLAB in the first half of the semester. MATLAB indexes arrays starting at 1 rather than 0, and this transition may have been difficult for some students.
A few less common misconceptions included using '=' instead of '+=' to add a value to another, and using multiple if statements instead of else-if statements within a control flow structure.
5.1 Limitations
Some logic errors will have been missed in our analysis. Up to 50% of logic errors (the infrequent ones, falling in the tail of the distribution) for some questions were not classified. We suspect that many of these infrequent errors are variations or combinations of the common errors that were classified. However, we have missed some rare types of error that could provide additional insight into student thought processes. Similarly, we only looked at logic errors that could occur on a set of pre-defined questions focusing on variables, control structures and arrays. If we broadened the types of problem that students were working on, we might uncover many other error varieties. However, we note that the approach described here will not work for non-deterministic program segments, as the test cases would not be comparable across the data set.
Hidden test cases could have been a confounding variable in our analysis. Students that failed hidden test cases could not view the test case input to trace through their code. This means that logic errors causing failure only on the hidden test cases will have been harder for students to detect. This may have artificially increased the difficulty (and probably the number of repeat submissions) for associated errors.
Our student population may not have been representative of typical CS1 programming courses. Firstly, due to the high entry requirements of the Engineering degree, students in our cohort will likely have been academically stronger than students in a course with more open entry requirements. Also, because the students in our study had learned MATLAB in the first half of the course, they already had some familiarity with programming concepts when beginning the C module of the course.
Interestingly, none of the most common misconceptions identified in this study are noted by Sirkiä and Sorva in their analysis of student mistakes within the UUhistle environment. We speculate that the use of more fully featured development environments may impact on the types of misconception. It is also clear that although Sirkiä and Sorva were investigating misconceptions in a novice programming course, their context of study was very different, involving students that were using a different programming language, covering object-oriented programming, and including extensive use of visual programming simulation software.
5.2 Conclusion
In this paper we have investigated the common and problematic logic errors that novice programmers encounter in an introductory C programming course. Our results reveal a similar pattern to previous work on syntax errors, in that a few kinds of logic errors occur very frequently. The most difficult logic errors for students to identify and solve can be classed as fundamental programming misconceptions, which represent a lack of knowledge or a false assumption about some coding construct. We hope that providing empirical evidence to illustrate these common misconceptions may assist instructors in clarifying these issues with their students.
5.3 Future Work
We plan to explicitly incorporate teaching activities based around the misconceptions, to explore how we might address potential misunderstandings through specific, targeted exercises.
Future work should examine courses that teach languages with greater built-in safety mechanisms, and explore how the feedback provided by different compilation options impacts on student understanding. For example, in C# some of the common misconceptions we identified would not manifest as logic errors, particularly if strong compiler options were used and students were encouraged to treat compiler warnings as errors. For example, uninitialised variables will generate compile-time errors, and accessing an element outside of an array's bounds will throw an IndexOutOfRangeException at run time.
Additional work could explore whether students learn from their errors over time. For example, once a student resolves a logic error of a particular type, are they less likely to make the same error again? Providing students with a list of the common types of logic
error may also be useful. Another approach might involve collect-
ing similar data again, using the same set of questions, but with-
out displaying the failed test cases to students. This may be more
representative of debugging in practice, where students would be
required to develop their own test suites.
REFERENCES
Marzieh Ahmadzadeh, Dave Elliman, and Colin Higgins. 2005. An Analysis of Patterns of Debugging Among Novice Computer Science Students. SIGCSE Bull. 37, 3 (June 2005), 84–88.
Amjad Altadmri and Neil C.C. Brown. 2015. 37 Million Compilations: Investigating Novice Programming Mistakes in Large-Scale Student Data. In Proceedings of the 46th ACM Technical Symposium on Computer Science Education (SIGCSE '15). ACM, New York, NY, USA, 522–527.
Brett A. Becker. 2016. An Effective Approach to Enhancing Compiler Error Messages. In Proceedings of the 47th ACM Technical Symposium on Computer Science Education (SIGCSE '16). ACM, New York, NY, USA, 126–131.
Brett A. Becker. 2016. A New Metric to Quantify Repeated Compiler Errors for Novice Programmers. In Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE '16). ACM, New York, NY, USA, 296–301.
Brett A. Becker, Graham Glanville, Ricardo Iwashima, Claire McDonnell, Kyle Goslin, and Catherine Mooney. 2016. Effective compiler error message enhancement for novice programming students. Computer Science Education 26, 2-3 (2016), 148–175.
Clara Benac Earle, Lars-Åke Fredlund, and John Hughes. 2016. Automatic Grading of Programming Exercises Using Property-Based Testing. In Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE '16). ACM, New York, NY, USA, 47–52.
Paul Denny, Andrew Luxton-Reilly, and Ewan Tempero. 2012. All Syntax Errors Are Not Equal. In Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE '12). ACM, New York, NY, USA, 75–80.
Paul Denny, Andrew Luxton-Reilly, Ewan Tempero, and Jacob Hendrickx. 2011. CodeWrite: Supporting Student-driven Practice of Java. In Proceedings of the 42nd ACM Technical Symposium on Computer Science Education (SIGCSE '11). ACM, New York, NY, USA, 471–476.
Sue C. Fitzgerald, Gary Lewandowski, Renée McCauley, Laurie Murphy, Beth Simon, Lynda Thomas, and Carol Zander. 2008. Debugging: finding, fixing and flailing, a multi-institutional study of novice debuggers. Computer Science Education 18 (June 2008), 93–116.
Maria Hristova, Ananya Misra, Megan Rutter, and Rebecca Mercuri. 2003. Identifying and Correcting Java Programming Errors for Introductory Computer Science Students. SIGCSE Bull. 35, 1 (Jan. 2003), 153–156.
Davin McCall and Michael Kölling. 2014. Meaningful categorisation of novice programmer errors. In Frontiers in Education Conference (FIE 2014). IEEE.
Michael McCracken, Vicki Almstrum, Danny Diaz, Mark Guzdial, Dianne Hagan, Yifat Ben-David Kolikant, Cary Laxer, Lynda Thomas, Ian Utting, and Tadeusz Wilusz. 2001. A Multi-national, Multi-institutional Study of Assessment of Programming Skills of First-year CS Students. SIGCSE Bull. 33, 4 (Dec. 2001), 125–180.
Raymond S. Pettit, John Homer, and Roger Gee. 2017. Do Enhanced Compiler Error Messages Help Students? Results Inconclusive. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education (SIGCSE '17). ACM, New York, NY, USA, 465–470.
James Prather, Raymond Pettit, Kayla Holcomb McMurry, Alani Peters, John Homer, Nevan Simone, and Maxine Cohen. 2017. On Novices' Interaction with Compiler Error Messages: A Human Factors Approach. In Proceedings of the 2017 ACM Conference on International Computing Education Research (ICER '17). ACM, New York, NY, USA, 74–82.
Teemu Sirkiä and Juha Sorva. 2012. Exploring Programming Misconceptions: An Analysis of Student Mistakes in Visual Program Simulation Exercises. In Proceedings of the 12th Koli Calling International Conference on Computing Education Research (Koli Calling '12). ACM, New York, NY, USA, 19–28.
James C. Spohrer and Elliot Soloway. 1986. Novice Mistakes: Are the Folk Wisdoms Correct? Commun. ACM 29, 7 (July 1986), 624–632.