Introductory Programming: A Systematic Literature Review
Andrew Luxton-Reilly
University of Auckland
New Zealand
Simon
University of Newcastle
Australia
Ibrahim Albluwi
Princeton University
United States of America
Brett A. Becker
University College Dublin
Ireland
Michail Giannakos
Norwegian University of Science and Technology
Norway
Amruth N. Kumar
Ramapo College of New Jersey
United States of America
Linda Ott
Michigan Technological University
United States of America
James Paterson
Glasgow Caledonian University
United Kingdom
Michael James Scott
Falmouth University
United Kingdom
Judy Sheard
Monash University
Australia
Claudia Szabo
University of Adelaide
Australia
As computing becomes a mainstream discipline embedded in the
school curriculum and acts as an enabler for an increasing range of
academic disciplines in higher education, the literature on introduc-
tory programming is growing. Although there have been several
reviews that focus on specific aspects of introductory programming,
there has been no broad overview of the literature exploring recent
trends across the breadth of introductory programming.
This paper is the report of an ITiCSE working group that con-
ducted a systematic review in order to gain an overview of the
introductory programming literature. Partitioning the literature
into papers addressing the student, teaching, the curriculum, and
assessment, we explore trends, highlight advances in knowledge
over the past 15 years, and indicate possible directions for future research.
CCS Concepts: • Social and professional topics → Computing education.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ITiCSE Companion '18, July 2–4, 2018, Larnaca, Cyprus
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6223-8/18/07...$15.00
ITiCSE working group; CS1; introductory programming; novice programming; systematic literature review; systematic review; literature review; review; SLR; overview
ACM Reference Format:
Andrew Luxton-Reilly, Simon, Ibrahim Albluwi, Brett A. Becker, Michail
Giannakos, Amruth N. Kumar, Linda Ott, James Paterson, Michael James
Scott, Judy Sheard, and Claudia Szabo. 2018. Introductory Programming:
A Systematic Literature Review. In Proceedings of the 23rd Annual ACM
Conference on Innovation and Technology in Computer Science Education
(ITiCSE Companion ’18), July 2–4, 2018, Larnaca, Cyprus. ACM, New York,
NY, USA, 52 pages.
Teaching students to program is a complex process. A 2003 review by Robins et al. [ ] provided a comprehensive overview of novice programming research prior to that year. The first paragraph of the review sets the general tone:

Learning to program is hard [...] Novice programmers suffer from a wide range of difficulties and deficits. Programming courses are generally regarded as difficult, and often have the highest dropout rates. [554, p137]
However, more recent studies have suggested that the situation is not as dire as previously suggested. Studies indicate that dropout rates among computing students are not alarmingly high [ ], and it has been suggested that the difficulties faced by novices may be a consequence of unrealistic expectations rather than intrinsic subject complexity [ ]. Better outcomes are likely to arise from focusing less on student deficits and more on actions that the computing community can take to improve teaching practice. In this paper we investigate the literature related to introductory programming and summarise the main findings and challenges for the computing community.
Although there have been several reviews of published work involving novice programmers since 2003, those reviews have generally focused on highly specific aspects, such as student misconceptions [ ], teaching approaches [ ], program comprehension [ ], potentially seminal papers [ ], research methods applied [ ], automated feedback for exercises [ ], competency-enhancing games [ ], student anxiety [ ], and program visualisation [631].
A review conducted contemporaneously with our own, by Medeiros et al. [436], is somewhat broader in scope than those mentioned above, but not as broad as our own. It investigates the skills and background that best prepare a student for programming, the difficulties encountered by novice programmers, and the challenges faced by their instructors.
We review papers published between 2003 and 2017 inclusive. Pub-
lications outside this range are not included in the formal analysis,
but may be included in discussion where appropriate.
In selecting papers for review, we make a clear distinction be-
tween those involving introductory programming — the focus of
our review — and those about other aspects of introductory com-
puting. For example, the literature of computing includes many
papers on aspects of computational thinking. This review addresses
such papers only where they have a clear focus on
introductory programming.
We have limited our scope to units of teaching corresponding to
introductory programming courses, thus ruling out shorter and less
formal units such as boot camps and other outreach activities. As
it became apparent that we needed to reduce the scope still further,
we also excluded work on introductory programming courses at
school level (also known as K–12) and work explicitly concerning
introductory computing courses for non-computing students (also
known as non-majors). Some papers in these areas are still included
in our discussion, but only if they contribute to our principal focus
on introductory programming courses for students in computing.
As recommended by an ITiCSE working group on worldwide terminology in computing education [ ], this report tends in general to avoid the term 'computer science', preferring instead the less restrictive term 'computing'.
The working group conducted a systematic literature review by adapting the guidelines proposed by Kitchenham [ ]. In this review, we followed a highly structured process that involved:
(1) Specifying research questions
(2) Conducting searches of databases
(3) Selecting studies
(4) Filtering the studies by evaluating their pertinence
(5) Extracting data
(6) Synthesising the results
(7) Writing the review report
3.1 Research Questions
This review aims to explore the literature of introductory program-
ming by identifying publications that are of interest to the com-
puting community, the contributions of these publications, and the
evidence for any research findings that they report. The specific research questions are:
• What aspects of introductory programming have been the focus of the literature?
• What developments have been reported in introductory programming education between 2003 and 2017?
• What evidence has been reported when addressing different aspects of introductory programming?
3.2 Conducting Searches
Selecting search terms for a broad and inclusive review of introductory literature proved challenging. Terms that are too general result in an unwieldy set of papers, while terms that are too specific are likely to miss relevant papers. After some trial and error with a range of databases, we selected a combined search phrase that seemed to capture the area of interest:
"introductory programming" OR "introduction to pro-
gramming" OR "novice programming" OR "novice
programmers" OR "CS1" OR "CS 1" OR "learn pro-
gramming" OR "learning to program" OR "teach pro-
To check whether this search phrase was appropriate, we applied
it to a trial set of papers and compared the outcome with our own
thoughts as to which papers from that set would fall within the
scope of our review. We chose the papers from the proceedings of
ICER 2017 and ITiCSE 2017, 94 papers in all. Ten members of the working group individually decided whether each paper was relevant to the review. The members then formed five pairs, discussed any differences, and resolved them.
The inter-rater reliability of this process was measured with the Fleiss-Davies kappa [ ], which measures the agreement when a fixed set of raters classify a number of items into a fixed set of categories. In principle we were classifying the papers into just two categories, yes and no, but some members were unable to make this decision for some papers, introducing a third category of undecided. The Fleiss-Davies kappa for individual classification was 61%. It has been observed [ ] that classification in pairs is more reliable than individual classification, and this was borne out with our paired classification, which resulted in a Fleiss-Davies kappa of 73%, and the disappearance of the undecided category.
When measuring inter-rater reliability, an agreement of less than
40% is generally considered to be poor, between 40% and 75% is
considered fair to good, and more than 75% is rated excellent [ ]. By this criterion, our individual agreement was good and our paired
agreement was substantially better. This process resulted in the
selection of 29 papers, those that a majority of pairs agreed were
pertinent to our review.
We then automatically applied the search terms to the same
set of 94 papers, resulting in a selection of 32 papers: 25 of the 29
that we had selected and seven false positives, papers that were
indicated by the search terms but not selected by us. This proportion
of false positives was not a major concern, because every selected
paper was going to be examined by at least one member of the
team and could be eliminated at that point. There were also four
false negatives, papers that we deemed relevant but that were not identified by the search terms. False negatives are of greater concern because they represent relevant papers that will not be identified by the search; but, unable to find a better combination of search terms, we accepted that our search might fail to identify some 15% of pertinent papers.
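In information-retrieval terms, the trial run above can be summarised with precision and recall; the following sketch simply re-derives the reported figures (29 papers judged relevant, 32 returned by the search phrase, 25 in common):

```python
# Trial-set figures reported above: 29 papers judged relevant by the
# working group, 32 returned by the search phrase, 25 in both sets.
relevant = 29
returned = 32
overlap = 25

false_positives = returned - overlap   # 7 papers wrongly returned
false_negatives = relevant - overlap   # 4 relevant papers missed

precision = overlap / returned         # fraction of returned papers that were relevant
recall = overlap / relevant            # fraction of relevant papers found
miss_rate = false_negatives / relevant # the "some 15%" figure quoted above

print(f"precision={precision:.2f} recall={recall:.2f} miss_rate={miss_rate:.1%}")
```

The miss rate works out to 4/29, about 13.8%, which the text rounds up to "some 15%".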
The search terms were then applied to the title, abstract, and keyword fields of the ACM Full Text Collection, IEEE Xplore, ScienceDirect, SpringerLink, and Scopus databases. The search was conducted on 27 May 2018, and identified the following numbers of papers:
ACM Full Text Collection: 2199
IEEE Xplore: 710
ScienceDirect (Elsevier): 469
SpringerLink (most relevant 1000): 1000
Scopus (most relevant 2000): 2000; 678 after removal of duplicates
Total: 5056
3.3 Selecting Studies
The next stage of a systematic review is to select the papers that will
form the basis for the review. The search results were divided among
the authors, who examined each title and abstract, and the corre-
sponding full paper if required, to determine its relevance to the
review. We eliminated papers that were irrelevant, papers that were
less than four pages long (such as posters), and papers that were
clearly identied as work in progress. The biggest reductions were
seen in the more general ScienceDirect and SpringerLink databases,
where, for example, ‘CS1’ can refer to conditioned stimulus 1 in a
behavioural studies paper, cesium 1 in a paper on molecular structures, and connecting segment 1 in a paper on pathology. This process reduced the pool of papers by more than half, as shown below:
ACM Full Text Collection: 1126 (51%)
IEEE Xplore: 448 (63%)
ScienceDirect (Elsevier): 62 (13%)
SpringerLink (most relevant 1000): 204 (20%)
Scopus: 349 (51%)
Total: 2189 (43%)
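The retention figures above can be cross-checked in a few lines; the counts are taken directly from the two lists, and the dictionary layout is only for illustration:

```python
# Papers before and after relevance screening, per database (from the text).
# Scopus starts from its post-deduplication count of 678.
screened = {
    "ACM Full Text Collection": (2199, 1126),
    "IEEE Xplore": (710, 448),
    "ScienceDirect": (469, 62),
    "SpringerLink": (1000, 204),
    "Scopus": (678, 349),
}

for db, (before, after) in screened.items():
    print(f"{db}: kept {after}/{before} = {after / before:.0%}")

total_before = sum(b for b, _ in screened.values())
total_after = sum(a for _, a in screened.values())
print(f"Total: {total_after}/{total_before} = {total_after / total_before:.0%}")
```

The totals reproduce the 5056 papers found, the 2189 retained, and the overall 43% retention rate.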
3.4 Filtering and Data Analysis
Following the selection of papers, the team collectively devised a set of topics that might cover the papers we had been seeing. This began with a brainstormed list of topics, which was then refined and rationalised. The topics were then gathered into four high-level groups: the student, teaching, curriculum, and assessment. The first three of these groups have together been called the 'didactic triangle' of teaching [ ]. While assessment is not one of these three core elements, it is a clear link among them, being set by the teacher and used to assess the student's grasp of the curriculum.
The 2189 papers were divided among the authors, each of whom classified approximately 200 papers using the abstract and, where necessary, the full text of the paper. During this phase some 500 further papers were excluded upon perusal of their full text and some 25 papers because the research team was unable to access them. The remaining 1666 papers were classified into at least one and often several of the categories. This classifying, or tagging, constituted the first phase of the analysis: the data extracted from each paper were the groups and subgroups into which the paper appeared to fit.
Small groups then focused on particular topics to undertake the
remaining steps of the systematic process: evaluating the pertinence
of the papers, extracting the relevant data, synthesising the results,
and writing the report. The data extracted at this stage were brief
summaries of the points of interest of each paper as pertaining
to the topics under consideration. As the groups examined each candidate paper in more depth, a few papers were reclassified by consensus, and other papers were eliminated from the review. At the
completion of this phase, some of the initial topics were removed
because we had found few or no papers on them (for example,
competencies in the curriculum group), and one or two new topics
had emerged from examination of the papers (for example, course
orientation in the teaching group).
In a systematic literature review conducted according to Kitchenham's guidelines [ ], candidate papers would at this point have been filtered according to quality. This process was followed in
a limited manner: as indicated in the two foregoing paragraphs,
some papers were eliminated upon initial perusal and others upon
closer examination. However, our focus was more on the perti-
nence of papers to our subject area than on their inherent quality,
so at this stage we could be said to have deviated somewhat from
Kitchenham’s guidelines.
A further deviation from Kitchenham's guidelines arises from the number of papers identified by our search. It would be impractical
to list every paper that has addressed every topic, and even more
impractical to discuss any papers in depth. Therefore our intent is
to give a thorough view of the topics that have been discussed in the
literature, referring to a sample of the papers that have covered each
topic. Except where the context suggests otherwise, every reference
in the following sections should be understood as preceded by an
implicit ‘for example’.
Table 1 shows the number of papers in each group and the subgroups into which some or all of these papers were classified. The
majority of papers fall into the teaching category, most of them
describing either teaching tools or the various forms of delivery
that have been explored in the classroom. A substantial number of
papers focus on the students themselves. We see fewer papers that
discuss course content or the competencies that students acquire
through the study of programming. The smallest of the broad topic
areas is assessment, which is interesting since assessment is such a
critical component of courses and typically drives both teaching
and learning.
Given the comprehensive nature of this review, it is inevitable
that some papers will be discussed in more than one section. For
example, a paper exploring how students use a Facebook group
to supplement their interactions in the university's learning management system [ ] is discussed under both student behaviour
Table 1: Initial classification of 1666 papers, some classified into two or more groups or subgroups

Group        Papers  Optional subgroups
The student  489     student learning, underrepresented groups, student attitudes, student behaviour, student engagement, student ability, the student experience, code reading, tracing, writing, and debugging
Teaching     905     teaching tools, pedagogical approaches, theories of learning,
Curriculum   258     competencies, programming languages, paradigms
Assessment   192     assessment tools, approaches to assessment, feedback on assessment, academic integrity
(section 5.1.3) and teaching infrastructure (section 6.5). While fur-
ther rationalisation might have been possible, we consider that
readers are best served by a structure that considers broad cate-
gories and then surveys the papers relevant to each, even if that
entails some duplication.
Figure 1 shows the number of papers that we identified in the
data set, arranged by year. It is clear that the number of publications
about introductory programming courses is increasing over time.
To check whether introductory programming is a growing focus
specically in the ACM SIGCSE conferences, we counted the papers
in our data set that were published each year in ICER (which began
in 2005), ITiCSE, or the SIGCSE Technical Symposium, and com-
pared this with the total number of papers published each year in
those three conferences. Figure 2 shows that publication numbers
in the three main ACM SIGCSE conferences remain fairly stable
between 2005 and 2017. Publications from these venues focusing on
introductory programming, although somewhat variable, also re-
main relatively stable across the same period. We conclude that the
main growth in publications is occurring outside SIGCSE venues,
which might indicate that programming education is of growing
interest to the broader research community. Alternatively, it might
indicate that authors are seeking more venues because there is no
growth in the numbers of papers accepted by the SIGCSE venues.
This section explores publications that focus primarily on the student. This includes work on learning disciplinary content knowledge, student perceptions and experiences of introductory programming, and identifiable subgroups of students studying programming. Table 2 gives an overview of the categories and corresponding numbers of papers. The sum of the numbers in the table does not match the number in Table 1 because some papers were classified into more than one category.
Figure 1: Introductory programming publications identified by our search
Figure 2: Introductory programming publications identified by our search and published in ICER, ITiCSE or SIGCSE, compared with total publications in ICER, ITiCSE and SIGCSE
Table 2: Classification of papers focused on students

Category      N    Description
– Theory      17   Models of student understanding
– Literacy    58   Code reading, writing, debugging
– Behaviour   69   Measurements of student activity
– Ability     169  Measuring student ability
– Attitudes   105  Student attitudes
– Engagement  61   Measuring/improving engagement
– Experience  18   Experiences of programming
– At risk     17   Students at risk of failing
– Underrep.   25   Women and minorities
5.1 Content
The categories in this section relate to measuring what students
learn and how they learn it. We begin by considering work that
applies a cognitive lens to student understanding. We then move
to publications on what we term code literacy (i.e., reading, writ-
ing, and debugging of code), before moving to measurable student
behaviour. The section concludes by considering broad ways that
student ability is addressed in research. Figure 3 illustrates the
growth of papers focusing on the interaction between students and
the content taught in introductory programming courses.
Figure 3: Number of publications focusing on interaction be-
tween students and the content — theory, literacy, behaviour
and ability — by year
5.1.1 Theory.
Several papers grounded in various theoretical perspectives
study the thinking processes of novice programmers. The num-
ber of papers focusing on the theoretical perspectives is relatively
small (no more than 3 papers in any given year), with no discernible
trend over the period of our study.
The constructivist point of view suggests that learners construct their own mental models of the phenomena they interact with [ ]. Several papers have investigated novice programmers' viable and non-viable mental models of concepts such as variables [ ], parameter passing [ ], value and reference assignment [ ], and how objects are stored in memory [ ]. Students were found to hold misconceptions and non-viable mental models of these fundamental concepts even after completing their introductory programming courses. To address this issue, Sorva [ ] recommends the use of visualisation tools with techniques from variation theory, while Ma et al. [ ] recommend the use of visualisation tools with techniques from cognitive conflict theory. Interestingly, both Madison and Gifford [ ] and Ma et al. [399] report that students holding non-viable mental models sometimes still manage to do well on related programming tasks, suggesting that assessment techniques beyond conventional code-writing tasks might be needed to reveal certain misconceptions.
Proposing a conceptual framework and a graphical representation that can be used to help students construct a viable mental model of program-memory interaction, Vagianou [ ] argues that program-memory interaction exhibits the characteristics of a threshold concept, being troublesome, transformative, and potentially irreversible. Sorva [ ] distinguishes between threshold concepts and fundamental ideas, proposing that threshold concepts act as 'portals' that transform the students' understanding, while fundamental ideas "run threadlike across a discipline and beyond". Sorva, who has conducted a comprehensive review of research on mental models, misconceptions, and threshold concepts [ ], suggests that abstraction and state might be fundamental ideas while program dynamics, information hiding, and object interaction might be threshold concepts.
Lister [ ] and Teague et al. [ ] apply a neo-Piagetian perspective to explore how students reason about code. They discuss the different cognitive developmental stages of novice programmers and use these stages to explain and predict the ability or otherwise of students to perform tasks in code reading and writing. The most important pedagogical implication of this work is that instructional practice should first identify the neo-Piagetian level that students are at and then explicitly train them to reason at higher levels. They contrast this with conventional practices, where they argue that teaching often happens at a cognitive level that is higher than that of many students [381, 656–658].
Due to the qualitative nature of research done both on mental
models and on neo-Piagetian cognitive stages, more work is needed
to quantitatively study what has been observed. For example, it is
still not clear how widespread the observed mental models are or
which neo-Piagetian cognitive stages are more prevalent among
novice programmers in different courses or at different times in the
same course. The small numbers of participants in these qualitative
studies suggest the need for more studies that replicate, validate,
and expand them.
5.1.2 Code ‘Literacy’.
Literacy, a term traditionally applied to the reading and writing of natural language, is concerned with making sense of the world and communicating effectively. In modern usage this has broadened to refer to knowledge and competency in a specific area, for example, 'computer literacy'. Here, however, we apply the term in
the traditional sense to coding, using it to mean how students make
sense of code and how they communicate solutions to problems by
writing executable programs. We distinguish between reading and
writing, and consider research that seeks insights into the students’
processes in each.
Code reading and tracing. The process of reading programs is essential both in learning to program and in the practice of programming by experts. We found 28 papers reporting on issues related to students' code reading. These included papers where the reading process involved tracing the way a computer would execute the program, which adds a significant dimension to 'making sense' that is not present in the largely linear process of reading natural-language text.
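An invented example may make this tracing dimension concrete: answering a question like "what does this program print?" requires simulating the loop state step by step, not reading the text linearly. This toy task is illustrative only and is not drawn from any of the reviewed studies.

```python
# A toy code-tracing task of the kind used in novice studies:
# predicting the result requires tracking both variables across
# each loop iteration rather than reading top to bottom.
def trace_task() -> int:
    total = 0
    value = 1
    for i in range(1, 4):      # i takes the values 1, 2, 3
        value = value * i      # value becomes 1, then 2, then 6
        total = total + value  # total becomes 1, then 3, then 9
    return total

print(trace_task())  # 9
```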
A number of papers study the reading process in order to gain insight into students' program comprehension, for example by relating reading behaviour to well-known program comprehension models [ ] or the education-focused block model [ ]. There has been recent interest in the application of eye-tracking techniques to novice programmers, for example to study changes in reading process as students progress through a course and to identify clusters of students with similar learning paths [ ]. Although useful work has been done, the process of reading code, and the way this process changes as a novice programmer gains expertise, are not well understood.
There have been major studies of student code-reading and tracing skills, for example by Lister et al. [382], which have raised concerns about weaknesses in many students' abilities. The relationship between code reading and other skills, notably code writing, has also been widely studied [ ], leading to the conclusion that code-reading skills are prerequisite for problem-solving activities including code writing. Given this, there is a strong incentive to help students to develop reading skills, and a number of papers focus on the effects on writing skills of specific activities that are designed to provide practice in reading [ ]. Tools have been developed to support students in code reading by helping them to identify 'beacons' in code [ ] and to help teachers to target support by visualising students' code-tracing processes [ ]. Given the evidence that has been found for the value of code-reading skills in novice programmers, there is an ongoing need to explore further ways of encouraging the development of these skills.
Code writing and debugging. Papers in this category focus on students' ability to write programs, how they approach the creation of programs, and ways of supporting the process. Creating programs involves writing code and debugging that code in order to reach a working solution. We found 48 papers focusing on aspects of writing and debugging code.
The McCracken report from an ITiCSE working group in 2001 [ ], which raised concerns about students' skills at the end of the introductory programming course, has been influential in the study of student code-writing skills during the years covered by this review. Utting et al. revisited the topic in a subsequent ITiCSE working group [ ], and discovered a closer match between teachers' expectations and student skills than the original study. They noted as a possible reason that teachers' expectations have lowered, possibly as a consequence of previous research. A number of papers focus not just on students' performance on writing tasks, but also on what this reveals about the difficulty and suitability of those tasks, such as the canonical 'rainfall problem' [ ]. The types of task that we give to students, and the way we present and scaffold these tasks, are important to students' performance and to their learning through undertaking the tasks, and insights in this area will continue to be of value.
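For context, the rainfall problem mentioned above is commonly posed as: read a sequence of integers, stop at a sentinel value, and report the average of the non-negative values seen. A minimal Python version follows; the sentinel 99999 matches common statements of the task, though details vary between studies.

```python
from typing import Iterable, Optional

SENTINEL = 99999  # conventional end-of-input marker in many statements of the task

def rainfall(values: Iterable[int]) -> Optional[float]:
    """Average the non-negative values that appear before the sentinel.

    Returns None when no valid values precede the sentinel, the edge
    case novices most often forget to handle.
    """
    total = 0
    count = 0
    for v in values:
        if v == SENTINEL:
            break
        if v >= 0:
            total += v
            count += 1
    return total / count if count else None

print(rainfall([2, -1, 4, 99999, 7]))  # 3.0: averages 2 and 4, ignores -1 and the post-sentinel 7
```

The problem's difficulty for novices lies in combining a sentinel loop, a filter, a running total, and a division-by-zero guard in one small program.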
Other papers focus on supporting students in writing tasks, generally addressing the way students deal with the errors they encounter as they write code rather than the process of planning or designing their programs. Zehetmeier et al. [729] describe teaching interventions designed to help with common errors. Compilation is an important part of the process of code writing and there has been a significant body of work related to compiler error messages, including studying the comprehensibility of standard compiler messages [ ] and providing enhanced messages to help novices [ ].
A number of approaches and tools have been proposed to encourage students to take a systematic approach to debugging, including a serious game [ ], error messages that are pedagogically designed [ ] or based on social recommendations from other students' activities [251], and programming environments that allow students to interactively ask questions about program output [ ]. An ITiCSE working group [ ] developed a repository of videos to help students with debugging. Bennedsen and Schulte [ ] investigated the impact of using the BlueJ debugger to visualise code execution, but there are actually very few papers that study how students use the debuggers built into most IDEs. These environments provide rich tools to provide insight into the execution of code, and there are open questions around their accessibility and usefulness in the early stages of learning to program.
Summary. Research in this area has advanced understanding of
the ways in which students read and write code and the expectations
that are realistic for teachers to hold for their ability to do so. A
key nding is the importance of code-reading skills to underpin
other aspects code literacy. Linguists hypothesise that we acquire
writing style by reading, and there is evidence that this applies to
programming as well as to natural language. Guidance has also
been derived for designing appropriate code reading and writing
tasks and supporting the students in these processes. For example,
emerging techniques such as eye tracking promise new insights
into the process of code reading and may also guide teachers in
presenting code examples in a ‘readable’ form to enhance learning.
5.1.3 Student Behaviour.
Students generate a great deal of data as they go about their activities, for example in the way they interact with coding environments, learning tools, and the classroom. There is increasing interest in gathering and analysing this data in order to learn about the ways in which students behave. This section reports on the 69 papers we identified that report data and findings on student behaviour. More than half of these (36 papers) were published in the last three years (2015–2017) of the fifteen-year period covered in this review (see Figure 4). This may reflect an increasing awareness of the value of data and the emergence and wide use of machine learning and data mining, although by no means all of the papers use these techniques.
Figure 4: Papers classified as being about student behaviour have increased rapidly over the past few years
Many of these papers can be considered to describe learning analytics. Hui and Farvolden [ ] recently proposed a framework to classify ways in which learning analytics can be used to shape a course, considering how data can address the instructors’ needs to understand student knowledge, errors, engagement, and expectations, and how personalisation of feedback based on data can address students’ needs to plan and monitor their progress. Many aspects of their framework are reflected in the themes discovered in the current review. An understanding of student behaviour is of value to educators in at least the following ways:
- Predicting success in examinations and other assessments
- Identifying student difficulties and interventions designed to mitigate them
- Designing tools that respond to specific behaviours in a way that is helpful to the students
- Encouraging students to alter their behaviour to improve their likelihood of success
- Detecting undesirable behaviour such as cheating
A previous ITiCSE working group [ ] has reviewed the literature on educational data mining and learning analytics in programming (not restricted to introductory programming), describing the state of the art in collecting and sharing programming data and discussing future directions. The report from that working group complements this work by providing in-depth coverage of this specific topic.
Coding behaviour data. Students in introductory programming
courses spend a lot of time creating code, but what are they actually
doing when they code? Some papers describe the use of data to
provide insights into how students go about and experience the
process of coding, reporting analyses of observations of the students
as they code or of the products of the coding.
The largest number of papers report analysis of compilation behaviour. Brown et al. [93] describe a repository of compilation data that is populated by users worldwide of the BlueJ IDE. This follows earlier work by the BlueJ team on recording compilation behaviour [ ], and allows other researchers to use the tool to gather data for their own research [ ]. Other researchers have embedded the capability to record similar data in the tools that their students use to write code [ ]. In order to extract insights from compilation data, Jadud [ ] defined an error quotient (EQ) metric, which was further explored by Petersen et al. [507], while Becker [ ] has proposed an alternative metric. Compilation data has been used to provide feedback to students in the form of enhanced error messages, although the results have been mixed in terms of the value of these messages to students [ ]. Compilation can provide insights into aspects of behaviour other than the errors the student has to deal with; for example, Rodrigo and Baker [555] used compilation data to study student frustration.
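As a concrete illustration of how such metrics work, the sketch below computes an error-quotient-style score from a log of compilation events: a student who repeatedly recompiles with the same error scores higher than one who resolves errors quickly. The event format and the scoring weights here are illustrative assumptions for exposition, not Jadud’s exact published constants.

```python
# Sketch of an error-quotient-style metric over one student's compilation log.
# Weights and normalisation are illustrative, not Jadud's published values.

def error_quotient(events, w_both_err=2, w_same_err=3, max_pair=9):
    """events: list of (has_error: bool, error_type: str | None),
    one entry per compilation attempt, in chronological order."""
    if len(events) < 2:
        return 0.0
    pairs = list(zip(events, events[1:]))
    total = 0.0
    for (e1, t1), (e2, t2) in pairs:
        score = 0
        if e1 and e2:                 # two failing compilations in a row
            score += w_both_err
            if t1 == t2:              # stuck on the same kind of error
                score += w_same_err
        total += score / max_pair     # normalise each pair to [0, 1]
    return total / len(pairs)         # mean over all consecutive pairs

# A session that repeats the same syntax error scores high;
# a session that resolves errors quickly scores low.
stuck = [(True, "missing ;"), (True, "missing ;"),
         (True, "missing ;"), (False, None)]
smooth = [(True, "missing ;"), (False, None), (False, None), (False, None)]
```

Under these assumptions, `stuck` yields a substantially higher score than `smooth`, which matches the intuition the metric is meant to capture.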
Compilation is an important aspect of the coding process, but other aspects have been studied, including keystrokes and timing within a session [ ] and evolution of code over an extended period [514].
In addition to the process, the code artefacts created by students have been studied. The correctness of code, typically measured by whether it passes automated tests on submission, has been used to identify common student difficulties [ ] and competencies [ ], and to predict course outcomes and target at-risk students [ ]. Code quality has been analysed to allow instructors to target improvements in the guidance given to students. Measurements of code quality have included traditional [ ] and novel [ ] software metrics and a continuous inspection tool [ ]. While coding quality is a notion that comes from software engineering, Bumbacher et al. [99] propose a set of metrics, which they refer to as ‘coding style’, that can be used to monitor progress in learning and to predict help-seeking. Hovemeyer et al. [273] and Pero [ ] have used code structures, in the form of abstract syntax trees derived from students’ code, to draw conclusions about student skill levels. Loksa and Ko [ ] combine student code with think-aloud protocols to investigate self-regulation of learning in programming.
Other behaviour data. While coding tools play an important part in studying programming, students interact with a range of other more generic learning resources such as learning management systems (LMS). Although such resources are widely used and studied across disciplines, there is a growing body of work specifically focused on their use in introductory programming.
Some researchers have applied techniques for machine learning [ ] and data mining [ ] to study interaction by introductory programming students with an LMS in order to identify behaviour patterns and investigate whether these can predict academic success. Other papers focus on introductory programming students’ use of specific types of resource provided within a learning environment, such as video lectures [149,428] and interactive reading and homework materials [ ] designed to enable a flipped classroom approach. The latter paper identifies some key implementation factors that encourage engagement with interactive materials. There has also been research on the use of social media, beyond the LMS, for online learning [408].
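One common shape for such LMS-interaction studies is to aggregate raw clickstream events into per-student features and then fit a model relating those features to course outcomes. The sketch below uses a deliberately trivial nearest-centroid classifier to show the pipeline; the event names and data are invented for illustration and are not drawn from any of the cited studies.

```python
# Minimal sketch: clickstream events -> per-student feature vectors ->
# nearest-centroid prediction of a course outcome label.
from collections import Counter

EVENT_TYPES = ["login", "video_view", "forum_post", "quiz_attempt"]

def features(events):
    """events: list of event-type strings for one student."""
    counts = Counter(events)
    return [counts.get(e, 0) for e in EVENT_TYPES]

def fit_centroids(students, labels):
    """Mean feature vector per outcome label (e.g. 'pass'/'fail')."""
    sums, ns = {}, {}
    for evs, lab in zip(students, labels):
        f = features(evs)
        sums[lab] = [a + b for a, b in zip(sums.get(lab, [0] * len(f)), f)]
        ns[lab] = ns.get(lab, 0) + 1
    return {lab: [x / ns[lab] for x in s] for lab, s in sums.items()}

def predict(centroids, events):
    """Label whose centroid is closest (squared Euclidean distance)."""
    f = features(events)
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(f, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))
```

In practice the studies cited above use far richer features (timing, sequences, session structure) and stronger models, but the aggregate-then-classify structure is the same.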
Although there is an emphasis in the literature on online behaviours, Hsu and Plunkett [ ] study the correlation between class attendance and academic grades, while Chinn et al. [125] investigate several study habits, including online and other activities.
Summary. The value of data in providing insights and predictive power is becoming increasingly recognised in many aspects of human activity, and techniques and tools for data science are becoming increasingly accessible. Computing education is beginning to make use of these, but we have as yet only scratched the surface of what can be done. In particular, it seems that there is significant potential to learn about student learning through analysis of coding behaviour. There has been progress in mechanisms for gathering data, but there is a great deal of scope for exploring the correlation between behaviour and learning outcomes [ ], and perhaps for understanding the differences in behaviour between novice and expert programmers.
5.1.4 Student Ability.
Students’ ability in introductory programming courses can be thought of as what they can achieve in terms of learning, understanding, and applying programming. In this category we identified 169 entries, most of which infer students’ ability from measurements related to their performance and success in programming [ ]. The use of the word ‘ability’ in this context should not be taken to infer that this is a fixed quantity. In particular,
it does not contradict the notion of ‘growth mindset’, which recognises that students’ ability can change as their studies progress.
Students’ ability has been measured by various means such as code assessment [ ], self-reporting [ ], knowledge acquisition tests [ ], exams [ ], and various learning analytics [ ]. Some studies combine different types of measurement [ ]. Although students’ grades and traditional assessment are still the dominant measurement of their ability, there are moves towards more sophisticated measurements coming from high-frequency data collections.
A number of papers focus on the capacity of student-generated data to provide accurate estimations of students’ ability. In recent years we have seen papers using student-generated data such as source code and log files to compare variables that can predict students’ performance [ ]. In addition, we have seen large-scale student-generated data that can be analysed to guide (or question) teachers’ decisions and judgement about students’ ability [ ]. Despite the difficulties in adopting and using student-generated data (e.g., from large-scale projects and sophisticated pedagogical practices) to infer students’ ability, student-generated data is a growing area of research in introductory programming [ ] that promises more accurate and reliable measurements of students’ ability, for example by tracking learning progression during a course.
Many papers propose specic pedagogical practices such as
community-of-inquiry learning, pair programming, and team-based
learning [
], suggesting that these practices can potentially
improve students’ ability to learn programming. Such papers gener-
ally report on the introduction of a practice to cope with a specic
diculty or misconception in programming, and often conclude
with a possibly rigorous empirical evaluation and a discussion of
lessons learned.
Some recent publications focus on the relationship between students’ programming ability and other abilities such as problem solving [ ] and engagement [ ], traits such as conscientiousness [ ] and neuroticism [ ], and attitudes such as self-efficacy [313].
In summary, publications on the ability of introductory programming students typically address one or more of these questions:
- How can students’ ability, that is, their success or performance in programming, be measured more accurately and reliably?
- How can the various types of student-generated data inform students/teachers about ability and predict future success?
- What is the relationship between students’ programming ability and their other abilities and personality traits, attitudes, and engagement?
- What pedagogical practices have the capacity to enhance students’ ability?
5.2 Sentiment
In this section we report on research that investigates the student
perspective of learning to program — the attitudes of students,
the extent of their engagement, and the experience of learning
to program. Figure 5 illustrates the growth of papers focusing on
student sentiment.
Figure 5: Number of publications per year focusing on stu-
dent attitudes, engagement, or experience
5.2.1 Student Aitudes.
For the purpose of classifying papers, ‘attitudes’ has a broad definition. Eagly and Chaiken [ ] define attitude as “a psychological tendency that is expressed by evaluating a particular entity with some degree of favour or disfavour”. We take that to include self-perceptions, where the ‘entities’ are the students themselves or certain of their characteristics, such as current level of programming skill. We also take it to include perceptions of particular aspects of introductory programming, whether more abstract, such as programming as a discipline, or more concrete, such as a tool.
Given the challenges that educators face in the introductory programming course, particularly with respect to retention [ ], ways of supporting students to develop positive attitudes have received considerable attention in the literature (105 papers). This is perhaps underscored by McKinney and Denton’s 2004 paper [ ], which reports a concerning decline in affective factors during the first programming course, a challenge that persists more than ten years after its publication [587].
Although a considerable number of attitudinal influences have been linked with measures of course success, no construct has received more attention than self-efficacy and self-perception. This body of work includes key papers by Kinnunen and Simon which examine self-efficacy through a theoretic lens [ ] and which identify sometimes counter-intuitive processes through which students’ self-efficacy can change [ ]. Though the importance of the construct is consistent with other areas [ ], recent work throws into question whether conventional designs for learning activities that are intended to lead to enacted mastery experiences are appropriate and readily transferable to the domain of introductory programming. Further complicating this challenge is the notion that practices could disproportionately affect different groups. Some papers on this topic are covered in sections 5.3.1 and 5.3.2. However, it is worth noting that recent work continues to highlight differences in self-efficacy between genders [378,531].
Within this domain is a considerable body of work on the perceived difficulty of programming courses [ ] and its relation to persistence in computing degrees [ ]. Only the most engaged students are good at predicting their performance in a course [ ],
so when students tend to pick tasks that they perceive to be easier [ ], it is not clear whether the difficulties they perceive are in the task or in themselves as individuals. The literature seems to acknowledge that when addressing such perceptions, the first impression is important [ ], as is the consideration of perceived difficulty in the design of pedagogic approaches [223].
Several studies have explored ways to enrich student attitudes, often with the intention of improving some aspect of course ‘health’ or the student experience. These interventions include collaborative scaffolding [ ], problem-based learning [ ], peer tutoring [ ], pedagogical code review [ ], facilitated e-learning [ ], and peer instruction [731].
A related area where successful interventions are also reported focuses on Dweck’s notion of ‘mindset’, the level of belief that one can grow and develop. Murphy and Thomas [ ] warn educators of the dangers of students developing a ‘fixed’ mindset in the introductory programming context. Further literature has since explored the design of mindset interventions [ ]. Cutts et al. [143] report some success by teaching students about the concept of mindset and using messages embedded in coursework feedback to reinforce a ‘growth’ mindset.
There remains a need for further work comparing the effect sizes of interventions in order to guide educators towards the most effective practices. Student self-beliefs therefore remain a fertile field of research with continuing challenges. Calls for greater methodological rigour [ ] might pave the way for such comparisons. For example, a number of measurement instruments have been developed and formally validated for use in the introductory programming context [ ] and their use might facilitate new developments in this field.
Another core area of attitudinal research considers attitudes towards particular tools, techniques, platforms, and pedagogical approaches. Much of this research is combined into studies exploring the tools themselves, and is covered in section 6.4. Examples of such attitudinal research include impressions of pair programming [ ], attitudes towards the use of tools such as Scratch [ ], satisfaction with self-assessment [ ], impressions of prior experience [ ], and the use of consistent instructors [ ].
This area has considerable breadth and leverages a rich set of methodologies. Although validity is a spectrum rather than an absolute [ ], there are several ways that methodological rigour could be improved in future work. Many of the recommendations published in the 2008 methodological review by Randolph et al. [537] are yet to be widely adopted, and there is scope to address similar concerns surrounding measurement [ ]. Furthermore, papers in this area during the review period tend to focus on WEIRD populations (western, educated, industrialised, rich, and democratic) [ ], exposing a lack of cross-cultural studies and undermining the generalisability of the research to an international audience. There is also a shortage of research replicating experiments that use validated measurement instruments; such research would offer reassurance of the consistency with which particular attitudes are observed in different contexts.
5.2.2 Student Engagement.
Student engagement in introductory programming has received considerable attention over the period of this review, encompassing papers focused on time on task, encouragement of self-regulated learning, and the issues surrounding disengagement. It is no surprise that a major factor in students’ success is their self-motivation and ability to engage with the learning opportunities available to them [ ]. However, students disengage and subsequently drop out for many reasons, and these are quite multifaceted [ ]. On the other hand, there are students who complete their studies successfully after adapting to the difficulties by devising new techniques and increasing their efforts [505].
While much research on student engagement in computing
takes a broader approach, the introductory programming context
has been the focus of a number of studies. Such investigations
often overlap with attitudes as antecedents to motivation (see sec-
tion 5.2.1), tools and teaching approaches as interventions to im-
prove motivation (see sections 6.4 and 6.3), and behaviour, in the
sense of students doing things that improve their level of achieve-
ment and their probability of persisting with their study (see section
5.1.3). However, given the specific interest of the research community in engagement, it is worth highlighting key contributions to
student engagement in a separate section here.
Many of the papers published during the period report on empirical studies focusing on the internal characteristics of students [ ] and their role in self-regulated learning [ ]. Some of these studies highlight particular constructs deserving of further attention, for example, achievement goals [734] and perceived instrumentality [ ], as well as how these may differ between different subpopulations in some introductory classes [ ]. Researchers also examine a range of motivational sub-factors, for example finding a weak correlation between introductory programming performance and social motivation, the desire to please somebody [576].
There are many explorations of the effectiveness of interventions on student engagement. Examples include full course redesigns [ ]; implementing a holistic and better integrated curriculum [ ]; teaching activity design, such as the use of in-class response systems [ ]; strategies to enrich student behaviour, such as online journaling [ ] and reading interventions [ ]; changes to infrastructure, allowing for smaller classes [ ] and closer monitoring of individuals [ ]; the application of e-learning [ ]; flipped classrooms [ ]; peer communication tools such as anonymous chat rooms [ ], discussion forums [ ], and Facebook [ ]; collaboration, including pair programming [ ], team-based learning [ ], and strategies such as think-pair-share [ ]; interactive learning experiences [ ]; the use of robots [ ]; and physical computing approaches [417,565].
Other papers report attempts to improve student engagement through changes in the assessment process, introducing assessment practices and cycles such as feedback-revision-resubmission [ ]; contexts that are meaningful to students [ ]; contexts that correspond to the real world [ ]; the use of tools and equipment, such as 3D printers [ ] and robot olympics [ ], that can grab attention while facilitating creativity; emphasising the utility of computing through simulated research [ ]; tapping into social
computing paradigms earlier in introductory programming [ ]; and game development [130].
Of course, games also feature prominently in the literature on engagement for their motivational qualities [ ]. The use of games and game-like tools in introductory programming is not unique to the most recent literature. However, over the period of our review there has been a clear increase in papers on the topic of ‘gamification’ (or ‘gameful design’ as some authors prefer [ ]), describing how these ideas and techniques can be applied in the introductory programming context. Though ‘gamification’ was coined as a term in 2002, it has received considerably more attention in educational spheres since 2010, when Lee Sheldon published The Multiplayer Classroom [ ] and Jesse Schell endorsed the approach at the 2010 DICE (Design, Innovate, Communicate, Entertain) Summit. This has manifested in the introductory programming literature through schemes such as Game2Learn [ ] and JustPressPlay [ ], as well as the gamified practical activities in TRAcademic [ ]. Yet few papers refer explicitly to any underlying motivational theory or design frameworks [ ], and further research may be necessary to determine their applicability to improving the design of interventions that engage students to do more.
Approaches that promote engagement tend to emphasise active learning over passive learning, eliminate barriers to practice, play to students’ intrinsic motivations, explicitly introduce extrinsic motivations, and strive to resolve ‘hygiene’ problems. Also, some holistic interventions that approach course design as a system of interrelated factors tend to report at least some success. However, the question of how best to facilitate engagement for most students remains an open one. Several high-quality papers describe successful interventions and explain their efficacy using motivational theories such as three-motivator theory [ ], self-determination theory [ ], and control-value theory [ ]. Although such theories are discussed elsewhere in the literature, there is little work that addresses them within the scope of our focus on introductory programming. Therefore, future work focusing on introductory programming should strive to apply these theories in explaining their interventions. It is also somewhat surprising that learning analytics have not played a more prominent role in this section of the introductory programming literature. The use of such tools to measure engagement would help to identify particular ‘hygiene’ factors or problematic topics that tend to disengage students. Such data could be collected through classroom response systems, virtual learning environments, version control systems, integrated development environments, and more.
5.2.3 The Student Experience.
When considering the student experience in introductory pro-
gramming courses, we limit this to mean those aspects of study that
are beyond the scope of tools and approaches related to learning.
Thus, papers in this category focus on the experience itself, covering practices and systems for support, social integration, affect, and pastoral issues surrounding the transition into higher education.
Many of these issues overlap with retention risk (section 5.3.1) but
have been reported in this category because they focus on experi-
ences or because they describe social or pastoral support structures
rather than focusing on individual characteristics.
With only 18 papers identified by the search, as a topic in and of itself the experience of learning introductory programming does not seem to have received focused attention in the period of this review. A small number of articles explore the influences of learning programming in a foreign language [ ], but most focus on interventions to support learners who experience anxiety and frustration in the face of difficulty. These include peer-assisted study schemes [ ], mentoring systems [ ], and programmes to support students from underrepresented groups [ ]. Most such studies apply an action-research approach incorporating analysis of data collected through survey and interview. However, a clear nomological network has yet to emerge across such intervention studies, suggesting that further work is needed to identify and converge upon key factors associated with the study of the student experience.
Even if not as an explicit goal, work towards such a model does exist. For example, Barker et al. [46] found that factors such as peer interaction predicted intention to continue studying computing, and Schoeffel et al. [576] found correlations between social motivation and performance. Other researchers such as Haatainen et al. [234] considered social barriers such as whether students felt comfortable requesting help. Indeed, many such social integration challenges have been identified in the literature. However, rather than a sole focus on predictors of achievement or retention, the identification of a holistic set of factors would support work into interventions.
An aspect of the student experience which has received much attention is affect, in the sense of emotions. Notable qualitative studies [ ] analyse the emotional experiences and ‘tolls’ that programming students tend to encounter as they complete introductory programming tasks. There are interventions to explore how the modality of learning programming can influence the emotions experienced by students [ ]. Nolan and Bergin [ ] systematically reviewed many such papers in 2016, focusing on programming anxiety. They concluded that although much work promotes an awareness of student anxieties, there is a need for greater focus and granularity by leveraging tools that can measure a student’s anxiety during particular programming tasks. A number of tools have been proposed and developed to aid this research [84,235].
Our search found only a small amount of work on the student experience in the explicit context of introductory programming. We are aware that more general work has been carried out on this topic, and we feel that such work could have considerable impact on concerns specific to introductory programming, so the area remains a fertile field for further research. In addition to receiving more attention, there is also the need to identify key facets and drivers of the student experience. The methods of phenomenography and grounded theory are well suited to the identification of new factors [ ]. Researchers should also consider identifying and drawing together appropriate theories, models, and frameworks from other fields to elicit potential factors [ ]. The development of new tools to measure experiential effects would also support work into the identification of particular tasks that heavily influence the student experience, and would help to determine the effectiveness of particular interventions.
5.3 Student Subgroups
In this section we report on two commonly studied subgroups: students who are at risk of failure, and students in underrepresented groups. Figure 6 illustrates the growth of papers focusing on these groups.
Figure 6: Number of publications per year focusing on underrepresented groups or students at risk
5.3.1 Students at Risk.
The early 2000s was a time of low enrolments in computing
degrees in many parts of the world. The low enrolments, along with
a perception that an atypically high percentage of students were
not successfully completing introductory programming courses,
resulted in a developing interest in identifying students who are
at risk of failing the introductory programming course. However,
in the scope of our own study the majority of papers in this area
have appeared in the past four years. Many of the recent papers
focus on automated methods for identifying students at risk as the
enrolment in introductory computing courses surges.
In one of the earliest papers in our review period [ ], students with little or no programming background were encouraged to take a preliminary Alice-based course in which they would first meet the fundamental programming concepts that they would later encounter in the ‘normal’ introductory programming course. Students in this treatment group ultimately performed substantially better in introductory programming, and showed a better retention rate, than the students with a similar background who went directly into the later course. Although not statistically significant, the students who began in the Alice course also displayed higher self-confidence. Other researchers also found low confidence or self-efficacy to be a factor in students who were at risk in introductory programming [220,694,716].
Several groups of researchers developed automated methods for identifying at-risk students early in the semester. Many of these techniques are based on automatically monitoring student activity as the students develop code [ ]. Measurements are made of various attributes of the interactions with the development environment, such as numbers of errors and times between compilations. Some researchers make use of data-mining techniques [ ] or predictive modelling [ ]. A consistent rationale for automated investigations of programmer behaviour is their ease of use compared to test-based methods that attempt to identify relevant demographic, personality, cognitive, or academic attributes of the students. A comparative study of behaviour-based and test-based attributes showed that the former are better predictors of programmer success [ ]. The only test-based attribute that was a good predictor was self-efficacy.
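To make the behaviour-based approach concrete, the sketch below condenses a student's early-semester compilation log into two simple indicators (error rate and median time between compilations) and combines them into a single risk score. The indicators, weights, and scaling constant are illustrative assumptions, not a model published in the papers above.

```python
# Hedged sketch of a behaviour-based early-warning score: summarise a
# student's IDE activity into a few indicators and combine them. The
# weights below are illustrative, not empirically fitted.
from statistics import median

def indicators(log):
    """log: list of (timestamp_seconds, compiled_ok: bool) events,
    in chronological order."""
    errs = sum(1 for _, ok in log if not ok)
    gaps = [b[0] - a[0] for a, b in zip(log, log[1:])]
    return {
        "error_rate": errs / len(log),
        "median_gap": median(gaps) if gaps else 0.0,
    }

def risk_score(log, w_err=0.7, w_gap=0.3, gap_scale=300.0):
    """Combine indicators into a score in [0, 1]; higher = more at risk.
    Rapid-fire recompilation (small gaps) plus a high error rate is
    treated here as a warning sign; both terms are normalised to [0, 1]."""
    ind = indicators(log)
    gap_term = max(0.0, 1.0 - ind["median_gap"] / gap_scale)
    return w_err * ind["error_rate"] + w_gap * gap_term
```

A real early-warning system would use many more features and a fitted model, but the pipeline (log, indicators, score, threshold) follows the same pattern described in these studies.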
Estey et al. [196] took a somewhat different approach by examining student behaviour using BitFit, a tool for practising programming that checks solutions and provides hints. They found that students who frequently use hints without developing and submitting code tend not to succeed. Another approach for identifying at-risk students relies on the use of responses to clicker-based questions in a classroom environment [374,520,521].
Several researchers have identied behaviours that can be ob-
served physically in a lab environment to be indicators of future
poor performance. Examples include being bored or confused in
lab [
], stopping working, using code samples verbatim that may
not be relevant, and making frantic changes [
]. Badri et al
. [40]
developed a web-based IDE with a variety of dashboards and visu-
alisations that allow an instructor to closely monitor and to some
extent control student progress. Haden et al
. [235]
measured stu-
dents’ aective state after lab and found that students who found a
problem both dicult and boring, or who recognised material as
being familiar but did not see a plan to solve the problem, were at
Nearly all of these approaches have shown some success at identifying at-risk students early in the semester. Some researchers have identified concepts that are covered in the first few weeks of the semester and continue to trouble students on exams and in assignments throughout the semester. This underscores the need for students to learn these early fundamental concepts in order to subsequently succeed. A few papers discuss interventions that have had a positive impact in this regard. Wood et al. [716] found that when students were matched in pair-programming teams based on their confidence levels, the matched pairs were more likely to successfully complete assignments. The effect was strongest for the least confident students. Whittinghill et al. [706] found that when the course was directed at the inexperienced student, those with little interest in programming at the beginning of the semester had a similar level of interest to the rest of the class at the end of the semester.
On the other hand, after encouraging students who performed poorly on an early exam to attend voluntary tutorial sessions conducted by undergraduates, Punch et al. [528] note that few students attended and the intervention had little effect on the final course outcomes. Similarly, Hare [ ] reports that the use of peer mentors did not have the anticipated effect of increasing student success. Further investigation revealed that many of the students had competing time constraints, such as jobs, which prevented them from taking advantage of the peer mentoring opportunities.
The research in this area indicates that early success in an introductory programming course is important for a student’s ultimate success. Many researchers have shown that students having difficulty can be identified early in the semester. A critical piece of information that was not discussed in many of these papers is what specific concepts were covered in the early weeks of the courses being investigated. Future work could focus on identifying if there
ITiCSE Companion ’18, July 2–4, 2018, Larnaca, Cyprus Luxton-Reilly, Simon, et al.
are specic concepts that confuse students early, and on meth-
ods or techniques for presenting the material to enhance student
5.3.2 Underrepresented Groups.
While many STEM elds have seen increasing enrolments by
women and members of underrepresented minorities over the past
few decades, female enrolments in computing degrees in the US
have stayed low since declining in the mid-1980s. At the same time,
non-white, non-Asian, male populations have traditionally been
poorly represented in computing degree programs. Much of the
research examining issues surrounding the lack of diversity in com-
puting has occurred during the current decade, with the initial focus
primarily on gender diversity. As the gateway course, introductory
programming can play a critical role in attracting students to a
computing major or degree and encouraging them to persist with
that study; alternatively, it can discourage students from pursuing
study in computing. A key factor that has emerged through various
studies is the importance of self-efficacy. Multiple studies have
found that females have lower self-efficacy when enrolled in an
introductory programming course [ ] and that
this can affect performance and persistence.
Studies have also shown that students’ self-efficacy can be influenced
by the pedagogy employed in introductory programming. An
early paper by Dorn and Sanders describes an environment
for learning introductory programming concepts such as control
structures, methods, and objects. When the environment was used
for four weeks at the beginning of an introductory programming
course, students indicated that it increased their comfort and confidence
in learning to program, with the effect more pronounced
for females and for males from underrepresented groups. Similarly,
Beck et al. [55] found that students given cooperative learning exercises
performed significantly better than a control group, with
females and males from underrepresented groups appearing to benefit
more. Newhall et al. [471] report on a modified introductory
programming course with a peer mentoring program that after several
years resulted in the demographics of the course more closely
matching the overall college demographics.
A study of students in an introductory programming course
at a large Midwestern US university [ ] used measures of self-regulated
learning. As in other studies, self-efficacy was found to
have the strongest direct effect on performance. Interestingly, male
and female students corrected the accuracy of their self-evaluation
at different rates. Measures of self-efficacy were taken at three
points over the semester. For females the correlation between self-efficacy
and end-of-term performance increased significantly between
the first and second measures of self-efficacy, whereas males
showed a statistically significant increase in correlation between
the second and third measures.
The issue of lower self-efficacy in females is not limited to the
United States. A study of students in introductory programming
across multiple institutions in Ireland and an institution in Denmark
showed significant gender differences [ ]. Although women
indicated a higher previous achievement in mathematics, female
students rated their programming self-efficacy negatively while
males rated theirs positively and predicted that they would perform
better. Males did perform better at programming early in the course,
but the final pass rate for females was significantly higher than that
for males.
Another factor that inuences continuation from introductory
programming to the subsequent course is a student’s intended ma-
jor (in institutions where students choose majors after beginning
their degrees). Data from nine US institutions on the demograph-
ics of majors and non-majors in CS1 [
] indicated that women
make up a greater percentage of non-majors and tend to have less
programming experience than men. A multi-year study at a single
university [
] found that of the students enrolled in CS1, a lower
percentage of the women intended to major in computer science.
Male students showed a signicant correlation between grades and
continuing to CS2, but there was no such correlation for female
students. Among self-taught students and students with no prior
background, males were more likely than females to persist to CS2.
The research to date shows convincingly that developing self-efficacy
is a key factor in the persistence and success of students
from underrepresented groups. While some approaches show
promise, additional research on pedagogical techniques to foster
the development of self-efficacy would provide better guidance to
those involved in teaching introductory programming.
6 Teaching

Table 3 shows that papers on teaching fall into five broad categories:
theories, orientation, delivery, tools, and infrastructure. We found a
small number of papers focusing on theories of teaching and learning,
including those related to Bloom’s taxonomy [ ] and cognitive
load theory [ ]. The orientation category includes papers that
describe the overall approach to the structure of an entire course,
such as using a flipped-learning approach. The category of delivery
includes papers describing techniques or activities that could
enhance learning by improving the way the content is delivered by
teachers and experienced by students. The majority of papers in
the tools section focus on tools that support teaching and learning.
Finally, papers classified as infrastructure focus on aspects such as
the physical layout of laboratories, networks and other technology,
teaching assistants, and other support mechanisms.
Table 3: Classication of papers focused on teaching
Category N Description
Theories 19 Models, taxonomies and theories
Orientation 53 Overall course structure
Delivery 156 Techniques/activities used to teach
Tools 254 Tools that support teaching/learning
Infrastructure 15 Institutional/environmental support
6.1 Theories of Learning
As with theoretical perspectives of students (section 5.1.1), there
are relatively few papers on theories of learning, with no obvious
trends across the period of our review.
Pears et al. [501] observed that papers in computing education
are generally system-oriented and rarely build on a theoretical
foundation. Nine years later, Malmi et al. [409] found that 51% of the
papers that they examined explicitly applied at least one theory in
their work. However, only 19 of the papers that we examined were
coded as theory-related papers, and the majority of these papers
deal with the notion of learning styles [ ], which have
been largely discredited as “pseudoscience, myths, and outright
lies” [332].
The papers identied in our literature review apply theories
and frameworks related to students’ progress from exploration,
through tinkering and constructive learning theories, to knowledge
acquisition [
]. Bloom’s taxonomy, which has been widely
referenced by researchers as a benchmark for assessment of stu-
dents’ learning, has seen some use in introductory programming
research [
]. Self-regulated learning theory is another body
of knowledge that has been used to identify guidelines for when
and how interventions should occur [170].
Soloway, a pioneer in computing education research, viewed student
learning through the lens of cognitive theory, which sees basic
knowledge as chunks of information that a student or programmer
can recall and use in a constructive way. This is directly connected
with another theory used in computing education research, cognitive
load theory, which posits that learning degrades when the learner
is required to remember more items than their working memory
can hold [ ]. In introductory programming, which is known
to be difficult to learn, it is critical to help students develop the
necessary basic knowledge, but also to consider their cognitive load
and their capacity to absorb that knowledge.
For introductory programming in the object-oriented paradigm,
recent work has focused on the development and empirical validation
of competency models [ ], which highlight the structural
knowledge of programming novices and the potential dimensions
of such knowledge [458].
Some research initiatives focus on building theories of how learners
acquire programming knowledge [ ]. However, the dominant
approach to knowledge construction in computing education research
is to design innovative tools and teaching approaches and
evaluate them empirically, with varying degrees of rigour. This
approach constructs knowledge related to particular examples and
contexts, but contributes little to a more generalised understanding
of computing.
Nevertheless, studies of local interventions do create opportunities
for constructing more abstract knowledge. Thus, future research
in introductory programming should consider theoretical
implications by going beyond case studies and specific tools. The
role of theory should be not to replace the case studies and empirical
analyses but to annotate them and construct knowledge that is
more general. This could be achieved by grounding or connecting
future research to learning theories and documenting potential
theoretical implications; by extending learning theories to address
the particularities of introductory programming research; or by
performing meta-analyses to construct more abstracted bodies of
knowledge.
6.2 Course Orientation
In this section we look at broader approaches related to the overall
course structure, the ‘orientation’ of the course, that have been
adapted for use in introductory programming courses. We have
identied several broad categories in the literature: self-paced learn-
ing, exploratory learning, inverted classrooms, and online courses.
The growing interest in student-directed learning and the rise in
popularity of online learning have resulted in relatively steady
growth in this area, as indicated in Figure 7, with over half of the
papers being published in the past four years.
Figure 7: Papers classied as being about course orientation
have increased over the period of study
6.2.1 Self-Paced.
Mastery learning is an approach in which students are expected
to demonstrate that they have reached an appropriate level of mastery
in a given topic before they continue to more advanced material
in that topic. It is widely believed that learning to program requires
students to understand basic elements of syntax before learning
more complex elements [ ], and researchers have expressed the
need for assessment practices to allow students to demonstrate
knowledge of individual components before they solve problems
involving multiple concepts [ ]. Despite the apparent alignment
between the process of learning to program and the mastery
learning approach [ ], there have been few reported examples
of mastery learning in the introductory programming classroom.
This is perhaps explained by the difficulty of resolving the tension
between the constraints of traditional semester-based delivery and
the freedom of self-paced learning.
There is some evidence that mastery learning is of more benefit
to weaker students than to stronger students because it helps to
‘level the playing field’, and that those benefits continue into subsequent
courses [ ]. In an attempt to ensure a minimum standard
of expertise in team projects, Jazayeri required students to
demonstrate mastery of basic programming skills before they were
permitted to participate in a group project in the second half of an
introductory programming course. This approach meant that fewer
teams were burdened by students with inadequate skills.
Purao et al. [529] report on the iterative refinement over five
years of a self-paced learning approach based on Keller’s Personalized
System of Instruction. Although several issues were encountered,
their most significant concern was student procrastination.
They advocate that teaching using self-paced learning should attend
to student motivation and ensure that students continue to study at
the pace required to complete the course material. They also identify
a trade-off between allowing students to work at their own pace
and the benefits of working with peers. In similar implementations,
students preferred an environment that accommodated a variety of
ability levels by having exercises that they could work through by
themselves, but supported by scheduled laboratory sessions with
an instructor who could provide targeted help with the exercises
for that week [672].
Although this approach appears to show potential for teaching
introductory programming, it is difficult to accommodate self-paced
learning within the institutional course structures adopted by most
universities. Where it is possible to use more self-paced learning
approaches in online environments and within inverted classrooms
that accommodate differentiated learning experiences, further research
in this area has the opportunity to make a substantial contribution.

6.2.2 Exploratory.
Exploratory approaches encourage students to be creative and
to drive the learning process. Problem-based learning uses complex
open-ended questions to encourage students to develop problem-
solving strategies. The problems may be set by the teachers or
may be developed by students, as is often the case in project-based
courses. Studio learning, by contrast, emphasises creativity and
communication by asking students to present their solutions to
their peers for critique.

Problem-based learning. Although several studies using problem-based
learning have reported increased motivation and social
interactivity [ ], and asserted the success of the approach
[ ], there is little consistent evidence of improvement
in learning of content knowledge. Some case studies report learning
gains in problem decomposition and program-testing strategy
[ ] and overall improvement in pass rates [ ] or in
specific exam questions [ ]. In other cases no difference in learning
was observed [ ], students without clear guidance struggled
to solve the problems effectively [ ], and concerns were raised
about the level of theoretical insight acquired compared to those
experiencing traditional lectures [397].
Studio learning. Studio-based learning takes inspiration from
more creative disciplines such as architecture and fine art to embed
a focus on creative design and public critique within an introductory
programming course. This approach emphasises evaluation
and communication in the design of a solution, and appears to be
more effective when supported by a visualisation tool [ ]. Students
using a studio-based approach had higher self-efficacy than
those in a more traditional classroom environment and were more
favourably disposed towards peer learning, but both environments
resulted in similar levels of content knowledge in final exams [ ].
A three-year case study of studio-based learning reported that students
enjoyed the social aspect of studio learning and found its
open-ended nature highly engaging [541].
While instructors at a number of institutions have implemented
and reported on one of these exploratory approaches, and some
have conducted research on the outcomes, their efficacy will need
to be evaluated far more widely and rigorously before there is any
prospect that they will be widely adopted.
6.2.3 Inverted Classrooms.
Several variations of course design replace traditional lectures
with other class-time activities such as practice exercises, collaborative
problem solving, and teamwork. Such courses are often
described as ‘inverted’ or ‘flipped’ classrooms.
The initial development costs for a well designed and supported
inverted classroom can be quite high. One study estimates having
spent approximately 600 hours to record the videos that students
used to prepare for classes, and another 130 hours to prepare 32
worksheets for students to use in class [108].
The high cost of course development may be worthwhile, with
reports of high levels of engagement [ ], a greater sense of community
[ ], improved retention [ ], and improved performance
in exams [191, 192, 361, 363].
Flipped classrooms typically have several graded components
to help ensure that students complete all of the required work.
One study observed that the flipped approach required students to
devote more time to the course. Students completed more practice
exercises and received more feedback in the flipped approach than
in a more traditional classroom, which may explain the improved
performance [191].
However, the additional work imposed on students may not always
provide benefits. Quizzes used to determine whether students
had completed the preparation before class meetings were reported
to have no benefit [ ], and well structured online courses may result
in performance that is equivalent to the flipped approach [ ].

Cognitive apprenticeship. Cognitive apprenticeship is informed
by Bandura’s social learning theory [ ], which recognises the
importance of learning through modelling the behaviour of others.
Demonstrations by experts [ ], worked examples [ ], and
individualised feedback (coaching) are characteristic of cognitive
apprenticeship. The use of cognitive apprenticeship in computing
classrooms has been associated with a reduction in lectures and
increased student agency, similar to the flipped classroom model.
A variation of cognitive apprenticeship called extreme apprenticeship,
implemented at the University of Helsinki, reduced lectures
to a minimum and emphasised lab-based exercises with high
levels of feedback [ ]. This approach resulted in significant improvements
in student understanding, and higher performance in
subsequent courses [ ]. Jin and Corbett showed that
when worksheets that model expert problem solving are combined
with an intelligent tutoring system that provides feedback on exercises,
learning improves significantly compared with a standard
presentation of material.
The cognitive apprenticeship delivery approach appears successful,
but studies investigating the approach in a broader range of
contexts are required to determine the extent to which these case
studies can be generalised.
6.2.4 Online Courses.
Notwithstanding the recent growth of MOOCs, we found few
papers dealing explicitly with online courses.
When a small private online course (SPOC) was introduced to
complement the standard course content as an optional support
structure, students liked the course, and the resources were well
used without a negative impact on lecture attendance. The use of
badges to motivate students to complete online quizzes, and immediate
feedback for programming exercises through automated assessment
tools, were effective approaches to engage students [ ].

The completion of practice exercises in an online textbook has
been highly correlated with improved exam performance [ ]. In
a comparison between an online course and a flipped classroom
using similar resources, completion of ungraded practice exercises
was correlated with exam performance, but fewer students in the
online course completed the exercises compared to those attending
in person [ ]. Given the choice of enrolling in a face-to-face or
an online version of an introductory programming course, more
than 80% of students chose the face-to-face course, based primarily
on a preference for attending lectures rather than other reasons
such as prior experience or perceived workload [271].
6.3 Teaching Delivery
This section focuses on how teachers choose to deliver the curriculum.
This includes the overall context in which introductory
programming is taught, and the specific activities used to deliver
the content. Figure 8 shows that publications focusing on delivery
of teaching do not appear to be growing in number.
Table 4: Classication of papers focused on approaches
taken to course delivery
Category N Description
Context 44 The ‘flavour’ of a course
Collaboration 53 Collaborative learning approaches
Techniques 59 Approaches, techniques and activities
Figure 8: Number of publications focusing on teaching delivery per year
6.3.1 Context.
Course context encompasses the context of the examples, exercises,
and assignments used to embed the content knowledge
of programming, and has been referred to as the ‘flavour’ of the
course [ ]. The course context is perceived to have an impact
on student perception of the discipline, and to affect student motivation,
attitude, and level of engagement with programming. It
is reported to affect enrolment in computing courses, and could
impact on underrepresented groups [322, 437].
Teachers have explored a variety of contexts that connect introductory
programming with real-world applications. It has been
argued that the relevance and real-world application of computational
processes are demonstrated effectively by tangible computing
devices such as a Sifteo Cube [ ], LEGO MindStorms [ ],
iRobot Creates [ ], CSbots [ ], and the Raspberry Pi [ ]. Although
such innovative contexts are seldom evaluated, there is
some evidence that students enjoy using the devices [ ], and this
enjoyment may translate to greater participation and completion
rates [364].
Ecology [713], mobile phones [602], networking [467], creative
story-telling [ ], real-world data analysis [ ], creative art [ ],
mobile apps [ ], and cross-disciplinary activities [ ] have all
been used to broaden interest in programming. Anecdotally, these
contexts all provide positive experiences for students, but few formal
evaluations explore their impact on students’ learning.
Games are widely used to motivate students to learn programming
[ ], and have the advantage of lending
themselves to more creative open-ended projects. Although students
prefer to have structured assignments [ ], they enjoy the
opportunity to be creative when supported by an appropriate framework
[ ], and can even be enlisted to develop games for
the learning of programming [ ]. However, there is some evidence
that the course context is not as important as good structure
and good support in the course [295].
Following the success of media computation for non-majors
[ ], there have been several reports of introductory programming
courses that use image processing as a motivating context
[165, 421, 542, 707].
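Exercises in this style typically have students loop over pixel data directly. The sketch below illustrates such an exercise using plain nested lists of RGB tuples rather than any particular media library; the function name and data layout are ours, for illustration only.

```python
def to_greyscale(image):
    """Convert an image, given as rows of (r, g, b) tuples, to greyscale.

    Each pixel is replaced by the average of its three channels repeated
    across all channels -- the kind of short pixel-loop exercise used in
    media computation courses.
    """
    grey = []
    for row in image:
        grey_row = []
        for (r, g, b) in row:
            v = (r + g + b) // 3          # integer average of the channels
            grey_row.append((v, v, v))    # equal channels give a grey pixel
        grey.append(grey_row)
    return grey

picture = [[(255, 0, 0), (0, 0, 255)],
           [(10, 20, 30), (200, 200, 200)]]
grey = to_greyscale(picture)
```

The appeal of such exercises is that loops, tuples, and nested structures are practised on data whose effect the student can see immediately.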
Although there are very few studies that investigate the impact
of context on students, Simon et al. [617] found a significant improvement
in both retention rate and exam performance when a
traditional introductory programming course was delivered using a
media computation approach. However, a separate study of students
in two different sections of a single course showed no differences
in performance or retention when one section was delivered using
a visual media context and the other using a more traditional
programming context [ ]. Comparing students taught in a robot
programming context with those taught in a web programming
context, Scott et al. [582] found that the former group spent significantly
more time programming and that their programs showed
more sophistication and functional coherence.
Despite the potential for students to be motivated by varying contexts,
most publications provide case studies involving the context
of delivery, rather than collecting and reporting data that evaluates
the impact of the course on students. Further work to provide
empirical evidence of the impact of course contexts would make a
significant contribution to the community.
6.3.2 Collaboration.
Collaboration has several reported benefits in the introductory
classroom environment, including improved motivation, improved
understanding of content knowledge, and the development of soft
skills such as teamwork and communication. In an educational
environment where transferable skills are becoming increasingly
important, collaborative activities offer opportunities to deliver a
wide range of desirable outcomes.
The term ‘contributing student pedagogy’ [ ], although not
in widespread use, describes recent approaches in which students
explicitly create material to be used by other students for learning.
Examples include the creation of collaborative textbooks [ ],
sharing solutions [ ], generating assessment questions (such as
multiple choice questions) [157], and peer review [238].
Peer review. Regularly engaging in peer review of programming
code reportedly helps to correct low-level syntax errors, and promotes
discussion of higher-level design and implementation issues
[ ]. Students engaged in reviews appear to be more effective
and more engaged when the reviews are conducted collaboratively
rather than individually [284].
Electronic voting. The use of electronic voting systems, often
described as ‘clickers’, has risen in popularity as a way of making
lectures more active. Although students report enthusiasm
towards the use of clickers, many students participate only intermittently
[ ]. Peer instruction, which makes use of clickers as
one component of a more structured pedagogy, has shown positive
results. Students report that they like the approach [ ], and it
appears to improve their self-efficacy [ ] and retention [ ].
However, peer instruction is not uniformly successful; it relies on
instructors to clearly communicate the purpose of the activities
and to use grading policies that do not penalise incorrect answers
during the peer instruction process; and it should be supported
by other changes to course delivery, such as requiring students to
spend additional time preparing before class [ ]. Despite several
positive outcomes resulting from peer instruction, there appears to
be little evidence that it improves performance in final exams.
Pair programming. Pair programming has been described as
one of the few educational approaches that we know to be effective
[ ]. Experiments comparing students engaged in solo programming
with those engaged in pair programming have shown
that students typically enjoy working in pairs [ ],
although conflicts within pairs are frequently noted [ ],
giving rise to several studies investigating how best to pair students
[533, 660, 691].
Students working in pairs produce work that is of higher quality
than students working alone [ ], but the improved
performance does not extend to individually assessed work [ ].
Pair programming may provide greater support for students
with low academic performance [ ], resulting in increased
pass rates [ ] and improved retention [ ],
but provides little measurable benefit for more experienced students
[ ]. Several studies have also reported that the benefits of
pair programming are more pronounced for women [89].
Distributed pair programming appears to work similarly to co-located
pair programming, with students typically enjoying the
process, producing code with higher quality and fewer defects, and
being more likely to complete the course, but without evidence
of improved performance in individual assessments [241, 725].
Other paired activities. An experiment comparing the use of
think-pair-share (TPS) with a traditional lecture format in a single
class showed that students learning with TPS believed that it
helped them to learn. On a subsequent quiz the TPS group scored
significantly better than the control group [347].
A study investigating the effects of peer tutoring on student confidence,
attitude, and performance found no significant differences
between students engaged in peer tutoring and those experiencing
standard delivery of material [ ]. Another study using the
supplemental instruction (SI) approach, in which ‘model’ students
guide additional learning sessions for their peers, reported no improvement
in student performance [700].
Group activities. Cooperative learning activities that assign specific
roles to each student in a team are reported to improve motivation,
confidence, problem-solving ability [ ], individual performance
on exams [ ], and performance in subsequent courses [ ].
Although collaborative activities are easier to implement face to
face, the benefits of working collaboratively with peers extend to
distributed environments when supported by collaborative tools [ ].
In a virtual classroom environment, Bower [ ] found that peer
review, program modification, and debugging tasks resulted in a
high degree of engagement and interaction among peers. The creation
of programs was also effective, but more sensitive to the task
design and the ability of the students.
Although collaborative learning is often used to improve content
knowledge, there has been little explicit evaluation of the soft skills
developed through collaborative activities: skills such as communication,
cooperation, commitment, work ethic, and adaptability.
While such skills may be time-consuming or difficult for teachers
to evaluate effectively, peer assessment has been used successfully
for this purpose [432, 433].
Summary. Evidence in the literature suggests that students enjoy
the social interaction resulting from collaborative activities, and
that working collaboratively has several benefits, such as improved
engagement, confidence, retention, and performance. Although
pair programming is the most widely studied form of collaborative
activity, increased individual performance appears to result more
reliably from other forms. Instructors and researchers would benefit
from further detailed case studies of effective collaborative activities
for introductory programming.
6.3.3 Pedagogical Techniques.
In this section we survey a broad range of papers that discuss
how instructors choose to teach specific topics in the curriculum
and the types of activities that they use to teach these topics. We
do not consider papers on methods that affect the structure of the
whole course, which are more likely to be discussed in section 7. We
also exclude papers on contextualisation and collaborative activities,
which are discussed in sections 6.3.1 and 6.3.2.
Topic ordering. One of the basic pedagogical questions faced by
instructors is the order in which topics should be introduced. Bruce
summarises a discussion that took place on the SIGCSE mailing
list on whether to introduce procedural programming concepts
before object-oriented programming concepts. While this relates
to the bigger controversy over which programming paradigm to use
in introductory programming, the discussion on the mailing list
showed how much disagreement there is on topic ordering, even
among instructors who agree on introducing OO programming
concepts in introductory programming. Although that discussion
took place nearly 15 years ago, the controversy is still not settled.
We found two papers that argued for specific topic orderings in introductory
programming, but without providing empirical evidence.
One discusses how functions can be taught first when teaching
Java and how this leverages what students already know about
functions in mathematics. Bruce et al. [95] argue for introducing
structural recursion before loops and arrays in an objects-first Java
course, suggesting that this ordering provides more opportunity for
reinforcing object-oriented concepts before students are exposed to
concepts that can be dealt with in a non-object-oriented way. Paul
discusses the pedagogical implications of topic ordering in
introductory programming courses.
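The contrast at issue can be made concrete: with structural recursion, an operation such as computing the length of a list is defined directly on the shape of the data, before loops and arrays are introduced. The sketch below illustrates the idea in Python; Bruce et al. [95] work in an objects-first Java setting, and the linked-list encoding here is ours, for illustration.

```python
# A linked list encoded as nested pairs: (head, tail), with None as empty.
def length(lst):
    """Structural recursion: the result follows the shape of the data."""
    if lst is None:           # base case: the empty list has length 0
        return 0
    head, tail = lst          # inductive case: one element plus the rest
    return 1 + length(tail)

def length_loop(items):
    """The same computation in the loop-and-array idiom introduced later."""
    count = 0
    for _ in items:
        count += 1
    return count

nums = (1, (2, (3, None)))    # encodes the list [1, 2, 3]
```

The recursive version mirrors the recursive definition of the data structure, which is the pedagogical point of introducing it before iteration.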
Testing. Several works advocate that testing be taught early in
introductory programming, with the expected benefits of promoting
reflective programming instead of trial-and-error programming
[ ], improving understanding of programming concepts
[ ], familiarising students with industry practices and tools [ ],
and boosting student confidence [ ]. However, empirical evidence
for these potential benefits in introductory programming is
still limited. Papers describe the introduction of test-driven development
(TDD) to introductory programming courses, but without
providing any empirical evaluation [91, 697].
Edwards and Pérez-Quiñones report that the Web-CAT tool has been used in several institutions for several years to evaluate the quality of student-written test cases in introductory and subsequent programming courses, but their evidence for the efficacy of TDD was gathered in non-introductory courses. Some evidence is provided for success in teaching novices how to write high-quality test cases [ ], for success in using TDD to teach arrays [ ], and for the usefulness of teaching mutation testing in a Pascal programming course [483].
Neto et al. [469] evaluated a teaching approach that requires students to develop test cases in a table-like manner, and found that students exhibited more thoughtful programming behaviour than when provided with feedback from an auto-grader.
On the other hand, some papers report that teaching TDD in introductory programming is an additional challenge: adding a testing requirement can be overwhelming for novices [ ], can be perceived by students as irrelevant or useless to their learning of programming [ ], and can come at the expense of other material typically taught in introductory programming [ ]. Some of the papers that argue for the early introduction of testing [ ] have the students use test cases written by instructors, which is pedagogically different from the TDD expectation that students will write their own test cases.
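The test-first workflow that TDD expects of students can be sketched as follows: the test exists, and fails, before the code it exercises is written. The `count_vowels` exercise and all names here are hypothetical examples, not from any cited course.

```python
# Test-first sketch: the student writes the test before the code.
# count_vowels is a hypothetical beginner exercise.

def test_count_vowels():
    assert count_vowels("") == 0        # empty input
    assert count_vowels("rhythm") == 0  # no vowels at all
    assert count_vowels("audio") == 4   # a, u, i, o

# Only after the test exists is the implementation written to satisfy it.
def count_vowels(text):
    return sum(1 for ch in text.lower() if ch in "aeiou")

test_count_vowels()
print("all tests pass")
```

Running the test before `count_vowels` exists would raise a `NameError`, which is exactly the "red" step of the red-green cycle that TDD advocates.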
Exercises. The relatively large number of drill-and-practice tools mentioned in section 6.4.4 gives a hint of the central role that exercises play in the teaching style of many introductory programming instructors. Some instructors [ ] take exercises a step further by designing their course in an exercise-intensive manner similar to the way that some mathematics courses are taught. The exercises used are generally either conventional coding exercises or multiple-choice concept questions, but other types of activity have also been proposed. Miller et al. [445] propose a type of collaborative exercise based on creative thinking principles. Ginat and Shmalo propose an activity type that creates cognitive conflict to directly address erroneous understanding of OOP concepts. Kortsarts and Kempner present a set of non-intuitive probability experiments to be programmed by students, arguing that such exercises engage students and motivate them to learn. In a randomised experiment, Vihavainen et al. [681] found that adding supporting questions to self-explanation exercises improves student performance on similar exam questions. Lui et al. [390] describe how student solutions to lab exercises can be shared with the whole class as worked examples, which could facilitate the acquisition of alternative schemas for solving problems. Denny et al. [157] showed in a randomised experiment that requiring students to create exercise questions in addition to solving exercises can bring about a greater improvement in their exam performance than simply solving exercises.
Teaching a process for programming. Several papers propose that students be taught a process for composing programs. Rubin advocates live coding as a way to show students the process that experts (such as the instructor) go through when they solve problems and write programs. In an experiment, Rubin found that students who were taught using live coding achieved higher grades on programming projects than students who were taught using static examples that were not modified live in front of the students. Bennedsen and Caspersen propose using video recordings of experts as they program, supplementing live coding with the flexibility of allowing students to watch the process more than once and of presenting more complex problems than is typically possible in the limited class time.
On the other hand, Hu et al. [275] advocate that students be explicitly taught a detailed programming process rather than being expected to infer one from examples. In one paper [ ] they propose teaching goals and plans [ ] using a visual notation implemented with blocks in the Scratch programming environment. They also propose a process that students can follow to move from goals and plans to the final program, and observed a statistically significant improvement in the performance of students on exam programming questions when taught using this method. In a later paper [ ] they propose an iterative process with feedback for the intermediate steps, where unmerged plans can be executed separately before the whole program is built and executed.
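A goals-and-plans decomposition of this general kind can be illustrated in plain text, with each goal written as a comment and its plan as the code beneath it. The averaging task below is a hypothetical example; the cited work uses a visual block notation in Scratch rather than textual comments.

```python
# Sketch of a goals-and-plans decomposition, written as comments.
# The averaging task and function name are hypothetical examples.

def average_of_positives(numbers):
    # Goal 1: gather the values of interest.
    # Plan: filter the input, keeping only the positive numbers.
    positives = [n for n in numbers if n > 0]

    # Goal 2: guard against an empty selection.
    # Plan: return a neutral value when no positives exist.
    if not positives:
        return 0.0

    # Goal 3: combine the gathered values into the answer.
    # Plan: divide the running total by the count.
    return sum(positives) / len(positives)

print(average_of_positives([3, -1, 4, -1, 5]))  # averages 3, 4 and 5
```

Each goal can be drafted and checked separately before the plans are merged, mirroring the iterative, feedback-driven process described above.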
Castro and Fisler captured videos of students as they solved a programming problem in Racket and analysed how the students composed their solutions. These students had been taught to program using code templates [204], but Castro and Fisler found that students often used code templates without adjusting them, trying to decompose the problem on the fly around code they had already written, instead of carefully decomposing the problem in advance of writing code. They concluded that students should be explicitly taught schemas not just for code writing but also for problem decomposition.
ITiCSE Companion ’18, July 2–4, 2018, Larnaca, Cyprus Luxton-Reilly, Simon, et al.
Leveraging students’ prior knowledge. To study the algorithmic knowledge that students have prior to learning any programming, Chen et al. [123] gave students a sorting problem on the first day of an introductory programming course. Although students showed no prior understanding of the notion of a data type, they were able to describe algorithms in sufficient detail. Their solutions made more use of post-test iteration (as in do-until loops) than of pre-test iteration (as in while loops). This knowledge could be used by instructors when introducing programming concepts.
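The distinction matters pedagogically because a post-test loop always executes its body at least once, while a pre-test loop may not execute it at all. Python has no built-in do-until construct, so a sketch must emulate it; the countdown functions below are hypothetical examples.

```python
# Pre-test iteration: the condition is checked before the first pass,
# so the body may run zero times.
def countdown_pretest(n):
    steps = []
    while n > 0:
        steps.append(n)
        n -= 1
    return steps

# Post-test iteration (do-until): the body always runs at least once.
# Python lacks the construct, so it is emulated with a break.
def countdown_posttest(n):
    steps = []
    while True:
        steps.append(n)
        n -= 1
        if n <= 0:
            break
    return steps

print(countdown_pretest(0))   # the body never runs
print(countdown_posttest(0))  # the body runs once before the test
```

For positive inputs the two behave identically; they diverge only at the boundary case, which is where novices' do-until intuition and a language's while loop can come into conflict.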
Several authors attempt to leverage such prior knowledge. Bla- describes a vote-counting activity that can be used on the first day of an introductory programming course to introduce concepts about algorithms. Paul uses the example of a digital music player to introduce object-oriented modelling and the difference between objects and references to objects. Gonzalez describes activities using manipulatives and toys to introduce functions, message passing, memory concepts, and data types. Sanford et al. [572] interviewed ten experienced introductory programming instructors about their use of metaphors. The study analysed and categorised twenty different metaphors used by these instructors and found that some instructors used a single metaphor to explain several concepts while others used several metaphors to explain a single concept. They concluded that more work needs to be done to understand the limitations of using metaphors.
Many instructors, regardless of discipline, use analogies and metaphors to connect students’ prior understanding of a familiar topic to a completely new topic. However, there is little empirical evidence for the effectiveness of using analogies in introductory programming. Cao et al. [109] experimentally evaluated the effectiveness of using analogies when teaching recursion, events, and multithreading. They measured short-term learning using clicker questions, long-term learning using exams and quizzes, and transfer using exam questions that require knowledge to be applied to new areas. Results showed potential benefit from analogies in the short term, but there was no strong evidence of benefit in the long term or in transfer to new areas.
Object-oriented design. Several papers describe techniques for teaching object-oriented design. Concepts such as abstraction and program decomposition can be introduced to students using large programming projects that are broken down into smaller pieces [ ]. Keen and Mammen discuss a term-long project in which students are given clear direction on program decomposition at early milestones and progressively less direction at later milestones. They found that a cohort taught using this technique showed less cyclomatic complexity in the code written for a final assignment, compared with cohorts in previous years. Santos describes a tool that allows object-oriented concepts to be taught using visual contexts such as board and card games. Students can manipulate object instances in these contexts and see the effect visually. Similarly, Montero et al. [450] found that students’ understanding of basic object-oriented concepts improved when they used a visual environment that allows interaction with object instances and visual observation of the effect.
Visualisation. Besides object-oriented concepts, several papers report the use of visualisation tools in introductory programming for teaching topics such as recursion [ ], sorting algorithms [ ], roles of variables [ ], and procedural programming concepts [ ]. Lahtinen and Ahoniemi use a visualisation tool and an algorithm design activity to introduce students to computer science on their first day in the introductory programming course, allowing students with no programming experience to experiment directly with different issues related to algorithms. Hundhausen and Brown use a visualisation tool in a studio teaching method, with students working in pairs to design an algorithm for a problem and construct a visualisation of their algorithm. The class then discusses the designed algorithms and their visualisations. The results of actually constructing visualisations rather than just watching or interacting with them were inconclusive, although the results for the effect of the overall studio teaching method were positive. On the other hand, AlZoubi et al. [22] evaluated a hybrid method for teaching recursion, where students perform different types of activity using a visualisation tool. They found that the number of student activities in constructing visualisations correlated more strongly with learning gains than the number of activities in watching an animation or answering questions about it. Evaluating the use of visualisation tools in the classroom, Lattu et al. [362] observed 60 hours of assignment sessions and noted how students and instructors explained code to one another. They found that students and instructors very rarely used the visualisation tool that was at their disposal, instead using visual notation on the blackboard to trace program execution and show the structure of the manipulated data. They also used multiple computer windows simultaneously to talk about related pieces of code and to show how code relates to what happens at runtime. The study recommended that such observations be taken into consideration when designing visualisation tools that are meant to be used in an explanatory learning setting as opposed to an exploratory one.
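The line-by-line execution record that instructors drew on the blackboard is also the raw material of program visualisation tools. A minimal sketch of capturing such a trace uses Python's standard `sys.settrace` hook; the `demo` function being traced is a hypothetical example.

```python
import sys

# Minimal sketch of the data behind program visualisation: record the
# line number of each line event as a function executes, using the
# standard sys.settrace hook. The demo function is hypothetical.
trace_log = []

def tracer(frame, event, arg):
    if event == "line":
        trace_log.append(frame.f_lineno)
    return tracer  # keep tracing inside this frame

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(tracer)
result = demo(3)
sys.settrace(None)  # stop tracing

print("result:", result)                     # 0 + 1 + 2
print("line events recorded:", len(trace_log))
```

A real visualisation tool would pair each recorded line with a snapshot of the variables in `frame.f_locals`, then render the sequence; this sketch shows only the capture step.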
Video. Instructors have reported using multimedia [ ] and videos [ ] in introductory programming. Reasons for using videos include providing flexibility for students who cannot attend class in person [ ], explaining extra material that is difficult to cover during class time [ ], and dedicating class time to hands-on activities instead of lecturing [ ]. Murphy and a co-author note that instructors should not only care about the quality of their videos, but should also think carefully about how the videos are to be used in the course. Their experience was disappointing: relatively few students used the videos, which the authors had spent a lot of time trying to perfect.
6.4 Tools
A variety of tools have been developed to support introductory programming. The majority of papers present new tools, but some report on evaluations of existing tools or comparative reviews of multiple tools. Table 5 shows the category, number of papers, and a brief description of the papers in each of the following subsections. Figure 9 shows the number of papers per year about tools, revealing a steady increase in the number of these papers.
Figure 9: Number of publications per year focusing on teaching tools
Table 5: Classification of papers focused on tools

Category                        | N  | Description
Reviews and research            |    | Analysis of existing literature or broad research implications
Environments and editors        |    | Programming environments, editors, IDEs
Libraries, APIs, and extensions |    | Libraries, APIs, programming language extensions
Learning                        | 51 | Tools to practice programming
Debugging and errors            |    | Debugging, programming errors, compiler error messages
Design                          | 13 | Tools to practice problem-solving
Visualisation                   | 24 | Program and algorithm visualisation systems
Collaboration                   | 13 | Tools to help students collaborate
Games                           | 23 | Tools for developing games, games as learning objects
Evaluation                      | 22 | Evaluation of tools
Progress                        | 21 | Tools that monitor student progress
6.4.1 Reviews and Research Considerations.
The past 13 years have seen a number of reviews of programming tools, including three on programming languages and environments, as well as reviews of support tools, robots as tools, flowchart tools, visualisation tools, and learning tools. We list these below, then briefly discuss tools that have appeared in course census studies and tools that can support the replicability of research.
Environments: Rongas et al. [560] propose a classification of ‘learning tools and environments’ for introductory programming students, broken down into IDEs, visualisation, virtual learning environments, and systems for submitting, managing, and testing exercises. They include free and commercial tools, and provide guidelines for tool selection.
Languages and environments: Kelleher and Pausch present a taxonomy of languages and environments designed to make programming more accessible to novice programmers. The systems are organised by their primary goal (teaching programming or using programming to empower users) and then by the approach used in each system.
Environments: Gross and Powers summarise a representative collection of the assessments of novice programming environments, present a rubric for evaluating the quality of such assessments, and demonstrate the application of their rubric to the summarised works, with the intent of helping inform future efforts in assessing novice programming environments.
Support tools: Coull and Duncan summarise the problems associated with learning and teaching introductory programming and discuss various support tools and techniques, indicating which of them have been evaluated. They derive ten requirements that a support tool should have.
Robots as tools: Major et al. [405] conducted a systematic literature review on the use of robots as tools in introductory programming. They found that 75% of papers report robots to be an effective teaching tool. They note that most of these papers focus on the use of physical robots, and note the need for further research to assess the effectiveness of using simulated robots.
Flowchart tools: Hooshyar et al. [266] surveyed flowchart-based programming environments aimed at novice programmers, identifying the basic structure employed in the majority of existing methods as well as key strengths and shortcomings.
Visualisation tools: Sorva and Seppälä carried out a comprehensive review of visualisation tools (64 pages and 200+ references), which we discuss further in section 6.4.7.
Learning tools: Saito et al. [567] present a survey and taxonomy of 43 tools designed for use in introductory programming.
A 2003 census of introductory programming courses in Australia and New Zealand reports on the programming languages and environments used in 85 courses [ ]. A similar survey of 35 institutions conducted in 2016 [ ] showed a clear shift in language use from Java to Python.
Jadud and Henriksen [ ] suggest that more computing education research studies should be easily replicable and that the tools used for data collection are often too specialised, unstable, or simply unavailable for use in replication studies. They present two tools to aid in the replication and extension of existing research regarding novice programmers, in addition to supporting new research. The first tool, specific to the BlueJ pedagogic programming environment, provides a starting point for replicating or extending existing studies of novice programmers learning Java. The second tool is a portable standalone web server with a language-agnostic interface for storing data. The distinguishing feature of this server is that it is schema-free, meaning that it can easily support a wide range of data collection projects simultaneously with no reconfiguration.
Notable gaps in the recent literature in terms of reviews of teaching tools include the areas of debugging and errors, tools to practice programming, design, student collaboration, games, evaluation of tools, student progress tools, and language extensions, APIs, and libraries. Among these areas there does appear to be emerging interest in debugging and errors, student collaboration, visualisation tools, and games.
6.4.2 Programming Environments and Editors.
The choice of programming environment for introductory programming courses is nearly as contentious as the choice of programming language. Some instructors choose a basic combination of a command-line compiler such as javac or gcc coupled with a popular text editor such as Notepad++, while at the other extreme some choose popular industry-grade IDEs such as Eclipse. Between these extremes, many tools and environments have been developed specifically for introductory programming students, often with the aim of minimising features that are deemed unnecessary for beginners. In this section we discuss such pedagogical environments before discussing more general environments for mainstream languages, followed by work done in evaluating these environments. We conclude with a list of environments for beginning programmers, organised by language. We note that the landscape of environments and editors is constantly evolving, often driven by new platforms such as mobile phones, which hold promise for the creation of new editors, environments, and languages such as TouchDevelop [ ].
Pedagogical environments. Pedagogical environments such as the Penumbra Eclipse plug-in [ ] dispense with features not necessary for beginners. Others focus on being specifically suited to a particular approach or pedagogy; for example, BlueJ [ ] is widely used for teaching Java in an objects-first approach. Others, such as the DrJava Eclipse Plug-in [ ] and COALA [ ], seek to provide suitable pedagogic environments while leveraging the power of industry-standard IDEs. Some, such as DrHJ [ ], extend other pedagogical environments, in this case DrJava, to support specialised languages such as Habanero Java, a parallel programming language based on Java. The fact that many languages are tied to specific editors, environments, and/or contexts has been considered problematic by some, leading to tools that try to avoid these dependencies. Two such examples are Calico [ ], a framework/environment that supports a variety of languages, pedagogical contexts, and physical devices, and Jeroo [ ], a language, IDE, and simulator, inspired by Karel the Robot [ ] and its descendants, that seeks to provide a smoother transition to Java or C++.
General environments for mainstream languages. Some systems seek to be as general, all-encompassing, and accessible as possible. A web-based system by Ng et al. [473] provides an all-encompassing online programming learning environment incorporating editing, compiling, debugging, and testing, as well as features supporting educator and student participation in learning activities such as coursework, feedback, and student progress monitoring. CodeLab [ ], a web-based interactive programming exercise system for introductory programming classes, accommodates Python, Java, C++, C, C#, JavaScript, VB, and SQL. Rößling presents a family of tools to support learning programming and a plan to integrate these tools into the Moodle learning management system.
Evaluations of environments. In addition to presenting new environments, many authors have evaluated their effectiveness. Comparing three popular environments for Python, Dillon et al. found that students struggled with low-assistive environments regardless of experience and confidence, that students were able to use moderately-assistive environments more effectively [ ], and that the difficulty of transitioning between such environments may not be symmetrical [ ]. Vihavainen et al. [680] studied novices writing their first lines of code in an industry-strength Java IDE (NetBeans) and found no evidence that learning to use the programming environment itself is hard. Despite this evidence, novice-friendly programming environments remain popular with instructors and students alike. McKay and Kölling extended their previous cognitive modelling work by developing a new novice programming editor that is specifically designed to reduce intrusive bottlenecks in interaction design in some novice programming languages.
Here are some of the environments and editors that have been reported, categorised by the languages they support:
C: LearnCS! [375], unnamed systems [168, 452]
C++: CLIP [392]
Habanero Java: DrHJ [498]
Haskell: Helium [254]
Java: ALE [ ] (a Java-based platform for developing 2-D Android games), BlueJ [ ], COALA [ ], CodeMage [ ], Decaf [ ], DrJava Eclipse Plug-in [ ], ELP [ ], Gild [643], JGrasp [443], Jigsaw [100], Penumbra [457]
Jeroo: Jeroo [571]
Karel++: objectKarel [718]
Pascal: VIPER [3]
Python: CodeSkulptor [653], PyBlocks [516], Pythy [188]
Multiple languages: AgentSheets [ ], Calico [ ], CodeLab [50], CloudCoder [491]
The vast array of environments used with introductory programming courses includes command-line compilers, industry-grade IDEs, and pedagogical environments specifically intended for learning. Some of the pedagogical environments work with mainstream languages while others are designed to work with teaching languages. Although some work has been done to evaluate these environments, it is still far from clear whether introductory programming students are better off with industry-grade environments working with mainstream languages, pedagogical environments working with mainstream languages, or pedagogical environments working with teaching languages. With language and environment choice among the most important decisions in teaching novice students, reliable and comprehensive evaluations in this area would be extremely valuable.
6.4.3 Libraries, APIs, and Language Extensions.
Along with the numerous and diverse programming environments used in introductory programming (discussed in section 6.4.2), a number of libraries, APIs, and language extensions have also been developed. These are important as they often seek to provide interaction and increase engagement among novice programming students. Papers in this group report on the use of robots [ ], graphics [ ], and multimedia [447, 489].
A number of papers focus on bringing other engaging aspects to the novice programmer. RacketUI [ ] allows students learning with the DrRacket pedagogical programming language to develop web forms without the need for a graphics library. Hamid [ ] introduces a framework to facilitate the introduction of large, real-time, online data sources into introductory programming. Squint [ ] is a Java library developed to support the use of event-driven programming and network applications in programming examples for introductory programming courses. Emerson [ ], a plugin for the Sirikata open source virtual world platform, is designed for scripting objects in user-extensible virtual worlds such as Second Life, Active Worlds, and Open Wonderland, with the primary goal of making it easy for novice programmers to write and deploy interesting applications.
Most of the papers in this category seek to increase engagement with novices by providing easy access to robots, graphics/multimedia, or other interactive or stimulating devices and environments. It is important to consider whether this is enough for today’s introductory programming students. Better integration of realistic data sets, virtual reality, and other hot-topic areas into introductory programming could help to engage more students, including those who are harder to engage.
6.4.4 Tools for Learning Programming.
One of the earliest focuses of tool developers was to build tools that help students to practise writing code. Over the years, these tools have evolved from grading to also providing feedback, and from presenting a fixed sequence of drill-and-practice problems to presenting problems adapted to the needs of the learner. These tools are only as effective as the feedback they provide, but the provision of useful feedback is complicated by the fact that the coding solution to a problem is never unique.
Some programming tools have been designed to address specific troublesome programming concepts such as stack and heap visualisation [ ], recursion [ ], arrays [ ], complex data types [ ], exception handling [ ], loops [ ], linked lists [ ], and object-oriented programming [172].
A number of different types of tool have been designed to help students learn programming in general:
Assessment systems used to facilitate learning, e.g., the Python Classroom Response System [732].
Compiler assistants, which accept student solutions, present compiler error messages with additional information to help the student fix any syntax errors in the program, and collect usage data for analysis by the tool administrator.
Program submission systems, which accept student solutions, may include a compiler assistant, use test cases to determine whether the solution is correct, and provide the student with minimal feedback to that effect, e.g., the online judge system (YOJ) [645].
Drill-and-practice systems, which are program submission systems containing a host of built-in problems and a lesson plan, that is, an order in which those problems should be presented to the student, such as CodeWorkout [ ], CodeWrite [ ], CodeLab [ ], Projekt Tomo [ ], The Programmer’s Learning Machine (PLM) [ ], EduJudge [ ], AlgoWeb [ ], and a tool from Viope [ ]. Some also include animation [561].
Intelligent tutoring systems, which are drill-and-practice systems that provide context-sensitive feedback and adapt to the needs of the learner, such as L2Code [ ], PHP ITS [ ], FIT Java Tutor [ ], CPP-Tutor [ ], PROPL [ ], and Java Programming Laboratory (JPL) [527].
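At their core, program submission systems run a student's solution against instructor-written test cases and report how many pass. The following is a minimal sketch of that core loop; the `square` exercise, the submissions, and all names are hypothetical examples, not taken from any of the systems listed above.

```python
# Minimal sketch of the core of a program submission system:
# execute a submitted solution, then check it against test cases.
# The square exercise and both submissions are hypothetical.

TEST_CASES = [(0, 0), (2, 4), (-3, 9)]  # (input, expected output)

def grade(submission_source):
    namespace = {}
    try:
        exec(submission_source, namespace)
        func = namespace["square"]
    except Exception as err:
        return 0, f"submission did not load: {err}"
    passed = sum(1 for x, expected in TEST_CASES if func(x) == expected)
    return passed, f"{passed}/{len(TEST_CASES)} test cases passed"

correct = "def square(x):\n    return x * x\n"
buggy = "def square(x):\n    return x + x\n"

print(grade(correct)[1])
print(grade(buggy)[1])
```

Note that the buggy submission passes two of the three cases (0 and 2 are fixed points of both x*x and x+x), which illustrates why the choice of test cases, and the minimal feedback built on them, matters so much in these systems.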
Approaches used in tools to teach programming include the use of mental models [ ], notional machines [ ], custom languages [ ], and deterministic finite automata [ ]. Some provide automatically-generated hints [ ] or narrative-based coaching [ ], while others recommend a series of edits to transform a student’s program into a target version [730].
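Recommending a series of edits toward a target version can be sketched with Python's standard difflib module; the student and target program versions below are hypothetical examples, and a real tool would select the target and phrase the feedback far more carefully.

```python
import difflib

# Sketch of recommending edits that transform a student's program
# into a target version, using the standard difflib module.
# Both program versions are hypothetical.

student = [
    "def mean(xs):",
    "    total = 0",
    "    for x in xs:",
    "        total = x",        # bug: overwrites instead of accumulating
    "    return total / len(xs)",
]
target = [
    "def mean(xs):",
    "    total = 0",
    "    for x in xs:",
    "        total += x",
    "    return total / len(xs)",
]

# Keep only the lines to remove (-) and add (+), dropping diff headers.
edits = [line for line in difflib.unified_diff(student, target, lineterm="")
         if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
for edit in edits:
    print(edit)
```

The resulting edit script, one removal and one insertion, is exactly the kind of suggestion such a tool would surface to the student, ideally with an explanation rather than the raw diff.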
Not all of these tools were designed to assist with code-writing. Other ways that tools have supported the learning of programming include code-tracing [ ], code style [ ], Parsons puzzles [ ], program analysis, either by generating system dependence graphs [ ] or by having students identify beacons in code [ ], and authoring, either by capturing illustrative examples [ ] or by generating sequences of program examples [ ]. Some tools were developed to facilitate communication or collaboration, whether between instructor and student [ ] or between two or more students [86, 232, 243].
Current trends in tools for programming include addressing skills other than code-writing, and using intelligent tutoring technology and data-driven techniques to improve the feedback provided to the learner. In an ideal future, the tools that are developed will be evaluated for efficacy and disseminated for widespread use by other instructors.
6.4.5 Debugging, Errors, and Error Messages.
In this section we review tools that aid in debugging programs, tools that help with student errors, and tools that help students to interpret error messages. This is an emerging area, particularly the work on errors and error messages, with most of the papers we encountered being published since 2010.
Several tools have been developed to help novices debug their programs. Whyline is a tool that aids debugging by presenting the programmer with a set of questions based on the code [ ]. The programmer selects questions and the system provides possible explanations based on several types of code analysis. One evaluation found that novices using Whyline were twice as fast as experts without it. Lam et al. [358] propose a hybrid system that provides a certain level of automated debugging combined with system-facilitated input from instructors. Noting that traditional debugging tools can be difficult for novices to learn, Hu and Chao designed a system that provides students with debugging scaffolding designed to help them detect, find, and correct logic errors in programs by visualising execution results.
Many tools focus not on debugging but on student errors or on the error messages generated when errors arise. Most of these focus on compile-time (syntax and semantic) errors and error messages, but some work has also been done on runtime messages. A tool by Jackson et al. [298] logs all errors from students and faculty in order to investigate the discrepancies between the errors that instructors think students make and those that students actually make. HelpMeOut [ ] is a social recommender system that helps with the debugging of error messages by suggesting solutions that peers have applied in the past. Bluefix [ ] uses crowd-sourced feedback to help students with error diagnosis. Other tools focus on enhancing standard compiler error messages, with some studies reporting positive effects [ ] and others reporting no effect [ ], leaving the effect of compiler error enhancement a currently open question.
Although there is much attention on compile-time syntax errors, many authors have developed tools to help students with warnings, runtime errors, and other types of error. Backstop is a tool that helps students interpret Java runtime error messages [ ]. Dy and Rodrigo developed a system that detects ‘non-literal’ Java errors: errors reported by the compiler that do not match the actual error. C-Helper is a static checker for C that provides more direct warning messages than gcc, and also alerts the novice programmer to ‘latent errors’, those for which commercial compilers do not report an error, but where the intentions of the programmer are contrary to the program [671].
The complexity of most debugging tools presents a barrier to their use by novices. Recently, substantial focus has been given to tools that reduce the need for debugging, either by attempting to reduce the number of errors committed by novices in the first place, or, when errors do arise, by providing more helpful error messages. Most of the attention in this area has focused on compile-time syntax or semantic errors, but runtime errors, warnings, and latent errors (not reported by the compiler) have also been explored. It is not yet clear whether these means of helping novices understand error messages are effective, and more work is needed in the area.
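Error-message enhancement of this general kind can be sketched as a thin wrapper around compilation that pairs the raw message with a novice-oriented hint. The hint texts and the lookup scheme below are hypothetical illustrations, not taken from any of the tools cited above.

```python
# Sketch of compiler-error-message enhancement: compile the source,
# and pair any raw message with a friendlier, novice-oriented hint.
# The hint texts here are hypothetical examples.

HINTS = {
    "was never closed": "You opened a bracket or quote but never closed it.",
    "invalid syntax": "Python could not understand this line; check for typos "
                      "or something missing just before the error position.",
}

def enhanced_compile(source):
    """Return None if the source compiles, else an enhanced message."""
    try:
        compile(source, "<student>", "exec")
        return None
    except SyntaxError as err:
        hint = next((text for key, text in HINTS.items()
                     if key in str(err.msg)), "No extra hint available.")
        return f"line {err.lineno}: {err.msg}\nHint: {hint}"

message = enhanced_compile("print(1 + )")  # a novice-style syntax error
print(message)
```

A real tool would maintain a much larger mapping, derived from logs of actual student errors, which is precisely the data-driven approach several of the cited systems take.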
6.4.6 Design Tools.
Introductory programming courses typically set out to teach
both problem solving — devising an algorithm for a given problem
— and programming — converting the algorithm into correct pro-
gram code in a selected programming language. Although problem
solving is an important component of computational thinking, its
teaching is seldom as clearly defined as the teaching of program-
ming. Several tools have been built to help students learn problem
solving, typically using one of three approaches:
UML diagrams: These tools approach problem solving by
means of building UML diagrams, an endeavour uniquely
suited to object-oriented programming; examples are CIMEL
ITS [455] and Green [17,577].
Flowcharts: These tools help the student construct a flow-
chart for the problem. Examples are FITS [ ], FMAS [ ],
SITS [268], VisualLogic [231], and Progranimate [581]. The
constructed flowchart may then be automatically converted
to code. Some attempts have been made to compare these
tools; for example, RAPTOR and VisualLogic [5].
Pseudocode: These tools elicit the pseudocode in English; an
example is PROPL [216,359].
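The flowchart-to-code conversion mentioned in the second approach can be sketched with a toy example. The node representation below is invented for illustration; tools such as Progranimate use much richer graphical models.

```python
# Toy flowchart: each node is (kind, payload, next_node_id).
def flowchart_to_code(nodes, start):
    """Emit Python source by walking a linear flowchart."""
    lines, node_id = [], start
    while node_id is not None:
        kind, payload, nxt = nodes[node_id]
        if kind == "assign":
            lines.append(payload)            # e.g. "x = 10"
        elif kind == "output":
            lines.append(f"print({payload})")
        node_id = nxt
    return "\n".join(lines)

nodes = {
    0: ("assign", "x = 10", 1),
    1: ("assign", "y = x * 2", 2),
    2: ("output", "y", None),
}
code = flowchart_to_code(nodes, 0)
print(code)
```

A real tool would also handle decision and loop nodes, which is where the pedagogical value lies: the student reasons about control flow graphically before seeing its textual equivalent.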
A tool has also been reported that can be used to capture the
teacher’s problem-solving actions and replay them for the student’s
benet [
], and another uses regular expressions to translate
novice programs into detailed textual algorithms [4].
Tools for problem solving are in their infancy. Given the increas-
ing recognition of the importance of computational thinking in
higher education, it is expected that more developers will focus on
building and evaluating tools in this area.
6.4.7 Visualisation Tools.
This section focuses on visualisation tools intended for use by
novices. We found two broad types, those that visualise programs
and those that visualise algorithms. The program visualisation tools
are dominated by Java, with only a handful of tools focusing on
other languages. Algorithm visualisation is more often language-
agnostic.
In 2013, Sorva et al. [632] published the first comprehensive
review of visualisation systems intended for teaching beginners,
providing descriptions of such systems from the preceding three
decades and reviewing findings from their empirical evaluations.
This review is intended to serve as a reference for the creators,
evaluators, and users of educational program visualisation systems.
It also revisits the issue of learner engagement, which has been
identied as a potentially key factor in the success of educational
software visualisation, and summarises what little is known about
engagement in the context of program visualisation systems for
beginners. Finally, it proposes a refinement of the frameworks pre-
viously used by computing education researchers to rank types of
learner engagement. Overall, their review illustrates that program
visualisation systems for beginners are often short-lived research
prototypes that support the user-controlled viewing of program
animations, but they note a recent trend to support more engaging
modes of user interaction. Their findings largely support the use of
program visualisation in introductory programming education, but
research to 2013 was deemed insufficient to support more nuanced
conclusions with respect to learner engagement.
The Sorva et al. review also references a review on algorithm
visualisation by Hundhausen et al
. [288]
that was not identied in
our search because it was published in 2002; however, we mention
it here for the interested reader. As the 2013 review by Sorva et
al. is recent and comprehensive (64 pages and well over 200 refer-
ences), in this section we do not discuss papers that were covered
in that review and also identified in our search [ ]. The
remaining papers can be classified as either program visualisation,
predominantly in Java, or algorithm visualisation.
Program visualisation: Java. Jeliot [ ] is a program animation sys-
tem in which the visualisation is created completely automatically
from the source code, so that it requires no effort to prepare or mod-
ify an animation, even on the fly during a class. JavaCHIME [ ]
is a graphical environment that allows users to examine classes and
objects interactively and to execute individual methods without
Introductory Programming: A Systematic Literature Review ITiCSE Companion ’18, July 2–4, 2018, Larnaca, Cyprus
the need for a testing class containing a main method. Users can
also examine variables and methods and interactively alter the val-
ues of variables while testing the methods. PGT (Path Generation
Tool) [ ] visualises paths that correspond to statements in source
code. EduVisor (EDUcational VISual Object Runtime) [ ], which
can be integrated with the Netbeans IDE, shows students the struc-
ture of their programs at runtime, seeking to incorporate knowledge
accumulated from many prior projects. Using jGRASP [ ], stu-
dents are able to build dynamic visualisations that, in conjunction
with the debugger, help them understand and correct bugs in their
programs more effectively than using the debugger alone.
Program visualisation: other languages. Visualisation tools de-
signed for other languages include PyLighter [ ] for Python,
CMeRun [197] for C++, and VIPER [3] for Pascal.
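To illustrate the mechanism on which program visualisation tools of this kind typically rely, here is a minimal sketch in Python. It is an assumption about the general approach (line-by-line tracing of variable state), not the implementation of PyLighter or any other cited tool.

```python
import sys

def trace_states(func, *args):
    """Run func, recording a snapshot of its local variables at each line."""
    snapshots = []

    def tracer(frame, event, arg):
        # Only record 'line' events inside the traced function itself.
        if event == "line" and frame.f_code is func.__code__:
            snapshots.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return snapshots

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

for lineno, state in trace_states(demo, 3):
    print(lineno, state)
```

A visualisation front end would render these snapshots graphically (for example, as an animated notional machine) rather than printing them.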
Algorithm visualisation. Although algorithm visualisation has a
long history, seven of the eight papers in this subsection are from the
past ve years, suggesting that algorithm visualisation is enjoying a
revitalisation, at least as regards introductory programming. Alvis
Live! is an algorithm development and visualisation model where
the line of algorithm code currently being edited is re-evaluated
on every edit, leading to immediate syntactic feedback, along with
immediate semantic feedback in the form of a visualisation [ ].
Results suggest that the immediacy of the model’s feedback can
help novices to quickly identify and correct programming errors,
and ultimately to develop semantically correct code. Other work
in the area includes using visual tiling patterns for learning basic
programming control and repetition structures [ ], a tool for
learning iterative control structures [ ], flowchart-based algo-
rithm visualisations [ ], tools to help with visualising notional
machines [145,620], and a plugin for the MindXpres presentation
tool to improve the visualisation of source code in teachers’ presen-
tations [558,559].
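The immediate-feedback model behind Alvis Live! can be sketched as a loop that re-checks a single line on every edit, first syntactically and then semantically. The following is a minimal illustration of that idea, not the tool's implementation, and the function name is invented:

```python
import ast

def immediate_feedback(line, env):
    """Re-evaluate one edited line: syntax check first, then execution."""
    try:
        ast.parse(line)                      # immediate syntactic feedback
    except SyntaxError as err:
        return f"syntax error: {err.msg}"
    try:
        exec(line, env)                      # immediate semantic feedback
    except Exception as err:
        return f"runtime error: {err}"
    state = {k: v for k, v in env.items() if not k.startswith("__")}
    return f"ok, state = {state}"

env = {}
print(immediate_feedback("x = 2 + 2", env))   # ok, state = {'x': 4}
print(immediate_feedback("y = x *", env))     # syntax error
```

An editor built on this loop would call the function on every keystroke, which is what gives the model its claimed benefit of catching errors the moment they are introduced.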
Summary. Visualisation tools for novices fall into two categories,
those that visualise program execution and those that visualise
algorithms. The former are normally tied to a single language, gen-
erally Java, whereas most algorithm visualisation tools are language-
agnostic. It is notable that most of the algorithm visualisation work
we found has occurred in the past five years. The comprehensive
2013 review by Sorva et al. [632] is commended to readers who wish to
explore the area more deeply.
6.4.8 Collaboration Tools.
The power of learning to program in pairs or groups has long
been recognised in computing education. Groupware technology
is increasingly being used to facilitate learning, among learners
who might be in the same location (collocated) or away from one
another (distributed) and who might be present at the same time
(synchronous) or at different times (asynchronous).
Various tools for collaboration have been reported in the liter-
ature and have been reviewed [ ]. They fall into a few broad
categories:
Tools to help students write programs collaboratively, in-
cluding distributed [ ] as well as collocated and synchro-
nous [ ]; working on bite-sized programs [ ] or objects-
first programs [ ]; and collaboration facilitated by similar-
ities among student programs [72]
Peer assessment [110] and peer review [151]
Incorporating collaboration into existing environments such
as Alice [11], BlueJ [207], and DrJava [625]
Other attempts to incorporate collaboration into introductory
programming include using Facebook to seek immediate assistance
while learning Java [ ] and awarding points to students for help-
ing with group learning [512].
With the availability of inexpensive and ubiquitous hardware
(mobile phones, tablets, etc.), the desire for any-time, anywhere
learning, and an increasingly interconnected world with a grow-
ing trend of crowdsourcing and just-in-time pairing of demand
and supply, we expect collaboration tools to take on much greater
prominence in the future of computing education.
6.4.9 Tools Using Games.
Games have probably been used as teaching tools since prehis-
toric times, so it is natural for the practice to be applied to the
introductory programming context. Papers in this category have
an explicit focus on digital games as a teaching tool. We do not
include papers describing non-digital tools such as pen-and-paper
games.
Some papers describe or evaluate approaches, tools, frameworks,
or software development kits (SDKs) that can be used by students
to program their own games or game-like software with only an
introductory level of knowledge and experience. Sung et al. [646]
present a casual game development framework to help instructors
incorporate game content into their introductory classes, and
Frost [ ] presents a Java package to simplify the development of 2D
games for beginners. Though some papers present example games
to showcase the motivational consequence of such frameworks,
little empirical evaluation is reported. One systematic review found
that incorporating game themes into introductory programming
(for example, as assigned development tasks) did not improve pass
rates to the same extent as other forms of intervention [679].
Educational games that can be used as learning objects or teach-
ing aids have received much attention by researchers during the
period of this review. An ITiCSE working group on game devel-
opment for computing education reviewed over 100 games that
taught computing concepts, 75% of which were classied as cov-
ering ‘software development fundamentals’ [309].¹ Other reviews
also identify long lists of digital games that have been developed
to teach programming concepts [ ]. Several of these games have
been empirically evaluated. Two notable examples are Wu’s Cas-
tle [ ] and Gidget [ ]; however, there are too many to list here.
Though studies in this area tend to report promising results, they
seldom rmly establish causality or determining the size of eects
that are uniquely attributable to a particular tool. Studies specic to
the introductory programming context tend not to apply validated
methods of measurement or to compare tools to alternatives. There
is thus little empirical evidence to support the ecacy of game
The games considered by the 2016 ITiCSE Working Group on Game Development
for Computer Science Education are listed at
ITiCSE Companion ’18, July 2–4, 2018, Larnaca, Cyprus Luxton-Reilly, Simon, et al.
development or of playing educational games in the introductory
programming context. Future work should seek to establish such
empirical evidence.
6.4.10 Evaluation of Tools.
Recent years have seen a growing emphasis on the evaluation
of tools and pedagogical approaches in introductory computing.
Approaches used for evaluation include anecdotal, survey-based,
and controlled evaluation. In this section we list papers that deal
with evaluation of tools in any of these forms.
Dierent categories of tools have been evaluated to date:
Visual programming environments: ALICE, one of the more
popular environments, has been evaluated often [ ]. More
recently, mobile versions of visual environments have been
compared [ ]. A study has also investigated the effectiveness
of the block interface common to block-based programming
languages [525].
Mini-languages used to teach introductory programming,
such as Karel the Robot [98] and Logo [116]
The use of robots such as LEGO Mindstorms has been re-
ported to aect motivation [414,435]
Auto-graded coding exercise platforms — student behaviour
when using them and their benefits for learning [ ]
Interactive books and learning objects and their benefits [ ]
Integrated development environments and their aspects that
might be beneficial for novice programmers [304,548]
Intelligent tutoring systems (ITS): their benefits [ ], the
role of tutorial dialogue in ITS [ ], and the use of automatic
program repair in ITS [722].
Tools for algorithm development: ALVIS [ ] and [ ], as
opposed to tools for coding, such as Verificator [484,485]
Tools have also been used in the evaluation of pedagogical ap-
proaches such as self-paced learning [ ], explicit instruction of
the roles of variables [ ], and the quality of examples in text-
books [ ]. Evaluations have also been made of various features of
tools, including menu-based self-explanation [ ], example-based
learning [ ], and tutorial dialogue [ ], as well as how students