Journal of Visual Languages and Computing
16 (2005) 41–84
A framework and methodology for studying the
causes of software errors in programming systems
Andrew J. Ko*, Brad A. Myers
Human-Computer Interaction Institute, School of Computer Science, Carnegie Mellon University, 5000
Forbes Ave., Pittsburgh, PA 15213, USA
Received 1 January 2004; received in revised form 1 July 2004; accepted 1 August 2004
Abstract
An essential aspect of programmers’ work is the correctness of their code. This makes
current HCI techniques ill-suited to analyze and design the programming systems that
programmers use every day, since these techniques focus more on problems with learnability
and efficiency of use, and less on error-proneness. We propose a framework and methodology
that focuses specifically on errors by supporting the description and identification of the causes
of software errors in terms of chains of cognitive breakdowns. The framework is based on
both old and new studies of programming, as well as general research on the mechanisms of
human error. Our experiences using the framework and methodology to study the Alice
programming system have directly inspired the design of several new programming tools and
interfaces. This includes the Whyline debugging interface, which we have shown to reduce
debugging time by a factor of 8 and help programmers get 40% further through their tasks.
We discuss the framework’s and methodology’s implications for programming system design,
software engineering, and the psychology of programming.
© 2004 Elsevier Ltd. All rights reserved.
www.elsevier.com/locate/jvlc
1045-926X/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jvlc.2004.08.003
*Corresponding author. Tel.: +1 412 268 1266; fax: +1 412 401 0042.
E-mail address: ajko@cmu.edu (A.J. Ko).
1. Introduction
“Human fallibility, like gravity, weather and terrain, is just another foreseeable
hazard… The issue is not why an error occurred, but how it failed to be corrected.
We cannot change the human condition, but we can change the conditions under
which people work.”
James Reason, Managing the Risks of Organizational Accidents [1]
In 2002, the National Institute of Standards and Technology published a study of
major U.S. software engineering industries, finding that software engineers spend an
average of 70–80% of their time testing and debugging, with the average bug taking
17.4 hours to fix. The study estimated that such testing and debugging costs the U.S.
economy over $50 billion annually [2]. One reason for these immense costs is that as
software systems become increasingly large and complex, the difficulty of detecting,
diagnosing, and repairing software problems has also increased. Because this trend
shows no sign of slowing, there is considerable interest in designing programming
systems that can demonstrably prevent errors, and that better help programmers find,
diagnose, and repair the errors that are not prevented.
Unfortunately, the design and evaluation of such “error-robust” programming
systems still pose a significant challenge to HCI research. Most techniques that have
been proposed for evaluating computerized systems, such as GOMS [3] and
Cognitive Walkthroughs [4], have focused on low-level details of interaction,
bottlenecks in learnability and performance, and the close inspection of simple tasks.
In programming activity, however, even “simple” tasks are complex, and
productivity bottlenecks are more often in repairing errors than in learning to avoid
them. Even with the more design-oriented HCI techniques, understanding a
programming system’s error-proneness has been something of a descriptive dilemma.
Nielsen’s Heuristic Evaluation suggests little more than to prevent user errors by
finding common error situations [5]. The Cognitive Dimensions of Notations
framework [6], though applied to numerous programming systems [7,8],
characterizes error-proneness simply as “the degree to which a notation invites mistakes.”
In this paper, we offer an alternative technique, specifically designed to objectively
analyze a programming system’s influence on errors. We integrate several of our
recent studies with three strands of prior research:
• Past classifications of common programming difficulties;
• Studies of the cognitive difficulties of programming; and
• Research on the general cognitive mechanisms of human error.
From this research, we derive a framework for describing chains of cognitive
breakdowns that lead to error, and a methodology for sampling these chains
by observing programmers’ interaction with a programming system. We hope
that these contributions will not only be valuable tools for improving existing
programming languages and environments, both visual and textual, but also for
guiding the design of new error-robust languages, environments, and interactive
visualizations.
This paper is divided into six parts. In the next section, we review classifications of
common programming difficulties, studies of programming that suggest several
causes of error, and research on the general mechanisms of human error. In Section
3, we describe our framework in detail and in Section 4 we describe an empirical
methodology for using the framework to study a programming system’s error-
proneness. In Section 5, we describe our experiences using the framework and
methodology to analyze the Alice programming system [10]. We end in Section 6
with a discussion of the strengths and applicability of our framework and
methodology to programming system design, software engineering, and the
psychology of programming.
2. Definitions, classifications and causes
In this section, we review three strands of research: classifications of common
programming difficulties, studies of cognitive difficulties in programming, and
research on the general mechanisms of human error. To help frame our discussion,
let us first define some relevant terminology.
2.1. Terminology
If the goal of software engineering is to build a product that meets a particular
need, the correctness of a software system can be defined relative to interpretations of
this need:
• General expectations of the software’s behavior and functionality;
• A software designer’s interpretation of these expectations, known as requirement
specifications;
• A software architect’s formal and informal interpretations of the requirement
specifications, known as design specifications;
• A programmer’s understanding, or mental model, of design specifications.
Because we are interested in how a programming system can help
improve correctness, we define correctness relative to design specifications.
While there can certainly be problems with design specifications, as well as
requirements, such problems are typically outside the influence of programming
systems.
Given this definition of correctness, we define three terms: runtime failures, runtime
faults, and software errors (illustrated in Fig. 1). A runtime failure is an event that
occurs when a program’s behavior (often some form of visual or numerical program
output) does not comply with the program’s design specifications. A runtime fault is
a machine state that may cause a runtime failure (e.g., a wrong value in a CPU
register, branching to an invalid memory address, or a hardware interrupt that
should not have been activated). A software error is a fragment of code that may
cause a runtime fault during program execution. For example, software errors in
loops include a missing increment statement, a leftover “break” command from a
debugging session, or a conditional expression that always evaluates to true. It is
important to note that while a runtime failure guarantees that one or more runtime
faults have occurred, and a runtime fault guarantees that one or more software
errors exist, software errors do not always cause runtime faults, and runtime faults
do not always cause runtime failures. Also note that under our definition, a single
change to the design specifications can introduce an arbitrary number of software
errors.
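The distinction between the three terms can be made concrete with a small sketch. The Python code below is an illustration only, not taken from the studies discussed here: the function names `sum_first_n` and `classify`, the assumed design specification ("return the sum of the first n values"), and the `fault` flag used to make the machine state observable are all hypothetical. The software error is a leftover “break” of the kind mentioned above.

```python
def sum_first_n(values, n):
    """Assumed design specification: return the sum of the first n values.

    Contains a deliberate software error (a leftover debugging "break").
    Returns (total, fault), where `fault` is instrumentation added for this
    sketch: it is True when the loop stopped while iterations still remained.
    """
    total = 0
    i = 0
    fault = False
    while i < n:
        total += values[i]
        if i == 2:
            # Software error: leftover "break" from a debugging session.
            # Runtime fault: the loop ends although i + 1 < n.
            fault = i + 1 < n
            break
        i += 1
    return total, fault


def classify(values, n):
    """Return (fault_occurred, failure_occurred) for one execution."""
    expected = sum(values[:n])  # oracle standing in for the design specification
    total, fault = sum_first_n(values, n)
    failure = total != expected  # runtime failure: output violates the specification
    return fault, failure


if __name__ == "__main__":
    # Error present, but never executed: no fault, no failure.
    print(classify([1, 2, 3, 4, 5], 2))  # (False, False)
    # Fault occurs, but the skipped values are zero: no failure.
    print(classify([1, 2, 3, 0, 0], 5))  # (True, False)
    # Fault occurs and the output is wrong: failure.
    print(classify([1, 2, 3, 4, 5], 5))  # (True, True)
```

The three calls correspond to the three cases in the text: a software error that causes no runtime fault, a runtime fault that causes no runtime failure, and a runtime failure, which implies that both a fault and an error are present.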
Using these definitions, a number of other terms can be clarified. A bug is
an amalgam of one or more software errors, runtime faults, and runtime failures.
For example, a programmer can use “bug” to refer to a software error, as in
“Oh, there’s the bug on line 43,” to a runtime failure, as in “Oh, don’t worry about
that. It’s just a bug…” or even to all three, as in “I fixed four bugs today.”
Debugging involves determining what runtime faults led to a runtime failure,
determining what software errors were responsible for those runtime faults, and
modifying the code to prevent the runtime faults from occurring. Testing involves
searching for runtime failures and recording information about runtime faults to aid
in debugging.
A programming system is a set of components (e.g., editors, debuggers,
compilers, and documentation), each with (1) a user interface; (2) some set of
information, such as program code or runtime data, which the programmer
views and manipulates via the user interface; and (3) a notation in which the
information is represented. We illustrate some common programming
system components in the rows of Fig. 2. The figure is read from left to right; for
example, a programmer uses diagrams and printed text to view and manipulate
specifications, which are represented in natural language, UML, or some other
notation; a programmer uses an editor to view and manipulate code, which is
represented in terms of some programming language; a programmer uses a
debugger to view and manipulate a machine’s behavior, which is represented in
terms of stack traces, registers, and memory; and a programmer uses an output
device to view a program’s behavior, which is often represented as graphics, text,
animation, sound, etc.
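The interface/information/notation structure of a programming system component can also be written down as a small record type. The following Python sketch is illustrative only: the `Component` class, its field names, and the `describe` helper are hypothetical names introduced here, and the four entries simply paraphrase the examples given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One component of a programming system (one row of the description above)."""
    interface: str    # what the programmer uses
    information: str  # what is viewed and manipulated
    notation: str     # how that information is represented

    def describe(self) -> str:
        # Reads the record from left to right, as the text describes.
        return (
            f"Programmers use {self.interface} to view and manipulate "
            f"{self.information}, which is represented in {self.notation}."
        )


# Hypothetical encoding of the four examples from the text.
COMPONENTS = [
    Component("diagrams and printed text", "specifications",
              "natural language, UML, or some other notation"),
    Component("an editor", "code", "a programming language"),
    Component("a debugger", "a machine's behavior",
              "stack traces, registers, and memory"),
    Component("an output device", "a program's behavior",
              "graphics, text, animation, and sound"),
]

if __name__ == "__main__":
    for component in COMPONENTS:
        print(component.describe())
```

Treating each component as a record with the same three fields makes it straightforward to ask, for any component, which of the three aspects might contribute to an error.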
Fig. 1. The relationship between software errors in code, runtime faults during execution, and runtime
failures in program behavior. These images will be used to represent these three concepts throughout this
paper.
2.2. Classifications of common programming difficulties
Prior work on classifying common programming difficulties—summarized
chronologically in Table 1—has been reasonably successful in motivating novel
and effective tools for finding, understanding and repairing software errors. For
example, in the early 1980s, the Lisp Tutor drew heavily from analyses of novices’
software errors [11], and approached the effectiveness of a human tutor. More
recently, the testing and debugging features of the Forms/3 visual spreadsheet
language [12] were largely motivated by studies of the type and prevalence of
spreadsheet errors [13].
Fig. 2. A programming system has several components (in rows), each with an interface, information, and
notation (in columns). The diagram is read from left to right, as in “Programmers use interface X to view
and manipulate information Y, which is represented in notation Z.”