Assessing, Comparing, and Combining State Machine-Based Testing and Structural Testing: A Series of Experiments
ABSTRACT A large number of research works have addressed the importance of models in software engineering. However, the adoption of model-based techniques in software organizations is limited since these models are perceived to be expensive and not necessarily cost-effective. Focusing on model-based testing, this paper reports on a series of controlled experiments. It investigates the impact of state machine testing on fault detection in class clusters and its cost when compared with structural testing. Based on previous work showing this is a good compromise in terms of cost and effectiveness, this paper focuses on a specific state-based technique: the round-trip paths coverage criterion. Round-trip paths testing is compared to structural testing, and it is investigated whether they are complementary. Results show that even when a state machine models the behavior of the cluster under test as accurately as possible, no significant difference between the fault detection effectiveness of the two test strategies is observed, while the two test strategies are significantly more effective when combined by augmenting state machine testing with structural testing. A qualitative analysis also investigates the reasons why test techniques do not detect certain faults and how the cost of state machine testing can be brought down.
Carleton University, TR SCE-08-09
Assessing, Comparing, and Combining State machine-Based
Testing and Structural Testing: A Series of Experiments
Samar Mouchawrab1, Lionel C. Briand2, Yvan Labiche1, and Massimiliano Di Penta3
1 Software Quality Engineering Laboratory, Department of Systems and Computer Engineering, Carleton University, 1125 Colonel By Drive, Ottawa, ON K1S5B6, Canada
{samar, labiche}@sce.carleton.ca
2 Simula Research Laboratory, Department of Software Engineering, Martin Linges v 17, Fornebu, P.O.Box 134, 1325 Lysaker, Norway
briand@simula.no
3 RCOST - Research Centre on Software Technology, Department of Engineering, University of Sannio, Piazza Roma, I-82100 Benevento, Italy
dipenta@unisannio.it
Abstract
A large number of research works have addressed the importance of models in software
engineering, mainly in the design and testing of robust software systems, for example in
automating the test of software systems. Although models have been proven to be helpful in a
number of software engineering activities, such as providing a better medium for communication
among designers and customers, there is still significant resistance to model-driven development
in many software organizations. The main reason is that it is perceived to be expensive and not
necessarily cost-effective. This study investigates one specific aspect of this larger problem. It
addresses the impact of using state machines for testing class clusters that exhibit a state-
dependent behavior on testing cost effectiveness when compared with structural testing, which is
perhaps the most common systematic testing practice. More precisely, it reports on a series of
controlled experiments that investigate the impact of state machine-based testing on fault-
detection and cost-effectiveness. Based on previous work showing this is a good compromise in
terms of cost and effectiveness, we focus on a specific state-based technique that is an adaptation
of the W-Method for state machines: the round-trip paths coverage criterion. Code-based
structural testing is compared to round-trip paths testing and their combination is investigated to
determine whether they are complementary. A series of four controlled experiments is
presented and discussed in this report. Differences between the results of the different
experiments are highlighted and plausible explanations for these differences are provided.
Results show that, even when a state machine accurately models the behavior of the cluster under
test, no significant difference between the fault detection effectiveness of the two test strategies is
observed. And in all cases, even when the state machine’s representation of the cluster’s behavior
is limited, the two test strategies are significantly more effective when combined. This implies
that a cost-effective strategy could be to specify state machine-based test cases early on, execute
them when the source code becomes available, and then complete them with structural test cases
based on coverage analysis. A qualitative analysis is also presented, where we investigate the
reasons for undetected faults and derive ways to improve the state machine-based testing of
source code. As an outcome of this analysis and based on the presented results, we recommend a
number of updates to the round-trip paths testing strategy in order to improve its fault-detection
effectiveness and lower its cost.
1 INTRODUCTION
There is an increasing interest [52] in model-driven development for object-oriented
systems, using for example the Unified Modeling Language (UML). In addition to being a key
resource for designing object-oriented software and providing means for communicating ideas
among designers and customers, models are very useful in testing object-oriented software. A
number of model-based testing methodologies have been proposed based for example on use
cases, class diagrams, and state machines [14, 16, 19, 27, 63, 69, 71, 73, 74, 89].
Model-based testing has been assessed in a number of empirical studies and shown to be
useful in systematically defining test strategies and criteria, and deriving test cases and oracles
[18, 21, 22, 28, 71, 75]. A number of researchers conducted studies on the cost and effectiveness
of conventional testing strategies: white-box [39-41, 50, 90] and black-box testing strategies [18,
77, 91].
Despite a growing number of studies [9, 16, 18, 19, 21, 22, 27, 28, 69-71], more empirical
evidence is required to assess the importance of models in improving testing cost-effectiveness.
As a result, there is little incentive for testers to adopt model-driven testing practices and it is
difficult to determine how they should be integrated, if at all, with traditional testing practices.
This report focuses on the effectiveness of UML state machine-based testing when compared and
combined with white-box, structural testing, which is deemed to be the most common basic
technique for testing (clusters of) components. At the same time, the most complex and critical
components in object-oriented software are also the ones which, according to mainstream UML
development methods, would likely be modeled with state machines [44]. Therefore, assessing
the cost and effectiveness of testing techniques based on state machines and comparing them
with simpler, code coverage-based techniques seems a logical investigation to undertake. The
latter being a basic test practice automated by existing commercial tools, only a significant
improvement in fault detection effectiveness or cost would justify the use of approaches based on
state machines. The choice of UML as a language to model state machines is a practical one, as
UML is becoming a de facto standard. Specifically, this report focuses on one state
machine-based test strategy, i.e., round-trip paths testing [14], an adaptation of the W-method
[29] for state machines. The choice is driven by our previous work on the subject [18], showing
that this is a reasonable compromise in terms of fault-detection effectiveness and cost between
cheap but inefficient criteria (e.g., all transitions) and efficient but expensive criteria (e.g., all
transition pairs). In this report we perform both a quantitative analysis of differences in fault-
detection effectiveness and cost among test techniques, and a qualitative analysis to understand
the reasons for these differences and the variations observed across test drivers and component
clusters, with the aim of answering the following research questions:
− How does the fault-detection effectiveness of test cases identified and generated based on
the state machine alone compare to that of test cases generated based on the coverage of the
code (structural testing)?
− Are the sets of faults detected by state machine-based testing and structural testing
techniques complementary? Does this suggest a combination of the two techniques is
beneficial in terms of fault detection effectiveness?
− How does code coverage in terms of node and edge coverage in the cluster under test
achieved by state machine testing compare to that of structural testing?
− How does the trade-off between cost and effectiveness (cost-effectiveness) compare for
state machine-based testing and structural testing?
− What are the different factors that impact the effectiveness of state machine-based testing
techniques?
− More specifically, how can the state machine-based, round-trip testing strategy be modified
to improve its fault-detection effectiveness and lower its cost?
Because no analytical means can help us obtain realistic, practically useful answers to these
questions, we conducted a series of four controlled experiments involving undergraduate and
graduate students on three object-oriented class clusters with a non-trivial state-dependent
behavior modeled using UML state machines. Results show that:
− Overall, techniques based on code coverage and state machines do not show practically
significant differences in terms of fault detection.
− The two techniques are, to a significant extent, complementary in terms of the faults they
detect.
− The real-time behavior of a cluster increases the complexity of both the code and the state
machine, and often results in practice in the latter being incomplete and only a partial
model of the implementation. This complexity in turn decreases the fault detection
effectiveness of both test techniques, as it is harder to cover the source code and state
machine within limited time. However, complex state machines with guard conditions and
self transitions can help increase fault detection effectiveness by enabling, for example, the
test of boundary cases. High code coverage, as well as high path coverage in the state
machine, do not systematically entail high fault detection effectiveness: other test
techniques are necessary to uncover faults closely related to real-time or concurrent
behavior and complex control flow.
− A proposition for a number of improvements to be applied to the round-trip paths testing
strategy to improve both its fault detection effectiveness and its cost-effectiveness.
− Guidelines for testers on when to use state machine-based testing and how to integrate it
with other test strategies.
The report is organized as follows: Section 2 discusses the related literature and Section 3
provides a detailed description of the conducted controlled experiments while Section 4 presents
and analyses the results. Section 5 details the outcome of a qualitative analysis to understand the
limitations of the state machine testing technique, and the factors affecting the cost of developing
test drivers. Threats to validity are discussed in Section 6. Based on the results of the experiments
and the observations from their executions, a number of changes to the state machine testing
strategy are proposed in Section 7 to improve its fault detection and cost-effectiveness. Section 8
summarizes the lessons learned from conducting the experiments and lists guidelines for the
design and the execution of experiments in software testing. Guidelines on where and how to use
state machine-based testing are presented in Section 9. Finally, Section 10 concludes and
summarizes the results.
2 RELATED WORK
As discussed above, we address the fault-detection and cost effectiveness of a state-based
testing technique, the round-trip paths strategy [14], by empirically comparing it with and
combining it with traditional structural testing. We will therefore restrict the discussion of related
work to state-based and structural testing (Section 2.1) with a focus on empirical studies (Section
2.2). Though this is only a subset of the work in the more general area of model-based testing,
such a restriction in scope is necessary due to space constraints.
2.1 State-Based and Structural Testing
One of the earliest works on state-based testing is the work by Chow [29] who proposed
the W-method for finite state machines (FSM). This method has been adapted to UML state
machines by Binder [14] and renamed the round-trip paths (RTP) strategy. In both techniques,
the state model is traversed to construct a transition tree that includes all transitions in the state
machine in such a way that the traversal of a path stops whenever the state encountered is
already present in the tree. When there are guard conditions on transitions and these guards are in
a disjunctive form then several transitions are warranted, one for each truth value combination
that is sufficient to make the guard condition true.
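The tree construction described above can be sketched as follows. This is a minimal Python sketch, not Binder's or Chow's reference algorithm: it assumes a flat state machine given as (source, event, target) triples, uses a breadth-first traversal, and ignores guard conditions (a disjunctive guard would first have to be expanded into one transition per sufficient truth-value combination). The function name round_trip_paths and the three-state example machine are hypothetical.

```python
from collections import deque


def round_trip_paths(transitions, initial):
    """Build a transition tree and return its root-to-leaf paths.

    transitions: list of (source, event, target) triples (assumed format).
    A branch stops as soon as its target state is already in the tree.
    """
    outgoing = {}
    for src, event, dst in transitions:
        outgoing.setdefault(src, []).append((src, event, dst))

    seen = {initial}                # states already present in the tree
    paths = []
    queue = deque([(initial, [])])  # (state of the node, path from the root)
    while queue:
        state, path = queue.popleft()
        edges = outgoing.get(state, [])
        if not edges and path:
            paths.append(path)      # state without outgoing transitions: leaf
        for edge in edges:
            target = edge[2]
            new_path = path + [edge]
            if target in seen:
                paths.append(new_path)  # state seen before: this node is a leaf
            else:
                seen.add(target)
                queue.append((target, new_path))
    return paths


# Hypothetical state machine of a bounded container (illustration only).
example_transitions = [
    ("Empty", "add", "Partial"),
    ("Partial", "add", "Partial"),
    ("Partial", "add_last", "Full"),
    ("Partial", "remove_last", "Empty"),
    ("Full", "remove", "Partial"),
]
paths = round_trip_paths(example_transitions, "Empty")
for p in paths:
    print(" -> ".join(f"{src}.{event}" for src, event, _ in p))
```

Each returned path is one candidate test sequence that starts in the initial state, and every transition of the example machine appears in at least one path.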
Other state-based test criteria were proposed by Offutt et al. [75]: transition coverage, full
predicate coverage, transition-pair coverage, and complete sequence. A case study was used to
compare these criteria with a random selection of test cases. Results showed an improvement in
fault detection when using the full-predicate coverage criterion, while transition coverage yielded
a smaller number of test cases than random testing, with the same fault detection rate and branch
coverage (of the source code control flow graph).
Additional testing strategies have been defined for state machines. Hong et al. [48] propose
a technique to derive extended finite state machines (EFSM) from state machines. The EFSM is
then transformed into a flow graph modeling the control flow and data flow in the state machine
thus enabling the application of conventional control and data flow analysis techniques. A
modification of this method is described by Bogdanov and Holcombe [16] to address the
compliance of an implementation of a system to its specification. This has been further extended
to UML state machines with operations contracts defined in OCL [20].
FSMs, EFSMs, and UML state machines have been widely used to model systems in
various application domains, such as sequential circuits and communication protocols. The study
in [54] provides a rather complete and clear review of the fundamental problems in finite state
machines testing.
One problem of special interest to us is conformance testing [54], which is to test whether
the implementation conforms to the specification. All the proposed methods for conformance
state-based testing have the same basic objective: To ensure that every transition of the
specification state machine is correctly implemented. The methods however differ in terms of the
types of sequences of inputs they use to verify that the machine is in the right final state. For
instance, one strategy consists in constructing a breadth-first-traversal transition tree and
checking states in that order. For every state, a checking sequence is applied to verify if the
system under test is in the expected destination state [54]. Similarities can be seen between this
strategy and the RTP technique. Both strategies are based on the generation of a tree from the
state machine in which the paths from the root state along the branches of the tree correspond to
test cases aimed at comparing a state machine to its implementation. One difference between the
two strategies is the method for identifying visited states: The RTP technique uses state
invariants to differentiate states and the other strategy uses separating sequences1 for this
purpose.
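To make the notion of a separating sequence concrete, the following Python sketch checks the definition given in footnote 1 on a small deterministic machine with outputs. The encoding of the machine as a dictionary from (state, input) to (next state, output), the helper names, and the three-state example are assumptions made for illustration; they are not taken from [54].

```python
def run(machine, state, inputs):
    """Return the output sequence produced by applying inputs from state."""
    outputs = []
    for symbol in inputs:
        state, out = machine[(state, symbol)]
        outputs.append(out)
    return tuple(outputs)


def is_separating(machine, states, target, inputs):
    """True if inputs yields a different output sequence from target
    than from every other state."""
    expected = run(machine, target, inputs)
    return all(run(machine, s, inputs) != expected
               for s in states if s != target)


# Hypothetical machine: (state, input) -> (next state, output).
states = ["A", "B", "C"]
machine = {
    ("A", "a"): ("B", 0), ("A", "b"): ("A", 1),
    ("B", "a"): ("C", 0), ("B", "b"): ("A", 0),
    ("C", "a"): ("A", 1), ("C", "b"): ("C", 0),
}

print(is_separating(machine, states, "C", "a"))   # "a" distinguishes C
print(is_separating(machine, states, "A", "a"))   # A and B both output 0
print(is_separating(machine, states, "B", "aa"))  # longer sequence needed for B
```

In this example a single input is enough to identify state C, whereas state B requires a sequence of length two, which illustrates why checking sequences add to the length of conformance tests.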
One important application of conformance testing is protocol testing where FSMs define
protocol specifications [15]. Many systematic methods were defined for testing protocol
implementations against their specifications. Some of these methods are guided by heuristics [11,
68]; others are based on fault models [57, 79, 94]. These methods determine distinguishable
states traversed by a test sequence with the aim of verifying the conformance relationship
between the protocol specification and its implementation. As opposed to other finite state
machine testing strategies, protocol conformance testing strategies mainly address non-
deterministic finite state machines [15].
1 Let λ(s_i, x) be the output function, i.e., the result of applying the sequence of inputs x to the state s_i. A sequence
x is said to be a separating sequence for state s_i when λ(s_i, x) ≠ λ(s_j, x) for all states s_j ≠ s_i.
2.2 Empirical Studies
From a more general standpoint, a growing number of empirical studies address the cost
effectiveness of testing strategies in various types of testing techniques: white-box [39, 41, 90],
black-box [91], or model-based [7, 18, 22, 70, 81]. Many of these studies use the mutation
strategy to seed faults and evaluate the fault detection effectiveness of the testing techniques. For
instance, a simulation and analysis procedure [22] has been proposed and used to study the cost-
effectiveness of four state machine-based coverage criteria, namely all-transitions, all-transition-
pairs, full-predicate [75], and round-trip paths [14]. Briand et al. [22] showed that the cost
effectiveness of these criteria depends to a significant extent on the characteristics of the state
machine. For state machines labeled with numerous guard conditions, the round-trip paths
strategy provides a good compromise between all-transitions and all-transition-pairs, the latter
being far too expensive and the former rather ineffective.
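As an illustration of how seeded faults are used to measure fault-detection effectiveness, the Python sketch below computes the proportion of mutants detected (killed) by a test suite. The function under test, its three hand-written mutants, and the tests are hypothetical and are not taken from the cited studies; real mutation tools generate mutants automatically from a set of mutation operators.

```python
def mutation_score(test_suite, mutants):
    """Fraction of mutants for which at least one test fails."""
    killed = 0
    for mutant in mutants:
        if any(not test(mutant) for test in test_suite):
            killed += 1
    return killed / len(mutants)


def clamp(x, lo, hi):
    """Original (assumed correct) implementation."""
    return max(lo, min(x, hi))


# Hand-written mutants, each seeding one fault (illustration only).
def mutant_min_to_max(x, lo, hi):
    return max(lo, max(x, hi))


def mutant_max_to_min(x, lo, hi):
    return min(lo, min(x, hi))


def mutant_upper_off_by_one(x, lo, hi):
    return max(lo, min(x, hi + 1))


mutants = [mutant_min_to_max, mutant_max_to_min, mutant_upper_off_by_one]

# Each test takes an implementation and returns True when it passes.
tests_a = [
    lambda f: f(5, 0, 10) == 5,    # value inside the range
    lambda f: f(-3, 0, 10) == 0,   # value below the lower bound
]
tests_b = [
    lambda f: f(11, 0, 10) == 10,  # value just above the upper bound
]

score_a = mutation_score(tests_a, mutants)
score_combined = mutation_score(tests_a + tests_b, mutants)
print(score_a, score_combined)
```

In this toy example the first suite leaves the off-by-one mutant alive, and only the combined suite kills all three mutants, which mirrors the way the effectiveness of individual and combined test strategies is compared.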
An empirical study focusing on white-box testing strategies was performed by Frankl and
Weiss [39] where the all-uses and decision (all-edges) criteria were compared to each other and
to the null criterion (random test suites). This study was performed on nine small programs
whose size ranges from 33 to 60 LOCs for which the authors had access to real faults. Results
showed that all-uses was not always more effective than the decision and the null criteria, but in
the few cases where there was a difference, the difference was large. In contrast, in cases
where the decision criterion was more effective than the null criterion, the difference was small.
Results varied significantly according to the programs analyzed. A question arises: How do the
domain of the program under test and the type of faults targeted affect the results? To answer this
question, further investigation is required. The results of Frankl and Weiss were further
confirmed by Hutchins et al. [50], whose experiments showed a better fault detection
effectiveness of the all-uses criterion over the all-edges criterion, at the expense of larger test sets.
The two techniques seem to be complementary in terms of the faults detected. This is yet another
example of the benefit of combining different testing techniques on the overall fault detection
effectiveness of test sets.
Pretschner et al. focused on the automatic generation of test cases on the grounds of
symbolic execution with Constraint Logic Programming (CLP) [81]. The aim of the symbolic
execution of the model is to find an execution trace—and therefore a test case—that leads to the
state to be tested. A number of strategies are used to optimize the traversal of the state machine.
For instance, a fitness function is defined to generate the shortest path to the destination state.
Other strategies include attributing probabilities to transitions or storing visited states and
transitions to prevent repeated visits. The study aims at comparing the use of behavioral models,
namely extended finite state machines (EFSM), to hand-crafted tests generated based on
requirements’ message sequence charts (MSC). Results show that the use of models significantly
increases the number of detected requirement errors. However, the number of detected
programming errors was unrelated to the use of models [81]. Our work differs in that it compares
a more widely applied state machine testing technique (RTP) to a practically common and widely
used structural testing technique (node and edge coverage). It does so by performing replicated,
controlled experiments involving human participants that aim at precisely understanding the
limits of each technique and how they complement each other.
An approach in model-based testing was proposed and validated by Nebut et al. in [70].
The approach consists in automating the generation of system test scenarios from use cases in the
context of object-oriented embedded software. By using contracts with UML use cases, the
authors apply Meyer’s Design By Contract approach [64] at the requirement level. Executable
contracts written in terms of logical expressions allow for defining valid sequences of use cases
and extracting relevant paths which are called test objectives. Subsequently, test scenarios are
generated from test objectives. An empirical evaluation of the proposed approach was executed
on three small case studies (800 LOC to 2000 LOC) to assess the efficiency of the generated test
cases in terms of statement coverage. A classification of the code under test aimed at identifying
functional code and differentiating it from other code categories (dead code and robustness
code). The results showed that most of the functional statements in the code are covered by the
proposed technique with a relatively small set of test cases.
Briand et al. [18] focused on the cost effectiveness of the RTP technique [14]. They
investigate, in controlled experiment settings, the fault detection effectiveness of state-based
(RTP) testing for classes or class clusters modeled with state machines. They also investigate the
usefulness of complementing RTP with a well-known black-box testing technique: category-
partition (CP) [77]. The study was based on a series of controlled experiments where the RTP
technique was applied on a number of systems with two different levels of oracle precision.
Drivers implementing the RTP technique were then complemented with an implementation of
the CP technique [77]. Though useful at detecting faults, results showed that the RTP technique
needed to be complemented with CP to significantly increase its fault detection. These results
were one of the motivations for our study. The limitations of the RTP technique incited us to
further identify the factors that affect its fault-detection effectiveness. However, in contrast with
Briand et al. [18], we choose to compare and complement the RTP technique with a structural
testing technique rather than with a black-box testing technique. The cost of applying the
category-partition technique is high and the human resources available for applying it may be
limited. However, a code coverage, structural technique can be easily applied. By using code
coverage instrumentation, structural testing can be helpful at identifying those parts of the cluster
under test that were not tested by the state-based testing technique. This is of practical importance
as state models are rarely complete and fully defined in practice. Briand et al. [18] also noticed
the significant difference in terms of fault detection and cost between two oracle strategies, one
using precise oracles checking the concrete state of objects (i.e., checking all attribute values),
and the other based on state invariants. We also address this issue and suggest ways to limit the
cost of oracles without affecting their fault-detection effectiveness.
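The difference between the two oracle strategies can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the BoundedOrdSet class is a hypothetical stand-in and not the OrdSet class used in the experiments, the seeded fault is hand-written, and the abstract states Empty, Partial, and Full are an assumed state model.

```python
import bisect


class BoundedOrdSet:
    """Hypothetical bounded, ordered set of integers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def add(self, value):
        if value not in self.items and len(self.items) < self.capacity:
            bisect.insort(self.items, value)  # keeps items sorted


class FaultyOrdSet(BoundedOrdSet):
    def add(self, value):
        if value not in self.items and len(self.items) < self.capacity:
            self.items.append(value)  # seeded fault: ordering is not maintained


def abstract_state(s):
    """Map the concrete object to a state of the assumed state machine."""
    if not s.items:
        return "Empty"
    return "Full" if len(s.items) == s.capacity else "Partial"


def invariant_oracle(s, expected_state):
    """Cheaper oracle: checks only the state invariant."""
    return abstract_state(s) == expected_state


def precise_oracle(s, expected_items, expected_capacity):
    """Costlier oracle: checks every attribute value."""
    return s.items == expected_items and s.capacity == expected_capacity


def run_scenario(cls):
    s = cls(3)
    s.add(5)
    s.add(2)
    return s


faulty = run_scenario(FaultyOrdSet)
print(invariant_oracle(faulty, "Partial"))  # the fault goes unnoticed
print(precise_oracle(faulty, [2, 5], 3))    # the fault is detected
```

The invariant-based oracle only needs the expected abstract state, which is cheap to derive from the state machine, whereas the precise oracle requires the tester to work out every expected attribute value for each test case.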
A growing number of studies [10, 24, 56, 65, 82, 83] have been discussing the importance
of replicating empirical studies to increase the credibility and the generality of empirical results.
A study may be replicated externally or internally2, with or without experiment design changes.
In this research study, we replicated three times an original experiment across two geographical
locations, with slight changes in the experiment plan to address some uncovered threats to
validity and to overcome some of the limitations encountered in the original experiment. More
details on the changes and their rationale are provided in Section 3.
Basili et al. [10] discussed the importance of having families of experiments on building
credible knowledge in software engineering through empirical studies. These experiments would
have in common the same research questions or sometimes extend the studied theory by varying
research questions to investigate other aspects related to the study. Families of experiments are of
great value for building knowledge around existing and new software technologies and
processes. As an example, Basili et al. [10] reflect on a set of software reading techniques’
experiments and propose an organizational framework for experiments. The objective of the
2 Replication takes two forms: internal and external. Internal replication is undertaken by the original
experimenters while external replication is undertaken by independent researchers and is critical for establishing
sound results [31].
framework is to provide support for experimenters on how to better define experiments and
combine them to overcome validity problems. It addresses the modeling of processes and their
effectiveness, as well as context modeling and consequences of experimental designs on threats
to validity. The proposed framework supports families of experiments and facilitates their
abstraction by building knowledge on top of them. This framework is based on the GQM (Goal /
Question / Metric) template. In their study, the authors identified different types of replications
varying from strict replications to those varying research questions or even extending the studied
theory. They put strong emphasis on lab package design. The cost of an experiment increases
greatly because of the preparation of the different required artifacts. Thus reusing artifacts can
reduce experimentation cost. Lab packaging then helps reduce such cost and facilitates the
experimenter’s work in experiments’ preparation.
3 EXPERIMENT DESCRIPTION
This section describes the experiments we performed following a specific template [93].
First, we define the objective of the original experiment and its context (Section 3.1), and then
we describe the plan of the experiment including the context selection criteria, the research
questions, and the experiment design (Section 3.2). In Section 3.3 we describe how the
experiments were prepared and executed. Section 3.4 describes the differences between the first
experiment and its subsequent three replications, and the rationale for the changes that were
made to the experiment design and operation.
3.1 Experiment definition and context
The goal of this study is to analyze the state machine based round-trip path testing strategy
and the edge coverage structural testing technique for the purpose of comparing and assessing
them as well as their combination with respect to their fault detection effectiveness and cost
effectiveness from the point of view of the tester. The context consists of objects, i.e., source code
and UML documentation (specifically state machines) of three Java software clusters, and
participants, i.e., undergraduate students from the 4th year of software engineering at Carleton
University, Canada, and graduate students from the Master in Software Technology of the
University of Sannio, Italy.
The study is conducted as a series of four experiments: two conducted at Carleton
University, Canada, and two at the University of Sannio, Italy, since replication is imperative to
increase the credibility of results and to allow more robust conclusions to be drawn.
This comparison of state-based testing and structural testing techniques we conducted is of
practical importance for a number of reasons. First, various forms of state-based testing have
been discussed for many years in the research literature, whether for software components or
protocols (Section 2). Despite this, we know very little about the benefits of such practice for
software testing. The focus on UML state machines is motivated by practical reasons, as we want
to place our work in the context of UML-based development. The comparison with structural
testing is due to its wide application in practice, at least in its simplest forms, and it can therefore
be considered a reasonable baseline of comparison. Such a comparison, however, only makes
sense when testing software components (e.g., classes, clusters) that have a state-driven behavior.
Though such components are far from representing the largest proportion of classes in a typical cluster,
the most complex components are usually state-driven.
When referring to state machine models, we do not include only the state machines
themselves but also the related artifacts that are required to understand them: class diagrams,
class public interfaces (signatures, attributes), contracts and state invariants, and a textual, high-
level description of the software functionalities and objectives. However, as participants working
with the UML artifacts are expected to use the state machine diagram to generate test cases, for
the sake of brevity, we will simply refer to them as a “state machine model” in the remainder of
the paper.
The experiments involved three Java class clusters, that all have a state-driven behavior:
a. OrdSet is a Java class (of 393 lines of code – LOC) that was included in the original
experiment and its first replication. Each instance of OrdSet represents a bounded, ordered
set of integers. The OrdSet class provides methods for adding a single element, removing
a single element, and creating the union of two ordered sets.
b. Cruise Control is a cluster of four Java classes (of 358 LOC). It simulates a car engine
and its cruising controller.
c. Elevator is a cluster of eight Java classes (of 581 LOC) that was included in the last two
replications only, as a way to make our results less dependent on the first two clusters. It
consists of a number of elevators servicing a number of floors. An elevator accepts stop
requests to travel to other floors. All elevators start at the first floor. Users can also request
service from floors to go up or down.
The above three class clusters were extracted from a pool of software engineering students’
final year projects, where teams of students follow a rigorous, UML-based, development
strategy. Since these students were carefully trained for four years and specialized in software
engineering, we expected their models and implementations to be representative of the best we
could expect in practice. These implementations and corresponding state machines were
simplified from their original versions implemented by the engineering students in order to give
testers sufficient time for testing them within the duration of laboratory sessions. These class
clusters were selected in part because of their differences. They represent two typical cases where
a state machine is used to model the behavior of a complex data structure (OrdSet) and a state-
dependent control class in a real-time multithreaded control cluster (Cruise Control and
Elevator).
The three class clusters are of varying code and state machine complexities, which we
summarize in Table 1. For each cluster, the table reports the number of classes in the cluster, the
total number of operations and attributes, the cluster size in lines of code (LOC) corresponding to
the sum of classes’ LOC measures, the number of control-flow statements, i.e. if/else, for and
while statements, the total number of nodes and edges in the cluster’s control flow graph, and the
number of transitions, states and events in the state machine. The numbers in Table 1 correspond
to the simplified versions of the clusters used in the series of experiments.
Although OrdSet is composed only of one class, one can note that its state machine and
control flow are more complex than those of Cruise Control; this is visible from the number
of control flow edges in the OrdSet source code and the number of transitions in its state
diagram. Furthermore, the guard conditions in the OrdSet state machine add to the complexity
of the class, whereas Cruise Control is event-driven only. Elevator is far more complex
than the two other clusters. Both its code size attributes such as LOC and number of nodes and its
state machine complexity attributes such as number of transitions and number of events are
greater than those of the two other clusters.
Cluster         # classes  # operations  # attributes  # LOC  # control flow statements  # nodes  # edges  # transitions  # states  # events
Cruise Control  4          34            14            358    33                         106      103      17             5         7
OrdSet          1          23            5             393    36                         111      126      22             5         5
Elevator        8          74            37            581    56                         241      214      50             6         10
Table 1: Size of source code and state machines
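The comparisons drawn from Table 1 can be checked mechanically. The following is a minimal sketch that transcribes the table values and tests the two claims made in the text: that OrdSet exceeds Cruise Control in control flow edges and transitions, and that Elevator exceeds both other clusters in LOC, nodes, transitions, and events. The dictionary keys and the helper name `exceeds` are illustrative choices, not part of the original study.

```python
# Values transcribed from Table 1 (simplified versions of the clusters).
# The key names are illustrative labels chosen for this sketch.
TABLE_1 = {
    "Cruise Control": {
        "classes": 4, "operations": 34, "attributes": 14, "loc": 358,
        "control_flow_statements": 33, "nodes": 106, "edges": 103,
        "transitions": 17, "states": 5, "events": 7,
    },
    "OrdSet": {
        "classes": 1, "operations": 23, "attributes": 5, "loc": 393,
        "control_flow_statements": 36, "nodes": 111, "edges": 126,
        "transitions": 22, "states": 5, "events": 5,
    },
    "Elevator": {
        "classes": 8, "operations": 74, "attributes": 37, "loc": 581,
        "control_flow_statements": 56, "nodes": 241, "edges": 214,
        "transitions": 50, "states": 6, "events": 10,
    },
}


def exceeds(a, b, metrics):
    """Return True if cluster a is strictly larger than cluster b on every metric."""
    return all(TABLE_1[a][m] > TABLE_1[b][m] for m in metrics)


# Claim 1: OrdSet has more control flow edges and more transitions than Cruise Control.
ordset_vs_cruise = exceeds("OrdSet", "Cruise Control", ["edges", "transitions"])

# Claim 2: Elevator is larger than both other clusters in code size and
# state machine complexity (LOC, nodes, transitions, events).
size_and_sm_metrics = ["loc", "nodes", "transitions", "events"]
elevator_largest = all(
    exceeds("Elevator", other, size_and_sm_metrics)
    for other in ("Cruise Control", "OrdSet")
)

print(ordset_vs_cruise, elevator_largest)
```

Both checks hold for the values reported in Table 1.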
The source code used in these experiments is admittedly small. However, it is important to
note that state machines, in UML-based development, are mostly used to model the behavior of
complex classes or class clusters [25, 44, 53], particularly complex data structures (usually
referred to as entity classes) and control classes, for example in reactive systems. Furthermore,
even when the source code is small, the state-behavior can be quite complex when measured in
terms of states, events, and transitions. This issue is further discussed in Section 5.
3.2 Experiment planning
There are many ways we could have designed, planned, and executed experiments to
achieve our objectives. Before delving into details in the next subsections, let us provide a high-
level view of the experimental approach we adopted and its rationale. In an experimental,
artificial setting, the time allocated to an experiment is necessarily limited. What this implies is
that test techniques will be compared assuming a limited, equal effort on the part of testers. From
a comparison standpoint, this is fair but in terms of absolute fault detection effectiveness, the
results could, in practice, look very different depending on the effort involved. Note that in
practice the time dedicated to testing is also constrained and we therefore do not consider this as
a limitation given our objectives.
Another important point to highlight, which is further discussed in [17], is that techniques
that involve humans can, of course, be incorrectly or incompletely applied, especially when
under time constraints. The question is then whether we want to assess a technique in terms of its
maximum potential, when perfectly implemented, or whether we want to account for human
factors. In our context, we chose the latter, thus justifying the use of experimentation with human
participants. Furthermore, to be realistic in terms of human factors, we need to ensure we provide
testers with adequate support that is representative of what could be expected in the current state
of technology. For instance, testers would be trained for the tasks in laboratory sessions
prior to the experiment. They would also be provided with driver templates.
This section details the experimental plan, describing the context, the research questions,
the variables, and the design.
3.2.1 Context selection
The participants involved in the four experiments had the following characteristics:
− First and third experiment (Carleton 1 and Carleton 2): participants were fourth-year students
from a specialized software engineering program. They were well versed in Java and UML
and were attending a course on software testing that covers different white-box and black-box
testing techniques with a focus on object-oriented testing. The experiments were conducted
during the lab hours of that course as part of practical lab exercises. 48 students participated
in the first experiment (Carleton 1), and 19 in the third (Carleton 2).
− Second and fourth experiment (Sannio 1 and Sannio 2): participants were graduate students
enrolled in a Master's program in software technology. Master's participants (about 30 every
year) are selected from a population of 300 graduate students in computer science and computer
engineering. The participants were students attending an intensive course on software testing.
Their experience with software testing before attending the course varied from no experience
to some experience with JUnit. 25 students participated in the second experiment, and 19 in
the fourth one.
The method for the selection of participants follows stratified random sampling³:
participants were first assigned to blocks based, for the Carleton experiments, on their background
and knowledge of object-oriented design and development techniques⁴, and, for the Sannio
experiments, on their laurea graduation score (since participants were graduate students). Then,
participants were randomly selected from the different blocks to form four groups with a similar
distribution to ensure the results would not be affected by random variations in participant
experience across groups. In addition, groups were defined to be of similar sizes to ensure a
balanced contribution of test technique/cluster combinations to the results. However, there were
practical constraints regarding the availability of certain participants, and this limited the
randomization of selection. In spite of this issue, we managed to ensure that we had comparable
block distributions across groups, where each block is represented by a similar number of
participants in every group.
³ Stratified random sampling: The population is divided into a number of groups or strata with a known distribution
between the groups. Random sampling is then applied within the strata [55].
⁴ The background and knowledge of object-oriented design and development techniques of the different subjects was
measured based on their grades in two advanced courses in software engineering and object-oriented design.
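The blocked assignment described above can be illustrated with a short sketch. It is not the procedure actually used in the experiments, which was also subject to participant availability. It only shows one simple way to obtain groups of similar size with a similar block distribution: shuffle participants within each block, then deal them to the groups in turn. The function name `assign_groups`, the block labels, and the equal block sizes in the example are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def assign_groups(participants, n_groups=4, seed=0):
    """Assign (participant_id, block) pairs to n_groups groups.

    Participants are shuffled within their block and then dealt to the
    groups round-robin, so every block is spread as evenly as possible
    and group sizes differ by at most one.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    by_block = defaultdict(list)
    for pid, block in participants:
        by_block[block].append(pid)

    groups = [[] for _ in range(n_groups)]
    next_group = 0
    for block in sorted(by_block):
        members = by_block[block]
        rng.shuffle(members)  # random selection within the block
        for pid in members:
            groups[next_group % n_groups].append((pid, block))
            next_group += 1
    return groups


# Hypothetical example: 48 participants (the size of the first experiment),
# split into three made-up ability blocks of 16 participants each.
blocks = ["high", "medium", "low"]
participants = [(i, blocks[i % 3]) for i in range(48)]
groups = assign_groups(participants)

group_sizes = [len(g) for g in groups]
block_counts = [Counter(block for _, block in g) for g in groups]
print(group_sizes)
print(block_counts)
```

With these hypothetical inputs, each of the four groups receives 12 participants, four from each block.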
3.2.2 Research questions
In this section we describe in detail the research questions (Table 2) that the experiments
address in order to meet the objectives listed in Section 3.1.
Number  Research Question
RQ1     What is the difference, in terms of fault detection effectiveness, between test cases
        generated from state machines (Ts) and test cases generated only based on node and edge
        coverage of the source code control flow (Tc)?
RQ2     What is the difference, in terms of code coverage, between test cases generated from state
        machines (Ts) and test cases generated only based on node and edge coverage of the source
        code control flow (Tc)?
RQ3     Are there interaction effects regarding fault detection effectiveness between code coverage,
        learning effects, participant ability and software properties (code, state machine properties)
        and the test technique applied?
RQ4     Are state machine-based testing and structural testing complementary in terms of fault
        detection?
RQ5     Is there an interaction effect between code characteristics of class clusters and test
        technique on the percentage of faults detected when combining Tc and Ts?
RQ6     Are there specific fault types that are more likely to be detected by Ts or Tc and for which
        the combination of both sets of test cases is particularly effective?
RQ7     How does the cost between state machine-based testing and structural testing compare?
RQ8     How does the cost-effectiveness between state machine-based testing and structural testing
        compare?
RQ9     Based on an analysis of the faults not detected by Ts, what can be added to the state
        machine model to help generate test cases that target those types of faults?
Table 2: Research questions
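To make RQ1, RQ4, and RQ5 concrete, the sketch below shows one way the quantities involved could be computed. It assumes that effectiveness is measured as the fraction of a known set of faults that a test set detects, in line with the "percentage of faults detected" wording of RQ5. The fault identifiers and detection results are entirely hypothetical and are not data from the experiments.

```python
def effectiveness(detected, known_faults):
    """Fraction of the known faults detected by a test set."""
    return len(detected & known_faults) / len(known_faults)


# Hypothetical data: ten known faults, identified by number.
known_faults = set(range(1, 11))
ts_detected = {1, 2, 3, 4, 5, 6}   # faults found by state machine-based tests (Ts)
tc_detected = {4, 5, 6, 7, 8}      # faults found by structural tests (Tc)

# Combining the two test sets detects the union of their faults.
combined_detected = ts_detected | tc_detected

ts_eff = effectiveness(ts_detected, known_faults)
tc_eff = effectiveness(tc_detected, known_faults)
combined_eff = effectiveness(combined_detected, known_faults)

# Complementarity (RQ4): faults found by only one of the two techniques.
only_ts = ts_detected - tc_detected
only_tc = tc_detected - ts_detected

# Faults missed by both techniques; for Ts alone, RQ9 asks what could be
# added to the state machine model to target such faults.
missed_by_both = known_faults - combined_detected

print(ts_eff, tc_eff, combined_eff)
print(sorted(only_ts), sorted(only_tc), sorted(missed_by_both))
```

In this made-up example, the combined test set is more effective than either technique alone because each one detects faults that the other misses.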