An Experimental Protocol for Analyzing the Accuracy
of Software Error Impact Analysis
Vincenzo Musco, Martin Monperrus, Philippe Preux
To cite this version:
Vincenzo Musco, Martin Monperrus, Philippe Preux. An Experimental Protocol for Analyzing
the Accuracy of Software Error Impact Analysis. Tenth IEEE/ACM International Workshop
on Automation of Software Test, May 2015, Florence, Italy. <hal-01120913>
HAL Id: hal-01120913
https://hal.inria.fr/hal-01120913
Submitted on 27 Feb 2015
An Experimental Protocol for Analyzing the
Accuracy of Software Error Impact Analysis
Vincenzo Musco, Martin Monperrus, Philippe Preux
University of Lille, CRIStAL and INRIA
Email: vincenzo.musco@inria.fr
Email: martin.monperrus@univ-lille1.fr
Email: philippe.preux@univ-lille3.fr
Abstract—In software engineering, error impact analysis consists in predicting the software elements (e.g. modules, classes, methods) potentially impacted by a change. Impact analysis is required to optimize the testing effort. In this paper, we present a new protocol to analyze the accuracy of impact analysis. This protocol uses mutation testing to simulate changes that introduce errors. To this end, we introduce a variant of call graphs that we name the “use graph” of a software package, which can be computed efficiently. We apply this protocol to two open-source projects and correctly predict the impact of 30% to 49% of changes.
I. INTRODUCTION
Software continuously evolves through changes affecting some component, such as a module, a class, or a function. A single change to the source code can impact the entire software package, and may break many parts beyond the changed element. For example, a change in a file-reading function can impact any class that uses it to read files. The literature discusses techniques to reason about the impact of a given change (see e.g. [1]).
As presented by Law and Rothermel [1], error impact analysis can be done with either static call graphs or dynamic traces. These authors claim that static call graphs are inaccurate because they may return excessively large sets of impacted elements. However, to our knowledge, this claim has never been challenged empirically. This is the problem we address in this paper: we propose an experimental protocol to numerically assess static error impact analysis.
We present an experimental protocol to measure the accuracy of impact analysis, inspired by mutation testing. We consider a software package equipped with a set of tests, and we introduce changes into the software. When the test suite is run on the changed software, some tests fail; we consider the set of such failing tests as the ground truth that we have to predict. To make this prediction, we consider a specific kind of dependency graph that we call the “use graph” of the software under study. Use graphs are simple static call graphs which also include field usages. Intuitively, if a node uses another, faulty, node, the error propagates and the using node itself becomes faulty. The output of the protocol is the number of correctly predicted impacted nodes, as well as the number of false positives and false negatives. These figures let us estimate the accuracy of our procedure.
We use this protocol on two mainstream open-source software packages: Apache Commons Lang and Apache Commons Collections. We compare the predictions made from the use graph with the outcomes of classical test suite execution. We correctly predict the impact of software mutations for up to 49% of the injected errors.
To sum up, our contributions are:
- an algorithm to numerically analyze the accuracy of error impact analysis;
- a visualization of error impact;
- a study of error impact on two large open-source software packages totaling 100k+ lines of code;
- three explanations for the weaknesses of basic use graphs in correctly estimating error propagation: 1) no inclusion of calls made using Java reflection; 2) an unclear boundary between test and application functions; 3) an incomplete use graph building process.
The remainder of this paper is structured as follows. Section
II defines our protocol, Section III presents our experiments
and results, Section IV discusses the related work and Section
V concludes this paper.
II. METHODOLOGY
A. Protocol
In this paper, we want to perform static error impact
analysis: considering a software change, we want to determine
its impact on the entire software application, i.e. determine the
parts, other than the one containing the error, to which the
error propagates. To this end, we introduce a new graph that
represents interactions between software elements; we name
this graph the “use graph” of a software. Let us denote this
graph by G.Gis directed; a node represents either a method,
or a field. There is an edge between a method node m and a
method m’ if m calls m’ (the call may be recursive); there is
an edge between a method node and a field node if this field
is used in the method. In a use graph, field nodes are thus
sinks.
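To make the definition concrete, the following is a minimal, illustrative sketch of an in-memory representation of a use graph; the class and method names are ours and not those of our implementation.

import java.util.*;

/** A minimal, illustrative use graph: nodes are method or field identifiers. */
class UseGraph {
    // Edge m -> m' means "method m calls m'"; edge m -> f means "method m uses field f".
    // Field nodes never have outgoing edges, so they are sinks.
    private final Map<String, Set<String>> uses = new HashMap<>();

    void addUse(String from, String to) {
        uses.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        uses.computeIfAbsent(to, k -> new HashSet<>()); // register the target node even if it is a sink
    }

    /** Reverse adjacency, computed on demand: all nodes that use 'node'. */
    Set<String> usedBy(String node) {
        Set<String> users = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : uses.entrySet()) {
            if (e.getValue().contains(node)) users.add(e.getKey());
        }
        return users;
    }

    Set<String> nodes() { return uses.keySet(); }
}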
We investigate how accurately we can predict the propagation of changes in a software package using its use graph. Given the use graph of the software and a change, we compute the set of nodes J which may be impacted by this change. To do so, we compute the set of nodes that may be reached from the changed node by following the use graph edges in the reverse direction. We
use the technique known as “mutation injection” to simulate
changes. A mutation injection is a source code modification
that is made on purpose to track its consequences.
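As a sketch of the prediction step, the set J of potentially impacted nodes can be obtained by a breadth-first traversal of the use graph against the edge direction, starting from the changed node; test methods found among the impacted nodes are the predicted failing tests. The UseGraph class is the illustrative one sketched above, and the test-detection heuristic is a simplification.

import java.util.*;

class ImpactPrediction {
    /** Nodes reachable from 'changed' by following use-graph edges in reverse. */
    static Set<String> impactedNodes(UseGraph g, String changed) {
        Set<String> impacted = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String user : g.usedBy(node)) {   // callers of 'node', or methods reading the field 'node'
                if (impacted.add(user)) queue.add(user);
            }
        }
        return impacted;
    }

    /** Predicted impacted tests J: impacted nodes that are test methods (crude naming heuristic, for illustration only). */
    static Set<String> impactedTests(UseGraph g, String changed) {
        Set<String> tests = new HashSet<>();
        for (String node : impactedNodes(g, changed)) {
            if (node.contains("Test")) tests.add(node);
        }
        return tests;
    }
}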
Algorithm 1: Compute the accuracy of impact analysis based on mutation.
Input: S, the software package; mop, the mutation operator.
Output: I=, I+, I− and Io (as described in Section II-A) for each e and m.
1  begin
2    G ← useGraph(S);
3    T ← testCases(S);
4    for each e in filterElements(S, mop) do
5      for each m in mutants(S, e, mop) do
6        if m compiles then
7          J ← impactedTests(m, G);
8          F ← failingTests(m, T);
9          if J = F then
10           I= ← I= ∪ {m};
11         else if F ⊂ J then
12           I+ ← I+ ∪ {m};
13         else if J ⊂ F then
14           I− ← I− ∪ {m};
15         else
16           Io ← Io ∪ {m};
Because the use graph provides a static view of the software, it only approximates the change propagation graph. Since it is static, it includes all possible connections, some of which could be discarded by a dynamic analysis. For instance, if a change propagates only when x = 0, this is not always known statically. The use graph is similar to the dependency graph extracted at the “feature” level (methods and fields) by the widely used tool Dependency Finder 1.
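For illustration, here is a hypothetical case of a connection that the use graph keeps but that a dynamic analysis could discard in executions where x is never 0:

class Parser {
    int parse(String s) { return Integer.parseInt(s); } // suppose a change is injected here
}

class Client {
    int x;

    int process(String s, Parser parser) {
        // The use graph contains the edge Client.process -> Parser.parse,
        // so a change in parse() is statically predicted to impact process().
        // Dynamically, the change only propagates in executions where x == 0.
        if (x == 0) {
            return parser.parse(s);
        }
        return -1;
    }
}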
To evaluate the accuracy of our approach, we define and compute the following sets:
(i) the set I= composed of changes for which both techniques return the same set of tests;
(ii) the set I+ composed of changes for which the use graph returns some extra tests;
(iii) the set I− composed of changes for which the test suite execution returns some extra tests;
(iv) the set Io composed of changes for which both techniques return extra tests.
Fig. 1. This figure illustrates the effect of a particular mutation in the Apache Commons Lang project. Only the interesting part of the use graph is represented here; the full use graph is much larger, made of 5406 nodes. Nodes in black are application nodes that propagate the mutation injected in the node denoted with a red cross. Blue, yellow, and purple nodes are tests related to the injected mutation. Blue nodes are tests that are correctly predicted as impacted by the injected mutation; these are true positives. Yellow nodes are tests that are predicted as impacted, but are not; these are false positives. Purple nodes are tests that should have been predicted as being impacted but have not been; these are false negatives.

1 http://depfind.sourceforge.net/

Algorithm 1 shows our approach. This algorithm takes as input a software package to consider for impact analysis and the mutation operator that is responsible for mutation injection. The output is the subsets I=, I+, I− and Io described above. In line 2, we compute the use graph of the software under study. In line 3, we get the set of test cases from the input software S (testCases). In lines 4–6, we select (filterElements), mutate (mutants) and test the appropriate elements in the software. Appropriate elements are syntactic entities to which the specific change can be applied. In line 7, we determine the nodes impacted by the mutation (impactedTests) according to G. In line 8, the function failingTests returns the set of tests that fail when running the mutated version of the software. In lines 9–16, we determine, for each mutant, whether more tests are returned by the use graph (line 12), by the test suite execution (line 14), by both (line 16), or whether the two sets are the same (line 10).
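As a sketch of the classification step in lines 9–16 of Algorithm 1, the comparison between the predicted test set J and the observed failing test set F can be written as follows; the class, enum and method names are illustrative, not part of our implementation.

import java.util.Set;

enum Outcome { EXACT_MATCH, EXTRA_PREDICTED, MISSED, BOTH } // I=, I+, I−, Io

class MutantClassifier {
    /** predicted = J (tests predicted as impacted via the use graph); failing = F (tests that actually fail). */
    static Outcome classify(Set<String> predicted, Set<String> failing) {
        boolean extraPredicted = !failing.containsAll(predicted); // some predicted tests do not fail
        boolean missed = !predicted.containsAll(failing);         // some failing tests were not predicted
        if (!extraPredicted && !missed) return Outcome.EXACT_MATCH;    // J = F  -> I=
        if (extraPredicted && !missed)  return Outcome.EXTRA_PREDICTED; // F ⊂ J  -> I+
        if (!extraPredicted && missed)  return Outcome.MISSED;          // J ⊂ F  -> I−
        return Outcome.BOTH;                                            // otherwise -> Io
    }
}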
B. Analysis Metrics
In this section we present the metrics used in this paper. Considering all changes for a given project and mutation operator, we compute |I=|, |I+|, |I−| and |Io|, the sizes of the respective sets I=, I+, I−, Io. It is useful to normalise these numbers to get a feeling of the accuracy of the approach based on the use graph, so we report the ratio between these sizes and the total number of changes. |I=|+|I+| is the sum of the sizes of the two sets I= and I+. Δ1 and Δ2 are respectively the median number of extra tests returned by the use graph and the median number of missing tests returned by the test suite execution, over all changes.
TABLE I
STATISTICS ABOUT THE PROJECTS CONSIDERED IN THIS PAPER AND THEIR USE GRAPHS.

Project                    | Version | LOC   | # of Nodes | # of Edges | Graph build time | # of Tests | Avg. time per test
Apache Commons Lang        | 3.1     | 52841 | 5406       | 10418      | 8.0s             | 2015       | 7ms
Apache Commons Collections | 4.1     | 55081 | 6680       | 13478      | 9.7s             | 367        | 49ms
C. Visualization
We propose a visualization of error propagation: it provides
developers with an idea of the potential dynamics and the
complexity of the impact of a software mutation. Figure 1
illustrates an error-introducing change in Apache Commons
Lang. Each node represents either a method or a field, and
each edge represents a call to a method or an access to a
field. The red cross is the node where the mutation occurs.
Purple stars are missed tests (i.e. detected only by the test suite
execution), yellow boxes are incorrectly predicted tests (i.e.
detected only by the use graph), blue diamonds are correctly
predicted tests (i.e. found by both techniques) and black circles
are application nodes. As an example, the graph illustrated in Figure 1 is composed of 55 nodes, with 6 correctly predicted tests, 7 missed tests, 23 incorrectly predicted tests, and 18 application nodes. As there are both missed and incorrectly predicted tests, this error-introducing change belongs to the Io set.
These node categories are explained in detail in Section II-A. As the example shows, multiple propagation paths exist from the error-introducing change.
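One possible way to produce such a visualization (a sketch, not our actual tooling) is to export the impacted part of the use graph to Graphviz DOT, coloring nodes according to the categories above; here, edges is the forward adjacency map of the subgraph to draw.

import java.util.Map;
import java.util.Set;

class DotExport {
    /** Writes a DOT description of the impacted subgraph; colors follow the legend of Fig. 1. */
    static String toDot(Map<String, Set<String>> edges, Set<String> correct,
                        Set<String> incorrect, Set<String> missed, String errorNode) {
        StringBuilder dot = new StringBuilder("digraph usegraph {\n");
        for (String node : edges.keySet()) {
            String color = node.equals(errorNode) ? "red"
                         : correct.contains(node) ? "blue"
                         : incorrect.contains(node) ? "yellow"
                         : missed.contains(node) ? "purple" : "black";
            dot.append(String.format("  \"%s\" [color=%s];%n", node, color));
        }
        for (Map.Entry<String, Set<String>> e : edges.entrySet()) {
            for (String target : e.getValue()) {
                dot.append(String.format("  \"%s\" -> \"%s\";%n", e.getKey(), target));
            }
        }
        return dot.append("}\n").toString();
    }
}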
D. Research Questions
In this section we present the research questions that we address in this paper.
Research Question 1 Is a basic use graph a good candidate for error impact analysis? Since the use graph of a software package is rather cheap and fast to compute, we want to know whether it can be reliably used as a preliminary impact assessment, before using more computationally intensive techniques. A use graph allows one to prioritize test cases to execute when a change occurs; a use graph can also be used to locate all methods that have to be reviewed after a change has been made to the software.
Research Question 2 Is error impact analysis project-dependent or change-dependent? Answering this question allows us to determine how generic the use graph approach is.
Research Question 3 How rich are test suite scenarios? In our context, the richness of tests relates to the number of program elements (say methods) involved in testing. The answer will help us determine the directions we should take in our future investigations.
Research Question 4 What are the reasons for the poor accuracy of impact analysis based on use graphs as defined in this paper? To answer this question, we manually investigate some cases where there is no perfect match, to determine the reasons leading to poor accuracy.
III. EXPERIMENTAL RESULTS
A. Setup
In this paper, we consider a dataset composed of two Java
software packages: Apache Commons Lang 3.1 and Apache
Commons Collections 4.1. Table I reports the key descriptive
statistics about these projects:
- the first and second columns give the name and the version of the software being analysed;
- the third column indicates the number of lines of code of the project (computed using cloc 2);
- the fourth and fifth columns give the number of nodes and the number of edges of the use graph;
- the sixth column gives the time required to build the use graph;
- the seventh column gives the number of test cases comprised in the project;
- the eighth column indicates the average time for a single test execution 3.
Our technique requires mutation operators; in this experiment, we choose the following two: (i) the if-logic mutation, which changes the comparison operator of an if condition (==, !=, <, >, <= and >=); (ii) the return-value mutation, which changes the value of a return statement according to its type (e.g. a non-zero integer return value is replaced by zero).
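For illustration, here are hypothetical examples of the two mutation operators applied to a small class (these are not actual mutants from our dataset):

class Account {
    int balance;

    boolean canWithdraw(int amount) {
        if (amount <= balance) {
            return true;
        }
        return false;
    }

    int getBalance() {
        return balance;
    }
}

// A possible if-logic mutant: the comparison operator in the if condition is changed.
class AccountMutant1 {
    int balance;

    boolean canWithdraw(int amount) {
        if (amount > balance) {    // '<=' mutated into '>'
            return true;
        }
        return false;
    }
}

// A possible return-value mutant: the returned value is changed according to its type.
class AccountMutant2 {
    int balance;

    int getBalance() {
        return 0;                  // a non-zero integer return value is replaced by zero
    }
}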
We implement a program to extract the use graph using Spoon 4, an open-source library for analyzing and transforming Java source code.
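The following is a minimal sketch of how such an extraction might look with Spoon, assuming its standard Launcher/CtModel API; the UseGraph class and the node naming scheme are the illustrative ones sketched in Section II-A, and constructors, inherited members and several other details are deliberately ignored:

import spoon.Launcher;
import spoon.reflect.CtModel;
import spoon.reflect.code.CtFieldAccess;
import spoon.reflect.code.CtInvocation;
import spoon.reflect.declaration.CtMethod;
import spoon.reflect.visitor.filter.TypeFilter;

class UseGraphExtractor {
    /** Builds an illustrative use graph from the Java sources under 'sourcePath'. */
    static UseGraph extract(String sourcePath) {
        Launcher launcher = new Launcher();
        launcher.addInputResource(sourcePath);
        launcher.buildModel();
        CtModel model = launcher.getModel();

        UseGraph graph = new UseGraph();
        // Method-to-method edges: one edge per invocation found inside a method body.
        for (CtInvocation<?> call : model.getElements(new TypeFilter<>(CtInvocation.class))) {
            CtMethod<?> caller = call.getParent(CtMethod.class);
            if (caller != null && call.getExecutable() != null) {
                graph.addUse(id(caller),
                        call.getExecutable().getDeclaringType() + "#" + call.getExecutable().getSimpleName());
            }
        }
        // Method-to-field edges: one edge per field access found inside a method body.
        for (CtFieldAccess<?> access : model.getElements(new TypeFilter<>(CtFieldAccess.class))) {
            CtMethod<?> user = access.getParent(CtMethod.class);
            if (user != null && access.getVariable() != null) {
                graph.addUse(id(user),
                        access.getVariable().getDeclaringType() + "#" + access.getVariable().getSimpleName());
            }
        }
        return graph;
    }

    private static String id(CtMethod<?> m) {
        return m.getDeclaringType().getQualifiedName() + "#" + m.getSimpleName();
    }
}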
B. Research Questions
Table II presents the accuracy obtained with the use graph for error impact analysis. The first and second columns give the project name and the mutation operator considered, the third and fourth columns are the median and the maximum value of the number of impacted nodes. The remaining columns are described in Section II-B.
2 http://cloc.sourceforge.net/
3 The time to execute anything obviously depends on many factors; timings are reported here only to give the reader a rough idea. All experiments were made on an HP EliteBook 8570w Mobile Workstation, i7-3740QM quad core, 2.7 GHz, under Xubuntu.
4 http://spoon.gforge.inria.fr/
Research Question 1 Is a basic use graph a good candidate
for error impact analysis?
Based on the limited experimental evidence presented here, the answer to this question seems affirmative. As we can see in Table II, in 30 to 50% of cases the use graph obtains a perfect match (I=), which means the test set retrieved by the use graph is exactly the same as the one returned by the test suite execution. We consider this a good result given that the technique is very light: as reported in Table I, the use graph takes no more than 10 seconds to be built. If we consider completeness (all impacted tests are predicted), Apache Commons Lang gives more encouraging values: in 91% of cases the set returned by the use graph includes all failing tests. Last, we can notice that the median number of extra or missing tests is around 3.5 to 14 for Apache Commons Lang and 2 to 4 for Apache Commons Collections, which represents around 1% to 7% of total cases. Given the average execution time of a test (see Table I), the time cost of executing those erroneously predicted tests is much less than 1 second (around 21-196 ms). Again, to us, the lack of precision is small for such a computationally light technique. Still, we must nuance our answer, as the sets of false positives (I+) and false negatives (I−) cannot be neglected.
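For reference, the 21-196 ms range can be reconstructed from the median numbers of mispredicted tests (Δ1 and Δ2 in Table II) and the average time per test (Table I):

Apache Commons Lang: 3 × 7 ms = 21 ms up to 14 × 7 ms = 98 ms
Apache Commons Collections: 2 × 49 ms = 98 ms up to 4 × 49 ms = 196 ms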
Research Question 2 Is error impact analysis project-
dependent or change-dependent?
As reported in Table II: (i) the values differ strongly from one project to another for the same mutation operator (e.g. considering the if-logic mutation operator, 45.3% in I+ for Apache Commons Lang versus 1.3% for Apache Commons Collections); (ii) the values differ less from one mutation operator to another for the same project (e.g. considering the Apache Commons Lang project, 45.3% in I+ for the if-logic mutation operator versus 42.0% for the return-value one).
This indicates that the use graph is more project-dependent than error-family-dependent. Clearly, a stronger answer to this question requires more investigation, as the dataset and the number of operators are small.
Research Question 3 How rich are test suite scenarios?
Figure 2 plots the distribution of the number of impacted nodes. Most graphs have a size smaller than or equal to 10. However, there exist complex test scenarios with far more than the median value of 6 impacted nodes; one impacts 344 nodes. If we look again at the example in Figure 1, the size of the graph is 55 nodes, and the impact propagation is not straightforward.
Fig. 2. Distribution of the number of impacted nodes (x-axis: number of impacted nodes; y-axis: occurrences).
Research Question 4 What are the reasons for the poor accuracy of impact analysis based on use graphs as defined in this paper?
To investigate this point, we pick at random three error-introducing changes, one in each of the sets I+, I− and Io.
For the error-introducing change in set I+, we take a case in Apache Commons Lang with an if-logic mutation. It shows extensive use of the Java reflection mechanism. We are not able to handle such cases with the use graph, as reflection is resolved at runtime. Nevertheless, even if these cases cannot be handled, uses of reflection are easy to detect, as they refer to specific Java classes and packages. A warning can be raised to the user about the use of reflection so that special care can be taken when dealing with those methods.
For the error-introducing change in set I−, we analyze a case in Apache Commons Collections with a return-value mutation. We notice that the mutation occurs in an abstract test, in which the call to the mutated node also occurs. Mutations are not expected to be made in test code; such cases highlight the difficulty of determining the boundary of test cases in a software package, which turns out to be harder than expected.
For the error-introducing change in set Io, we take a case in Apache Commons Lang with a return-value mutation. This case suffers from another problem: the use of the Java notation for an arbitrary number of parameters (i.e. method(Object...)). This is not correctly resolved when the method is called with something such as method(obj_array) or even method(obj1, obj2), where obj_array is an array of Objects, and obj1 and obj2 are Objects. A specific processing step in the construction of the graph is required to handle this special case.
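For illustration, the situation looks like the following sketch (the class name and method body are ours); a graph builder that matches invocations to declarations naively may fail to connect both call sites to the single method(Object...) declaration:

class VarargsExample {
    static String method(Object... args) {   // declaration with an arbitrary number of parameters
        return args.length + " argument(s)";
    }

    static void callers() {
        Object obj1 = "a", obj2 = "b";
        Object[] obj_array = {obj1, obj2};
        method(obj1, obj2);    // call with individual Objects: must resolve to method(Object...)
        method(obj_array);     // call with an array of Objects: must also resolve to method(Object...)
    }
}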
TABLE II
THE PERFORMANCE OF IMPACT ANALYSIS BASED ON LIGHTWEIGHT CALL GRAPHS ON TWO JAVA SOFTWARE PACKAGES. I VALUES ARE PROPORTIONS OF THE 600 MUTANTS GENERATED FOR EACH.

Project     | Mutation op. | Graph size median | Graph size maximum | |I=| (%) | |I+| (%) | Δ1  | |I=|+|I+| (%) | |I−| (%) | Δ2 | |Io| (%)
Lang        | if-logic     | 6                 | 344                | 35.8%    | 45.3%    | 3.5 | 81.2%         | 12.0%    | 14 | 6.8%
Lang        | return-value | 3                 | 344                | 49.3%    | 42.0%    | 14  | 91.3%         | 7.5%     | 3  | 1.2%
Collections | if-logic     | 1                 | 57                 | 30.2%    | 1.3%     | 4   | 31.5%         | 64.2%    | 2  | 4.3%
Collections | return-value | 1                 | 47                 | 34.5%    | 2.7%     | 2   | 37.2%         | 62.3%    | 2  | 0.5%
C. Discussion
At a conceptual level, the main threat to the validity of our experimental results is that we consider the test suite execution as ground truth. However, it may be the case that the test cases miss the assertions that would detect the propagated error and thus make the tests fail. This threat is mitigated by manual analysis.
At the experimental level, we have so far used only a very limited set of software packages and software changes. The experimental investigation should be carried out on a much larger set of cases in order to provide statistically significant conclusions about the relevance of the use graph approach for studying the propagation of software changes.
IV. RELATED WORK
Challet and Lombardoni [2] proposed a theoretical reflection
about impact analysis using graphs. However, they do not
evaluate the validity of their “bug basins” as we do in this
paper.
Law and Rothermel [1] proposed an approach for impact analysis; their technique is based on code instrumentation to analyze execution stack traces. They compare their technique against simple call graphs on one small software subject. In contrast, we apply our technique to two large software applications.
Robillard and Murphy [3] introduced “concern graphs” for
reasoning on the implementation of features. This kind of
graphs may be assessed with the protocol we have presented
here.
Michael and Jones [4] alter variables during the program's execution in order to study the perturbations of the software. They focus on data-state perturbation; the if-logic mutation is another kind of perturbation: “control perturbation”.
Murphy et al. [5] studied nine tools for static call-graph extraction. They found that each tool has its own particularities in how it treats each part of the program. Use graphs are a specific
type of call graphs. A key difference is that Murphy et al. do
not investigate error propagation.
V. CONCLUSION
In this paper, we aimed at assessing experimentally the
accuracy of static error impact analysis. We presented a new
protocol for this based on mutation testing. We discussed a
set of preliminary experimental results on impact analysis in
two open-source software applications. Our experiments show
that use graphs enable a perfect impact prediction for 30% to 49% of simulated changes. However, more experiments are
required to confirm this finding. Future work will improve the
use graphs to perform better impact analysis.
REFERENCES
[1] J. Law and G. Rothermel, “Whole Program Path-Based Dynamic Impact Analysis,” in Proceedings of the 25th International Conference on Software Engineering, May 2003, pp. 308–318.
[2] D. Challet and A. Lombardoni, “Bug Propagation and Debugging in
Asymmetric Software Structures,” Physical Review E, vol. 70, p. 046109,
Oct 2004.
[3] M. P. Robillard and G. C. Murphy, “Concern Graphs: Finding and Describing Concerns Using Structural Program Dependencies,” in Proceedings of the 24th International Conference on Software Engineering. New York, NY, USA: ACM, 2002, pp. 406–416.
[4] C. C. Michael and R. C. Jones, “On the Uniformity of Error Propagation
in Software,” in Proceedings of the 12th Annual Conference on Computer
Assurance, June 1997, pp. 68–76.
[5] G. C. Murphy, D. Notkin, W. G. Griswold, and E. S. Lan, “An Empirical
Study of Static Call Graph Extractors,” ACM Transactions on Software
Engineering and Methodology, vol. 7, no. 2, pp. 158–191, Apr 1998.