Requirements for Integrating
Defect Prediction and Risk-based Testing
Rudolf Ramler
Software Competence Center Hagenberg
Softwarepark 21, A-4232 Hagenberg, Austria
rudolf.ramler@scch.at
Michael Felderer
University of Innsbruck
Technikerstrasse 21a, A-6020 Innsbruck, Austria
michael.felderer@uibk.ac.at
Abstract—Defect prediction is a powerful method that provides
information about the likely defective parts of a software
system and can be applied to improve the effectiveness and efficiency
of software quality assurance. This makes defect prediction a
perfect candidate to be combined with risk-based testing in order to opti-
mally guide testing activities towards the risky parts of the software. As
a first step towards a successful combination, this paper presents the
requirements that have to be fulfilled to enable synergies
between defect prediction and risk-based testing.
Keywords—risk-based testing; defect prediction; software test-
ing; software test process.
I. INTRODUCTION
The prediction of defects in software systems has become a
frequently and widely addressed topic in research. The driving
scenario is usually summarized as follows: “Software testing
activities play a critical role in the production of dependable
systems, and account for a significant amount of resources in-
cluding time, money, and personnel. If testing can be more
precisely focused on the places in a software system where
faults are likely to be, then available resources will be used
more effectively and efficiently, resulting in more reliable sys-
tems produced at decreased cost.” [11]
The rationale of risk-based testing is along the same lines.
Increasing effectiveness and efficiency (by focusing testing on
the “most critical” parts of the software system under test) is
the main motivation for adopting a risk-based approach in
software testing. As reported by a previous study on risk-based
testing in industry, “risk information is used in testing in two
ways: (1) As a suggestion to extend the scope of testing to-
wards ‘risky’ areas where critical problems can be found, and
(2) as a guideline to optimally adjust the focus of testing to
‘risky’ areas where most critical problems are located” [2].
The shared objectives and the congruent underlying ra-
tionale of both approaches suggest that defect prediction and
risk-based testing are promising candidates for a combination
where the whole may be greater than the sum of its parts. De-
fect prediction is a method that provides information about the
likely defective components in a future version of a software
system. Risk-based testing uses risk information for guiding
the activities in the software test process. The goal of this paper
is to outline how the information provided by defect prediction
can be leveraged in risk-based testing.
The remainder of the paper is structured as follows. Section
II provides an overview of defect prediction and shows exam-
ples of successful applications of defect prediction in practice.
Section III explains the concepts of risk-based testing including
open issues to be addressed by the integration of defect predic-
tion. Section IV describes how defect prediction and risk-based
testing can be combined. The section also discusses require-
ments that have to be fulfilled by defect prediction in order to
enable exploiting synergies. Section V concludes with an out-
look on our plans for future work.
II. DEFECT PREDICTION
Defect prediction has been proposed as a way to indicate
the likely defective parts (e.g., modules, components, files, or
classes) in a future version of a software system [9][10][11].
These likely defective parts are predicted based on software
metrics, which provide quantitative descriptions of software
modules. The metrics are, for example, derived from static
properties of the source code of the analyzed system (e.g.,
[10]), or they capture various aspects of the project’s develop-
ment process and change history (e.g., [9]). Supervised learn-
ing techniques (classification trees, neural networks, etc.) are
commonly used to construct prediction models from these
software metrics [7]. The resulting prediction models define
the relationship between the software metrics as independent
variables and the modules’ defectiveness in terms of defect
count or binary classification as dependent variable.
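To make this concrete, the following minimal sketch shows how such a prediction model could be constructed with a common supervised learning library (Python with scikit-learn is assumed here); the modules, metric values, and labels are purely hypothetical.

```python
# Minimal sketch of constructing a defect prediction model via supervised
# learning. Modules, metric values, and labels are hypothetical.
from sklearn.tree import DecisionTreeClassifier

# Independent variables: software metrics per module, e.g., lines of code,
# cyclomatic complexity, number of changes in the previous release.
X_train = [
    [1200, 35, 14],   # module_a
    [300,   8,  2],   # module_b
    [2500, 60, 27],   # module_c
    [450,  12,  5],   # module_d
]
# Dependent variable: binary classification defective (1) / defect-free (0)
# observed for a past release.
y_train = [1, 0, 1, 0]

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X_train, y_train)

# Predict the likely defectiveness of modules in the upcoming release.
X_next = [[1100, 30, 10], [500, 15, 3]]
print(model.predict(X_next))  # predicted class per module (1 = likely defective)
```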
Numerous studies have shown that a relationship can be es-
tablished between some software metrics and the defectiveness
of the modules of a software system and that this relationship
can be exploited for making predictions. For example, 208
such studies have been analyzed in a systematic literature re-
view by Hall et al. [5]. The objective of most of these studies
has been to explore and investigate specific aspects relevant for
constructing defect prediction models in order to refine and
advance defect prediction techniques. Empirical methods in-
cluding data and artifacts from real-world projects are used for
evaluating the results. However, only a few studies report on
the application of defect prediction in practice and share in-
sights into how the prediction results have been used to support
software testing activities.
The following cases are examples of empirical studies that
provide evidence about the use of defect prediction to support
software testing activities in an industrial context. The common
goals were to increase the efficiency and effectiveness of soft-
ware testing. However, the cases also document that defect
prediction and the results it generates have to be integrated in
the wider context of the test process in order to become usable
and useful for practitioners. Predicting the likelihood of defects
is a means to an end rather than an end in itself.
Ostrand et al. [11] describe the application of defect predic-
tion for a large software system developed by AT&T. They
analyzed a total of seventeen successive releases related to
more than four years of continuous field use. The authors used
defect prediction to support testers by providing information on
where in their software releases to test prior to testing them.
Their goal was to provide testers with a practical and reasona-
bly accurate measure of which files are most likely to contain
the largest number of faults, so the testers can direct their ef-
forts to these fault-prone files. The prediction approach they
used assigned a predicted fault count to each file of a software
release, based on the file’s structure and its history over the
previous releases. The prediction results were used to rank the
files from most to least fault-prone, without choosing an arbi-
trary cutoff point for fault-proneness. The ranking allows test-
ers to focus their efforts and, by doing so, to find defects more
quickly and to conduct testing more efficiently.
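The ranking step itself is straightforward. The following minimal sketch illustrates it with hypothetical file names and predicted fault counts; it is not the tooling actually used in the study.

```python
# Sketch of ranking files from most to least fault-prone by predicted fault
# count, without an arbitrary cutoff. File names and counts are hypothetical.
predicted_faults = {
    "parser.c": 12.4,
    "ui_main.c": 1.1,
    "network.c": 7.8,
    "utils.c": 0.3,
}

ranking = sorted(predicted_faults.items(), key=lambda entry: entry[1], reverse=True)
for rank, (file_name, count) in enumerate(ranking, start=1):
    print(f"{rank}. {file_name}: ~{count:.1f} predicted faults")
```

Testers would then work down this list as far as the available testing budget allows.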
Li et al. [8] report experiences and initial empirical results
from using defect prediction for initiating risk management
activities at ABB. The authors examined data from two real-
time commercial software systems, a monitoring system and a
controller management system, by analyzing releases spanning
about 5 years and 9 years of development. The results were
used to improve testing, to support maintenance planning, and
to initiate process improvement. Regarding testing, their goal
was to increase the effectiveness of testing in order to detect
(and remove) defects that customers may encounter. Predictions
were used to determine the fault proneness of different
areas, i.e., application groupings (sub-systems) and operating
system groupings, to prioritize product testing. Making pre-
dictions for identifying defect-prone sub-systems was not in-
tended to replace expert knowledge. The results have been
used to complement expert intuition by providing quantitative
evidence. This allowed test engineers to back their decisions and
recommendations with quantitative data. When applied in test-
ing, the prediction results helped to uncover additional defects
in a sub-system previously thought to be low-defect.
Taipale et al. [17] present a pilot study where they devel-
oped a defect prediction model and explored different ways of
making the prediction results usable for the practitioners, e.g.,
via a commit hotness ranking and the visualization of interac-
tions among teams through errors. The project that has been
used in the study was a software component of a mission-
critical embedded system project with about 60 developers
working on it. The research was initiated from the perspective
of defect prediction, and then extended into finding ways of
presenting the data collected throughout the process and the
outcomes of the predictions for use by the developers. The au-
thors were able to construct prediction models with good per-
formance, in the range of related studies. The constructed mod-
els provided accurate information about the most error-prone
parts of the software. However, the feedback from the practi-
tioners showed that these results alone are not sufficient to have an
impact on their daily work. Additional effort was neces-
sary to create practical representations of the prediction
results in order for them to become of value to the project.
III. RISK-BASED TESTING
Risk-based testing (RBT) is a testing approach which con-
siders risks of the software product as the guiding factor to
support decisions in all phases of the test process [4]. In this
section we present the concept of risk in software testing as
well as a process for risk-based testing, which implements the
current software testing standard ISO/IEC/IEEE 29119. This
standard explicitly involves risks as an integral part of the test-
ing process but lacks concrete implementation guidelines.
A. Concept of Risk in Software Testing
A risk is a factor that could result in future negative conse-
quences and is usually expressed by its probability and impact
[6]. In the context of testing, probability is typically determined by
the likelihood that a failure assigned to a risk occurs, and im-
pact is determined by the cost or severity of a failure if it oc-
curs in operation. The resulting risk value or risk exposure is
assigned to a risk item. A risk item is anything of value (i.e., an
asset) associated with the system under test, for instance, a
requirement, a feature, or a component. Risk items can be pri-
oritized based on their risk exposure values and assigned to
risk levels. Risk levels are often defined in the form of a risk ma-
trix combining probability and impact values. Figure 1 shows a
3×3 risk matrix visualizing the risks in terms of the estimated
probability and impact associated with four risk items (A to D).
The fields of the matrix correspond to five risk levels, i.e., level
I (low probability and low impact) to level V (high probability
and high impact). However, even though the risk items B and C
are at the same risk level (level III), risk item B with high im-
pact and low probability may be considered more critical and
may therefore be treated with a different testing approach than
risk item C, which has a high probability and low impact.
                             Estimated Probability
                             low       medium    high
  Estimated Impact   high    III       IV        V
                     medium  II        III       IV
                     low     I         II        III

Fig. 1. Example 3×3 risk matrix defining five risk levels (I-V)
and showing four risk items (A-D).
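To make the matrix operational, probability and impact classes can be mapped to risk levels by a simple lookup. The following minimal sketch encodes the matrix of Fig. 1; risk items B and C are placed as discussed above, while the placement of A and D is hypothetical.

```python
# Sketch of a 3x3 risk matrix lookup: a risk item's probability and impact
# class (low/medium/high) determines its risk level (I to V), cf. Fig. 1.
RISK_MATRIX = {               # (probability, impact) -> risk level
    ("low", "low"): "I",      ("medium", "low"): "II",     ("high", "low"): "III",
    ("low", "medium"): "II",  ("medium", "medium"): "III", ("high", "medium"): "IV",
    ("low", "high"): "III",   ("medium", "high"): "IV",    ("high", "high"): "V",
}

risk_items = {                # hypothetical estimates per risk item
    "A": ("medium", "medium"),
    "B": ("low", "high"),     # low probability, high impact -> level III
    "C": ("high", "low"),     # high probability, low impact -> level III
    "D": ("high", "high"),
}

for item, (probability, impact) in risk_items.items():
    print(item, RISK_MATRIX[(probability, impact)])
```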
B. Process for Risk-based Testing
The risk matrix is an abstraction layer. It decouples collect-
ing and computing risk information (i.e., raw probability and
impact values) from operational testing activities concerned
with concrete test scenarios and test cases. The risk infor-
mation compiled in the form of the risk matrix is used as the basis for
many of the consecutive decisions in the testing process and,
eventually, for driving testing activities to generate successful
results including revealing new defects and – beyond that –
creating valuable insights for decision makers. The information
in the risk matrix can be adopted in the test process to support
decisions in all of its phases, i.e., test planning, test design, test
implementation, test execution and test evaluation [6]. Figure 2
shows the central role of the risk matrix in the risk-based test-
ing process. It links probability and impact estimates to the
phases of the test process. Since the risk information is provid-
ed for individual risk items, each item can be treated according
to its risk level by selecting appropriate testing measures and
adjusting the testing intensity.
[Figure content: software metrics and defect information feed the probability estimation, which, together with the impact estimation, populates the risk matrix; the risk matrix in turn supports test planning, test design, test implementation, test execution, and test evaluation.]

Fig. 2. Risk matrix and risk-based testing process.
C. Probability Estimation
Probability values are estimated for each risk item. In the
context of testing the probability value expresses the likelihood
of defectiveness of a risk item, i.e., the likelihood that a fault
exists in a specific module that may lead to a failure. In prac-
tice, probability estimation often relies on data from defect
classification and the software system’s defect history [14].
The estimation of the fault probability is usually performed in
an informal way based on experience and/or heuristics, e.g., the
number of past bug reports is used as a basic predictor
for the number of future faults. In [14], an approach is presented
that estimates the risk probability of components by counting their assigned
defects weighted according to their severity. A
recent study [13] on informal manual risk estimation based on
expert opinions highlights problems with regard to the timely
availability of experts, the proneness to estimation bias, as well
as the reliability of the results. Furthermore, the study shows
that it requires several experts working together, as in large
systems there is often no single person with sufficient
knowledge to estimate the probability for all parts. Finally,
manual methods are time consuming, which is particularly
noticeable when repeated estimation is required as in the case
of regression testing or continuous delivery.
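As an illustration of the heuristic described in [14], the following minimal sketch computes a severity-weighted defect count per component as a basic probability predictor; the severity weights and the defect history are hypothetical.

```python
# Sketch of probability estimation from defect history: count the defects
# assigned to each component, weighted by severity (cf. [14]).
SEVERITY_WEIGHTS = {"blocker": 5, "major": 3, "minor": 1}

defect_history = [            # (component, severity) per reported defect
    ("billing", "blocker"), ("billing", "major"), ("billing", "minor"),
    ("reporting", "minor"), ("reporting", "minor"),
    ("user_admin", "major"),
]

scores = {}
for component, severity in defect_history:
    scores[component] = scores.get(component, 0) + SEVERITY_WEIGHTS[severity]

# A higher score indicates a higher estimated probability of defectiveness.
for component, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(component, score)
```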
To overcome these problems, manual estimation approach-
es should be complemented by including automatically meas-
urable metric data. For instance, [1] uses a Factor-Criteria-
Metrics approach to integrate manually and automatically
measured metrics for estimating the likelihood of defective-
ness. More concrete, in this approach the criteria code com-
plexity and functional complexity, which are measured auto-
matically via static analysis and manually via expert estima-
tion, respectively, are integrated to estimate the factor proba-
bility for the risk item type system components. The setup,
maintenance, and interpretation of such integrated approaches
soon become complex and costly. Enhancing them with estab-
lished and mature learning-based defect prediction approaches
can further improve the efficiency and accuracy of probability es-
timation for testing purposes.
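The following sketch illustrates the basic idea of such an integration in the spirit of the Factor-Criteria-Metrics approach; the normalization, weights, and component data are hypothetical and not taken from [1].

```python
# Sketch of combining an automatically measured criterion (code complexity,
# e.g., from static analysis) with a manually estimated criterion (functional
# complexity, from expert judgment) into the probability factor of a component.
def probability_factor(code_complexity, functional_complexity,
                       w_code=0.5, w_functional=0.5):
    """Both criteria are assumed to be normalized to the range 0..1."""
    return w_code * code_complexity + w_functional * functional_complexity

components = {
    # component: (normalized code complexity, normalized functional complexity)
    "payment": (0.8, 0.9),
    "search": (0.4, 0.3),
}
for name, (code_c, functional_c) in components.items():
    print(name, round(probability_factor(code_c, functional_c), 2))
```

A learning-based defect prediction model could replace or refine the automatically measured part of such an estimate.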
IV. COMBINING DEFECT PREDICTION AND
RISK-BASED TESTING
Defect prediction provides valuable information about the
likely defectiveness of the individual parts of a software sys-
tem. The predicted values are, for example, the number of de-
fects in a module or its classification as defective/defect-free.
In risk-based testing, this information can be used as basis for
estimating the risk probability associated with the system parts.
It expresses the likelihood that the software system will fail,
which is related to defectiveness (i.e., a defect occurs), even
though the nature of this relationship can be complex [12].
Risk-based testing provides a framework in which the predic-
tion results can be integrated in a form that makes them appli-
cable for human testers. Furthermore, testing activities incorpo-
rate several influencing factors in prioritization and decision
making. The likelihood of defectiveness is one of these factors.
Although it is an important factor, it has to be combined with
other factors such as the impact of the defect for customers and
end users. In risk-based testing this combination is supported
by risk visualizations, incorporating the knowledge of the hu-
man testers, and interactive assignment of risk items to the risk
levels in a risk matrix.
To realize the benefits of defect prediction for risk-based
testing, an interface between the two processes has to be estab-
lished and the provided prediction information needs to be
brought into alignment with the information needs of the testing
process. The following requirements for making defect predic-
tions have been derived from our experience with the risk-
based testing process described in [2][3][14] as well as practi-
cal experiences with defect prediction in industry projects [15].
Prediction results must be associated with risk items. A
central task in any test process is eliciting the “big picture” of
the system under test and systematically identifying the indi-
vidual system parts and aspects that need to be tested. The re-
sult of this step in risk-based testing is the list of risk items.
The risk items are the entities the testers are familiar with (e.g.,
features, sub-systems, configurations). They are at the right
level of granularity to be handled in testing and to be associated
with the different risk factors. Thus, defect prediction has
to provide information that can be related to the risk items used
in a specific project, e.g., directly or by aggregation in case of a
more complex, hierarchical relationship.
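A minimal sketch of such an aggregation is given below; the mapping of files to risk items and the predicted values are hypothetical.

```python
# Sketch of relating prediction results to risk items: file-level predictions
# are aggregated to feature-level risk items via a mapping of which files
# implement which feature.
file_predictions = {           # predicted fault counts per file
    "cart.c": 4.2, "checkout.c": 2.1, "catalog.c": 0.4, "search.c": 1.3,
}
risk_item_files = {            # risk items (features) -> implementing files
    "Ordering": ["cart.c", "checkout.c"],
    "Browsing": ["catalog.c", "search.c"],
}

for item, files in risk_item_files.items():
    aggregated = sum(file_predictions.get(f, 0.0) for f in files)
    print(item, round(aggregated, 1))
```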
Predictions must be made for different types of risk.
Testing needs to cover various functional aspects (e.g., correct-
ness, completeness, appropriateness) as well as non-functional
aspects (e.g., performance, security, usability) depending on
the system under test and its application domain. These differ-
ent aspects of software quality form a separate dimension that
has to be considered throughout all steps of the testing process
starting from defect reporting to selecting testing methods and
techniques. Adjusting the testing activities to these quality as-
pects is an important step in risk-based testing. Consequently,
defect prediction has to distinguish different risk types when
predicting the probability of defectiveness of an item (e.g.,
[18]). Prediction results should provide information to deter-
mine the overall risk associated with a quality aspect (e.g., to
decide how much load testing will be necessary in the upcom-
ing release) and how this type of risk relates to the different
parts of the system under test (e.g., to decide where to test).
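The following sketch illustrates what risk-type-specific prediction results could look like and how they would answer both questions; the risk types, modules, and values are hypothetical.

```python
# Sketch of risk-type-specific predictions: defectiveness is predicted per
# (module, risk type), so both the overall risk per quality aspect and its
# distribution over the system parts can be derived.
predictions = {                # (module, risk type) -> predicted defect count
    ("login", "security"): 3.0, ("login", "performance"): 0.5,
    ("report", "security"): 0.2, ("report", "performance"): 2.4,
}

# How much risk is associated with each quality aspect overall?
overall = {}
for (module, risk_type), value in predictions.items():
    overall[risk_type] = overall.get(risk_type, 0.0) + value
print(overall)

# Where to test for a given risk type, ranked by predicted defectiveness?
security = [(m, v) for (m, t), v in predictions.items() if t == "security"]
print(sorted(security, key=lambda mv: mv[1], reverse=True))
```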
Predictions must be about the future. In order to be of
practical use, predictions have to provide information about the
defectiveness before this information becomes available via
other sources like inspection or testing. Thus, for example,
predictions have to be made at the end of the development
phase to predict the expected number and location of defects to
support test management decisions or – due to the advent of
continuous delivery – predictions have to be continuously up-
dated and incorporated in making release decisions.
Prediction results must be compatible with risk levels.
The risk items under test are assigned to risk levels (Fig. 1),
which partition the spectrum of risk values and cluster risk
items with similar impact and probability estimates for testing.
Risk items at a particular risk level are considered equally risky
and are subject to the same intensity of testing. Defect predic-
tion applies regression models predicting the number of de-
fects, classifiers that categorize risk items into bins, etc. The type
and granularity of the prediction results have to be compatible
with the definition of the risk levels as the values expressing
the likely defectiveness of the risk items have to be mapped to
the levels in the risk matrix. Defining and adjusting risk levels
to appropriately cluster risk items is often a manual step that
should be supported by fine-grained yet aggregable probability
information coming from predictions.
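A minimal sketch of such a mapping is given below; the thresholds are hypothetical and would have to be calibrated per project when the risk levels are defined.

```python
# Sketch of making fine-grained prediction results compatible with risk levels:
# predicted defect counts are binned into the probability classes of the risk
# matrix. Thresholds are hypothetical and project-specific.
def probability_class(predicted_defects, low_threshold=1.0, high_threshold=5.0):
    if predicted_defects < low_threshold:
        return "low"
    if predicted_defects < high_threshold:
        return "medium"
    return "high"

for component, predicted in {"comp_1": 0.4, "comp_2": 0.9, "comp_3": 7.5}.items():
    print(component, probability_class(predicted))
```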
Prediction results must be useful for humans. In the end,
there are human decision makers who need to understand and
interpret the (automatically produced) predictions to (manual-
ly) make timely and sound decisions in the context of software
testing. The decision makers are usually experts in their field
such as test managers or quality engineers with detailed
knowledge about the system under test and profound experi-
ence acquired over many years. Defect prediction is not in-
tended to replace their expert knowledge. The prediction re-
sults should be a useful complement by providing easily acces-
sible, up-to-date, quantitative evidence and new insights at an
acceptable cost-benefit ratio.
V. CONCLUSIONS AND FUTURE WORK
Making accurate predictions about the defectiveness of the
parts in a future version of a system is a complex endeavor.
Making the results applicable for human testers requires addi-
tional steps and their integration in a test process amenable to
incorporating defectiveness information. Risk-based testing
provides such a process and can serve as a framework for lev-
eraging prediction results into successful test results.
In this paper we showed how defect prediction and risk-
based testing fit together, and we discussed requirements for
the interface of the two processes. These requirements have
been derived from our experience with both defect prediction
and risk-based testing, gained over several years and from ap-
plications in industry projects. These requirements represent
the starting point for future work. We plan to investigate the
different requirements in a literature study on defect prediction
issues and by studying real-world applications where defect
prediction provides input for risk-based testing.
REFERENCES
[1] M. Felderer, C. Haisjackl, R. Breu, and J. Motz, “Integrating manual and
automatic risk assessment for risk-based testing,” Software Quality Days
(SWQD), LNBIP 94, Springer, 2012.
[2] M. Felderer, and R. Ramler, “A multiple case study on risk-based testing
in industry,” International Journal on Software Tools for Technology
Transfer (STTT), 16(5), October 2014, pp. 609-625.
[3] M. Felderer, and R. Ramler, “Risk orientation in software testing
processes of small and medium enterprises: an exploratory and
comparative study,” Software Quality Journal, DOI:10.1007/s11219-
015-9289-z, 2015.
[4] M. Felderer, and I. Schieferdecker, “A taxonomy of risk-based testing,”
International Journal on Software Tools for Technology Transfer (STTT),
16(5), pp. 559-568, 2014.
[5] T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, “A
systematic literature review on fault prediction performance in software
engineering,” IEEE Transactions on Software Engineering, 38(6), pp.
1276-1304, 2012.
[6] ISTQB, “Standard glossary of terms used in software testing. Version
2.1,” International Software Testing Qualifications Board, 2010.
[7] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, “Benchmarking
Classification Models for Software Defect Prediction: A Proposed
Framework and Novel Findings,” IEEE Transactions on Software
Engineering, 34(4), pp. 485-496, July-Aug. 2008.
[8] P. L. Li, J. Herbsleb, M. Shaw, and B. Robinson, “Experiences and
results from initiating field defect prediction and product test
prioritization efforts at ABB Inc.,” 28th Int. Conf. on Software
Engineering (ICSE), 2006.
[9] L. Madeyski, and M. Jureczko, “Which process metrics can significantly
improve defect prediction models? An empirical study,” Software
Quality Journal 23(3), pp. 393-422, 2015.
[10] T. Menzies, Z. Milton, B. Turhan, B. Cukic, Y. Jiang, and A. Bener,
“Defect prediction from static code features: current results,
limitations,” Automated Software Eng., 17(4), pp. 375-407, 2010.
[11] T. J. Ostrand, E. J. Weyuker, and R. M. Bell, “Where the bugs are,”
ACM SIGSOFT Int. Symposium on Software Testing and Analysis
(ISSTA), 2004.
[12] R. Ramler, “The impact of product development on the lifecycle of
defects,” Workshop on Defects in Large Software Systems, (co-located
with ISSTA), 2008.
[13] R. Ramler, and M. Felderer, “Experiences from an initial study on risk
probability estimation based on expert opinion,” Joint Conf. of 23rd Int.
Workshop on Software Measurement and 8th Int. Conf. on Software
Process and Product Measurement (IWSM-MENSURA), 2013.
[14] R. Ramler, and M. Felderer, “A process for risk-based test strategy
development and its industrial evaluation,” Product-Focused Software
Process Improvement (PROFES), LNCS 9459, Springer, 2015.
[15] R. Ramler, K. Wolfmaier, E. Stauder, F. Kossak, and T. Natschläger,
“Key questions in building defect prediction models in practice,”
Product-Focused Software Process Improvement (PROFES). LNBIP 32,
Springer, 2009.
[16] F. Redmill, “Theory and practice of risk-based testing: Research
Articles,” Software Testing, Verification and Reliability, 15(1), pp. 3-
20, Wiley, 2005.
[17] T. Taipale, M. Qvist, and B. Turhan. “Constructing defect predictors and
communicating the outcomes to practitioners,” ACM/IEEE Int.
Symposium on Empirical Software Engineering and Measurement
(ESEM), 2013.
[18] T. Zimmermann, N. Nagappan, and L. Williams, “Searching for a needle in a
haystack: Predicting security vulnerabilities for Windows Vista,” 3rd
Int. Conf. on Software Testing, Verification and Validation (ICST), 2010.