Fault Detection Eectiveness of Source Test Case Generation
Strategies for Metamorphic Testing
Prashanta Saha
School of Computing, Montana State University
Bozeman, Montana
p66n633@msu.montana.edu
Upulee Kanewala
School of Computing, Montana State University
Bozeman, Montana
upulee.kanewala@montana.edu
ABSTRACT
Metamorphic testing is a well known approach to tackle the oracle problem in software testing. This technique requires the use of source test cases that serve as seeds for the generation of follow-up test cases. Systematic design of test cases is crucial for test quality; thus, the source test case generation strategy can make a big impact on the fault detection effectiveness of metamorphic testing. Most of the previous studies on metamorphic testing have used either random test data or existing test cases as source test cases. There has been limited research done on systematic source test case generation for metamorphic testing. This paper provides a comprehensive evaluation of the impact of source test case generation techniques on the fault finding effectiveness of metamorphic testing. We evaluated the effectiveness of line coverage, branch coverage, weak mutation and random test generation strategies for source test case generation. The experiments are conducted with 77 methods from 4 open source code repositories. Our results show that by systematically creating source test cases, we can significantly increase the fault finding effectiveness of metamorphic testing. Further, in this paper we introduce a simple metamorphic testing tool called "METtester" that we use to conduct metamorphic testing on these methods.
KEYWORDS
Metamorphic testing, Random testing, Source test case generation,
Weak mutation, Branch coverage, Line coverage
ACM Reference Format:
Prashanta Saha and Upulee Kanewala. 2018. Fault Detection Effectiveness of Source Test Case Generation Strategies for Metamorphic Testing. In MET'18: IEEE/ACM International Workshop on Metamorphic Testing, May 27, 2018, Gothenburg, Sweden. ACM, Gothenburg, Sweden, 8 pages. https://doi.org/10.1145/3193977.3193982
1 INTRODUCTION
A test oracle [21] is a mechanism to detect the correctness of the outcomes of a program. The oracle problem [3] can occur when
there is no oracle present for the program or it is practically infeasible to develop an oracle to verify the correctness of the computed outputs. This test oracle problem is quite frequent, especially with scientific software, and is one of the most challenging problems in software testing. The metamorphic testing (MT) technique was proposed to alleviate this oracle problem [7]. MT uses properties of the program under test to define metamorphic relations (MRs). An MR specifies how the outputs should change according to a specific change made to the source input. Thus, from existing test cases (named source test cases), MRs are used to generate new test cases (named follow-up test cases). Then the set of source and follow-up test cases is executed on the program under test, and the outputs are checked according to the corresponding MRs. The program under test can be considered faulty if an MR is violated.

The effectiveness of MT in detecting faults depends on the quality of the MRs. Additionally, the effectiveness of MT also relies on the source test cases, and it can be improved by systematically generating them. Such a systematic approach can reduce the size of the test suite and could be more cost effective. Most of the previous studies in MT have used randomly generated test cases as source test data. In this study we investigated the effectiveness of line coverage, branch coverage, weak mutation, and random testing for creating source test cases for MT.

Our experimental results show that test cases satisfying weak mutation coverage provide the best fault finding effectiveness. We also found that combining one or more systematic source test case generation technique(s) may increase the fault detection ability of MT.
2 BACKGROUND
MT is a property-based testing approach which aims to alleviate the oracle problem, but its effectiveness depends not only on the quality of the MRs but also on the source test cases. In this section we briefly discuss MT and the source test generation techniques used in this study: line coverage, branch coverage, and weak mutation.
2.1 Metamorphic Testing
Source test cases are used in MT [7] to generate follow-up test cases using a set of MRs identified for the program under test (PUT). MRs [9] are identified based on properties of the problem domain, such as attributes of the algorithm used. We can create source test cases using techniques like random testing, structural testing or search-based testing. Follow-up test cases are generated by applying the input transformation specified by the MRs. After executing the source and follow-up test cases on the PUT, we can check whether the change in the output matches the MR; if not, the MR is
considered violated. Violation of an MR during testing indicates a fault in the PUT. Since MT checks the relationship between inputs and outputs of a test program, we can use this technique when the expected result of a test program is not known.
For example, in gure 1, a Java method add_values is used to
show how source and follow-up test cases work with a PUT. The
add_values method sum up all the array element passed as argument.
Source test case,
t={
3
,
43
,
1
,
54
}
is randomly generated and tested
on add_values. The output for this test case is 101. For this program,
when a constant
c
is added to the input, the output should increase.
This will be used as a MR to conduct MT on this PUT. A constant
value 2 is added to this array to create a follow-up test case
t
=
{
5
,
45
,
3
,
56
}
and then run on the PUT. The output for this follow-up
test case is 109. To satisfy this Addition MR the follow-up test output
should be greater than the source output. In this MT example, the
considered MR is satised for this given source and follow-up test
cases.
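A minimal Java sketch of this example follows: add_values is the method described above, while the surrounding harness is our own illustration (not METtester code) that generates the follow-up input and checks the Addition MR.

```java
import java.util.Arrays;

public class AdditionMRExample {

    // Program under test: sums all elements of the input array.
    static int add_values(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] source = {3, 43, 1, 54};          // source test case t; output 101
        int c = 2;                              // positive constant for the Addition MR
        int[] followUp = Arrays.stream(source)  // follow-up test case t' = {5, 45, 3, 56}; output 109
                               .map(x -> x + c)
                               .toArray();

        int sourceOutput = add_values(source);
        int followUpOutput = add_values(followUp);

        // Addition MR: the follow-up output should be greater than the source output.
        System.out.println(followUpOutput > sourceOutput ? "MR satisfied" : "MR violated");
    }
}
```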
2.2 Source Test Case Generation
To generate source test cases we used the EvoSuite tool [11]. EvoSuite is a test generation tool that automatically produces test cases targeting high code coverage. It uses an evolutionary search approach that evolves whole test suites with respect to an entire coverage criterion at once. In this paper we generated source test cases based on line coverage, branch coverage, weak mutation and random testing. Below we briefly describe the systematic approaches used by EvoSuite to generate them.
2.2.1 Line Coverage. In line coverage [18], to cover each line of source code, we need to make sure that each basic code block in a method is reached. In traditional search-based testing, this reachability would be expressed by a combination of branch distance [16] and approach level. The approach level measures how distant an individual execution and the target statement are in terms of control dependencies. The branch distance estimates how far a predicate (a decision-making point) is from evaluating to a desired target result. For example, given a predicate x == 6, an execution with x = 4 has a branch distance of |4 - 6| = 2 to the predicate evaluating to true, whereas an execution with x = 5 is closer to being true, with a branch distance of |5 - 6| = 1. Branch distance can be measured by applying a set of standard rules [14, 16].
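A minimal sketch of this computation for an equality predicate follows; it only illustrates the rule used in the example above, and real tools apply analogous rules for each kind of relational predicate [14, 16].

```java
public class BranchDistanceExample {

    // Branch distance for an equality predicate (x == k):
    // 0 when the predicate is true, |x - k| otherwise.
    static int branchDistanceEquals(int x, int k) {
        return Math.abs(x - k);
    }

    public static void main(String[] args) {
        System.out.println(branchDistanceEquals(4, 6)); // 2: x = 4 is far from making x == 6 true
        System.out.println(branchDistanceEquals(5, 6)); // 1: x = 5 is closer to making it true
    }
}
```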
If the optimization target is a test suite that executes all statements, rather than individual test cases, then the approach level is not needed, as all statements will be executed by the same test suite. Hence, we only need to inspect the branch distances of all the branches that the statements in the class are control dependent on. For each conditional statement in the code, there is a control dependency for the statements it guards: the branch of the conditional leading to the dependent code must be executed. Executing all the tests in a test suite thus allows the line coverage fitness value to be calculated, which requires computing the minimum branch distance $d_{min}(b, \mathit{Suite})$ among all observed executions for every branch $b$ in the collection of control-dependent branches $B_{CD}$. Thus, the line coverage fitness function is defined as [18]:

$$f_{LC}(\mathit{Suite}) = \nu(|\mathit{NCLs}| - |\mathit{CoveredLines}|) + \sum_{b \in B_{CD}} \nu(d_{min}(b, \mathit{Suite}))$$

where $\mathit{NCLs}$ is the set of all statements in the class under test (CUT), $\mathit{CoveredLines}$ is the set of statements covered by the test cases in the test suite, and $\nu(x)$ is a normalizing function into $[0, 1]$ (e.g. $\nu(x) = \frac{x}{x+1}$) [2].
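The following Java sketch implements this fitness function over hypothetical coverage data; the parameter names stand in for the instrumentation records a tool like EvoSuite collects internally, and are our own naming.

```java
import java.util.List;

public class LineCoverageFitness {

    // Normalizing function v(x) = x / (x + 1), mapping [0, inf) into [0, 1) [2].
    static double v(double x) {
        return x / (x + 1.0);
    }

    // f_LC(Suite) = v(|NCLs| - |CoveredLines|) + sum_{b in B_CD} v(d_min(b, Suite)).
    // Lower is better; 0 means the suite covers every statement.
    static double lineCoverageFitness(int totalStatements, int coveredStatements,
                                      List<Double> minBranchDistances) {
        double fitness = v(totalStatements - coveredStatements);
        for (double d : minBranchDistances) { // one d_min(b, Suite) per control-dependent branch
            fitness += v(d);
        }
        return fitness;
    }
}
```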
2.2.2 Branch Coverage. The idea of covering branches is well accepted in practice and implemented in popular tools, even though the practical interpretation of branch coverage may not always match the more theoretical one of covering all edges of a program's control flow graph. Branch coverage is often defined as maximizing the number of branches of conditional statements that are executed by a test suite. Thus, a unit test suite satisfies branch coverage if and only if, for each branch predicate, at least one of its test cases makes the predicate evaluate to true and at least one makes it evaluate to false.

The fitness value for branch coverage is based on how close a test suite is to covering all branches of the CUT. The fitness value of a test suite is calculated by executing all of its test cases while keeping track of the branch distance $d(b, \mathit{Suite})$ for each branch in the CUT. Then [18]:

$$f_{BC}(\mathit{Suite}) = \sum_{b \in B} \nu(d(b, \mathit{Suite}))$$

To optimize branch coverage the following distance is calculated, where $d_{min}(b, \mathit{Suite})$ is the minimal branch distance of branch $b$ over all executions of the test suite [18]:

$$d(b, \mathit{Suite}) = \begin{cases} 0 & \text{if the branch has been covered,} \\ \nu(d_{min}(b, \mathit{Suite})) & \text{if the predicate has been executed at least twice,} \\ 1 & \text{otherwise.} \end{cases}$$

Both the true and the false evaluation of a predicate need to be covered, so a predicate must be executed at least twice by a test suite; if the predicate were executed only once, the search could in theory oscillate between the true and false outcomes.
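A sketch of this piecewise distance and the resulting fitness function follows, again over a hypothetical per-branch execution record of our own design:

```java
import java.util.List;

public class BranchCoverageFitness {

    // Normalizing function v(x) = x / (x + 1) [2].
    static double v(double x) {
        return x / (x + 1.0);
    }

    // Minimal record of what the suite observed for one branch b; a stand-in
    // for a tool's internal instrumentation data.
    static class BranchRecord {
        boolean covered;    // was this branch taken by some test case?
        int timesExecuted;  // how often was its predicate evaluated?
        double minDistance; // d_min(b, Suite) over all executions
    }

    // The piecewise distance d(b, Suite) defined above.
    static double d(BranchRecord b) {
        if (b.covered) {
            return 0.0;              // the branch has been covered
        } else if (b.timesExecuted >= 2) {
            return v(b.minDistance); // predicate executed at least twice
        } else {
            return 1.0;              // otherwise: maximal penalty
        }
    }

    // f_BC(Suite) = sum over all branches b of v(d(b, Suite)).
    static double branchCoverageFitness(List<BranchRecord> branches) {
        double fitness = 0.0;
        for (BranchRecord b : branches) {
            fitness += v(d(b));
        }
        return fitness;
    }
}
```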
2.2.3 Weak Mutation. Test case generation tools tend to generate values that merely satisfy the constraints or conditions, rather than the values developers prefer, such as boundary cases. In weak mutation, a small code modification (a mutant) is applied to the CUT, and the test generation tool is forced to generate values that can distinguish between the original and the mutant. A mutant is considered "killed" under weak mutation if the execution of a test case on the mutant leads to a different state than its execution on the CUT. A test suite satisfies the weak mutation criterion if and only if at least one test case kills each mutant of the CUT.

An infection distance, measured with respect to a set of mutation operators, guides the calculation of the fitness value for the weak mutation criterion. Assuming a minimal infection distance function $d_{min}(\mu, \mathit{Suite})$ exists, we define [18]:

$$d_w(\mu, \mathit{Suite}) = \begin{cases} 1 & \text{if mutant } \mu \text{ was not reached,} \\ \nu(d_{min}(\mu, \mathit{Suite})) & \text{if mutant } \mu \text{ was reached.} \end{cases}$$

This results in the following fitness function for weak mutation [18]:

$$f_{WM}(\mathit{Suite}) = \sum_{\mu \in M_C} d_w(\mu, \mathit{Suite})$$

where $M_C$ is the set of all mutants generated for the CUT.
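An illustrative sketch of weak mutation on the running add_values example: the mutant replaces += with -= (a typical arithmetic operator mutation), and a test case weakly kills it when the two versions reach different states, observed here as different return values.

```java
public class WeakMutationExample {

    // Original method: sums the elements of the array.
    static int addValues(int[] values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Mutant: the arithmetic operator is changed from += to -=.
    static int addValuesMutant(int[] values) {
        int sum = 0;
        for (int v : values) sum -= v;
        return sum;
    }

    public static void main(String[] args) {
        int[] test = {3, 43, 1, 54};
        // The mutant is weakly killed if the test case drives the mutated
        // code into a state different from the original's.
        boolean killed = addValues(test) != addValuesMutant(test);
        System.out.println(killed ? "mutant weakly killed" : "mutant alive");
    }
}
```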
3 EVALUATION METHOD
We conducted a set of experiments to answer the following research
questions:
RQ1: Which source test case generation technique(s) is/are most effective for MT in terms of fault detection?

RQ2: Can the best performing source test case generation technique be combined with other techniques to increase the fault finding effectiveness of MT?

RQ3: Does the fault detection effectiveness of an individual MR change with the source test generation method?

RQ4: How does the source test suite size differ for each source test generation technique?
3.1 Code Corpus
We built a code corpus containing 77 functions that take numerical inputs and produce numerical outputs. We obtained these functions from the following open source projects:

The Colt Project (http://acs.lbl.gov/software/colt/): A set of open source libraries written for high-performance scientific and technical computing in Java.

Apache Mahout (https://mahout.apache.org/): A machine learning library written in Java.

Apache Commons Mathematics Library (http://commons.apache.org/proper/commons-math/): A library of lightweight and self-contained mathematics and statistics components written in Java.

We list these functions in Table 2. Functions in the code corpus perform various calculations using sets of numbers, such as calculating statistics (e.g. average, standard deviation and kurtosis), calculating distances (e.g. Manhattan and Tanimoto) and searching/sorting. The lines of code of these functions varied between 4 and 52, and the number of input parameters for each function varied between 1 and 4.
3.2 METtester
METtester [17] is a simple tool that we are developing to automate the MT process on a given Java program. The tool allows users to specify MRs and source test cases through a simple XML file. METtester transforms the source test cases according to the specified MRs and conducts MT on the given program. Figure 2 shows the high-level architecture of the tool.

Figure 2: METtester Architecture.

Below we describe the important components of the tool:

XML input file: The user provides information (Figure 3) regarding the method names to test, the source test inputs, the MRs, and the number of test cases to run.
Fault Detection Eectiveness of Source Test Case Generation Strategies for Metamorphic Testing MET’18, May 27, 2018, Gothenburg, Sweden
4
MET’18, May 27, 2018, Gothenburg, Sweden
Figure 3: An example of the XML input given to METtester.
XML le parsing:
Xmlparser class in our tool will parse
information from the .xml le and process those. Then that
information will be sent to the Follow-up test case generation
module.
Follow-up test Case Generation:
In this module follow-
up test cases are generated based on the provided MRs and
the source test cases.
Execute Source & Follow-up test cases on the PUT:
Af-
ter generation of the follow-up test cases METtester will run
both the source and follow-up test cases individually into
the system programs and return outputs from the programs.
Compare Source & Follow-up test results:
After getting
the test results from the test program METtester will com-
pare those results with the MR operators mentioned in the
xml le. If it satises the MR property then the class will ag
the test case as "Pass". If it fails to satisfy the MR property
class will ag it as "Fail" which means there is fault in the
program.
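The following Java sketch illustrates this comparison step; the enum of MR output relations and the method signature are our own illustration of the logic described above, not METtester's actual API.

```java
public class MRComparator {

    // Expected output relations, mirroring the MRs in Section 3.3.
    enum MROperator { GREATER_OR_EQUAL, LESS_OR_EQUAL, EQUAL }

    // Flags a test case as "Pass" if the MR holds between the source and
    // follow-up outputs, and as "Fail" otherwise (indicating a fault).
    static String compare(double sourceOutput, double followUpOutput, MROperator op) {
        boolean satisfied;
        switch (op) {
            case GREATER_OR_EQUAL:
                satisfied = followUpOutput >= sourceOutput;
                break;
            case LESS_OR_EQUAL:
                satisfied = followUpOutput <= sourceOutput;
                break;
            default: // EQUAL: compare with a small tolerance for floating point outputs
                satisfied = Math.abs(followUpOutput - sourceOutput) < 1e-9;
                break;
        }
        return satisfied ? "Pass" : "Fail";
    }
}
```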
3.3 Experimental Setup
For the 77 methods described in Section 3.1, we generated a total of 7446 mutated versions using the μJava mutation tool [15]. We used the following six metamorphic relations, which were used in previous studies to test these functions [13]. Suppose our source test case is $X = \{x_1, x_2, x_3, ..., x_n\}$ where $x_i \geq 0$, $0 \leq i \leq n$. Let the source and follow-up outputs be $O(X)$ and $O(Y)$ respectively (a code sketch of these input transformations follows the list):
MR - Addition: add a positive constant $C$ to the source test case; the follow-up test case will be $Y = \{x_1 + C, x_2 + C, x_3 + C, ..., x_n + C\}$. Then $O(Y) \geq O(X)$.

MR - Multiplication: multiply the source test case by a positive constant $C$; the follow-up test case will be $Y = \{x_1 C, x_2 C, x_3 C, ..., x_n C\}$. Then $O(Y) \geq O(X)$.

MR - Shuffle: randomly permute the elements in the source test case. The follow-up test case can be $Y = \{x_3, x_1, x_n, ..., x_2\}$. Then $O(Y) = O(X)$.

MR - Inclusive: include a new element $x_{n+1} \geq 0$ in the source test case; the follow-up test case will be $Y = \{x_1, x_2, x_3, ..., x_n, x_{n+1}\}$. Then $O(Y) \geq O(X)$.

MR - Exclusive: exclude an existing element from the source test case; the follow-up test case will be $Y = \{x_1, x_2, x_3, ..., x_{n-1}\}$. Then $O(Y) \leq O(X)$.

MR - Invertive: take the inverse of each element of the source test case. The follow-up test case will be $Y = \{1/x_1, 1/x_2, 1/x_3, ..., 1/x_n\}$. Then $O(Y) \leq O(X)$.
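Below is a minimal Java sketch of these six input transformations; the helper names are our own, while the relations are the MRs listed above. Generating a follow-up input is a pure transformation of the source array.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class FollowUpGeneration {

    // Addition MR: Y = {x1 + C, ..., xn + C}
    static double[] addition(double[] x, double c) {
        return Arrays.stream(x).map(v -> v + c).toArray();
    }

    // Multiplication MR: Y = {x1 * C, ..., xn * C}
    static double[] multiplication(double[] x, double c) {
        return Arrays.stream(x).map(v -> v * c).toArray();
    }

    // Shuffle MR: Y is a random permutation of X
    static double[] shuffle(double[] x) {
        List<Double> boxed = Arrays.stream(x).boxed().collect(Collectors.toList());
        Collections.shuffle(boxed);
        return boxed.stream().mapToDouble(Double::doubleValue).toArray();
    }

    // Inclusive MR: Y = {x1, ..., xn, x_{n+1}}
    static double[] inclusive(double[] x, double newElement) {
        double[] y = Arrays.copyOf(x, x.length + 1);
        y[y.length - 1] = newElement;
        return y;
    }

    // Exclusive MR: Y = {x1, ..., x_{n-1}} (drop the last element)
    static double[] exclusive(double[] x) {
        return Arrays.copyOf(x, x.length - 1);
    }

    // Invertive MR: Y = {1/x1, ..., 1/xn} (assumes non-zero elements)
    static double[] invertive(double[] x) {
        return Arrays.stream(x).map(v -> 1.0 / v).toArray();
    }
}
```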
For each of the methods, we used EvoSuite [11], described in Section 2.2, to generate test cases targeting line coverage, branch coverage and weak mutation. We used the generated test cases as the source test cases to conduct MT on the methods with METtester, using the MRs described above. Further, we randomly generated 10 test cases for each method to use as source test cases, which served as the baseline.
4 RESULTS AND DISCUSSION
4.1 Eectiveness of the Source Test Case
Generation Techniques
Figure 4 shows the overall mutant killing rates for the four source
test generation techniques. Among all test case generation tech-
niques, weak mutation performed best by killing 68.7% mutants.
Random tests killed 41.5% of the mutants. Table 1 lists the number
of methods that reported the highest mutant kill rates for each type
of test generation technique. For some methods, several source test
generation techniques gave the same best performance.Therefore,
Figure 5 shows a Venn diagram of all the possible logical relations
between the best performing source test generation techniques for
the set of methods. Weak mutation based test generation technique
reported the highest kill rate in 41 (53%) methods, whereas ran-
dom testing reported the highest kill rate only in 13 (17%) methods.
Therefore these results suggest that weak mutation based source
test case generation is more eective in detecting faults with MT.
Figure 4: Total % of mutants killed by each source test suite generation technique.
Table 1: Total number of methods having the highest mutant kill rate for each source test generation technique.

Total Methods | Weak mutation | Line | Branch | Random
77 | 41 | 26 | 29 | 13
RQ1: Weak mutation based test suites have the highest fault detection rate for the majority of the methods.
4.2 Fault Finding Effectiveness of Combined Source Test Cases

To observe whether combining source test case generation techniques achieves a higher fault detection rate, we combined the best performing source test generation technique, weak mutation, with each of the other source test generation techniques.
Table 2: All methods with mutant kill rates and test suite sizes for each source test case generation technique. Each cell shows the kill rate (%) / the number of test cases.

Method name | Branch | Weak mutation | Line | Random
add_values (Add elements in an array) | 63.63 / 1 | 63.63 / 1 | 54.54 / 1 | 30 / 10
array_calc1 | 33.33 / 1 | 33.33 / 1 | 46.15 / 1 | 52.10 / 10
array_copy (Deep copy an array) | 56.00 / 1 | 64.00 / 1 | 64.00 / 1 | 0.00 / 10
average (Average of an array) | 38.10 / 1 | 73.80 / 1 | 42.86 / 1 | 28.20 / 10
bubble (Implements bubble sort) | 51.40 / 1 | 44.95 / 3 | 36.69 / 1 | 16.90 / 10
cnt_zeroes (Count zeros in an array) | 41.00 / 1 | 51.30 / 2 | 38.46 / 1 | 0.00 / 10
count_k (Occurrences of k in an array) | 31.80 / 1 | 36.36 / 2 | 34.09 / 1 | 50.00 / 10
count_non_zeroes (Count non-zero elements in an array) | 41.00 / 1 | 48.71 / 2 | 51.28 / 1 | 22.20 / 10
dot_product | 63.00 / 1 | 60.87 / 1 | 56.52 / 1 | 22.20 / 10
elementwise_max (Elementwise maximum) | 46.30 / 2 | 68.51 / 3 | 83.33 / 2 | 0.00 / 10
elementwise_min (Elementwise minimum) | 44.40 / 1 | 55.56 / 1 | 55.56 / 1 | 0.00 / 10
find_euc_dist (Euclidean distance between two vectors) | 80.10 / 1 | 76.39 / 1 | 79.17 / 1 | 50 / 10
find_magnitude (Magnitude of a vector) | 52.10 / 1 | 75.00 / 1 | 52.10 / 1 | 8.69 / 10
find_max (Find the maximum value) | 70.80 / 1 | 50.00 / 1 | 50.00 / 1 | 70.90 / 10
find_max2 | 64.10 / 1 | 71.84 / 2 | 67.96 / 1 | 98.40 / 10
find_median (Find median value in an array) | 48.70 / 2 | 98.93 / 3 | 41.71 / 2 | 53.10 / 10
find_min (Find minimum value in an array) | 40.40 / 1 | 61.70 / 1 | 57.45 / 1 | 83.80 / 10
geometric_mean (Returns the geometric mean of the entries in the input array) | 51.20 / 1 | 53.66 / 1 | 95.12 / 1 | 65.40 / 10
hamming_dist (Hamming distance between two vectors) | 40.90 / 1 | 84.09 / 3 | 59.09 / 2 | 15.90 / 10
insertion_sort (Implements insertion sort) | 43.60 / 1 | 42.55 / 2 | 37.23 / 1 | 32.65 / 10
manhattan_dist (Manhattan distance between two vectors) | 53.30 / 1 | 61.36 / 2 | 53.30 / 1 | 0.00 / 10
mean_absolute_error (Measure of difference between two continuous variables) | 37.50 / 1 | 41.07 / 2 | 39.29 / 1 | 0.00 / 10
selection_sort (Implements selection sort) | 41.30 / 1 | 41.30 / 2 | 39.40 / 1 | 21.60 / 10
sequential_search (Finding a target value within a list) | 37.20 / 2 | 25.58 / 3 | 30.23 / 2 | 37.50 / 10
set_min_val (Set array elements less than k equal to k) | 51.20 / 2 | 58.14 / 2 | 30.23 / 1 | 100 / 10
shell_sort (Implements shell sort) | 43.70 / 1 | 42.51 / 1 | 43.11 / 1 | 0.00 / 10
variance (Returns the variance from a standard deviation) | 26.10 / 1 | 39.86 / 1 | 30.40 / 1 | 25.70 / 10
weighted_average (A mean calculated by giving values in a data set) | 86.10 / 1 | 56.94 / 1 | 86.10 / 1 | 21.20 / 10
manhattanDistance (The distance between two points in a grid) | 48.89 / 1 | 77.78 / 2 | 22.22 / 1 | 9.10 / 10
chebyshevDistance (Distance metric defined on a vector space) | 39.08 / 2 | 43.68 / 5 | 35.63 / 2 | 2.00 / 10
tanimotoDistance (A proper distance metric) | 30.21 / 2 | 32.97 / 5 | 44.50 / 2 | 5.60 / 10
errorRate | 61.04 / 3 | 58.44 / 2 | 58.44 / 2 | 0.00 / 10
sum | 50.00 / 1 | 77.78 / 1 | 50.00 / 1 | 35.30 / 10
distance1 (Compute the distance between the instance and another vector) | 53.33 / 1 | 80.00 / 1 | 53.33 / 1 | 14.8 / 10
distanceInf (Compute the distance between the instance and another vector) | 46.67 / 1 | 46.67 / 1 | 46.67 / 1 | 14.8 / 10
ebeadd (Creates an array whose contents will be the element-by-element addition of the arguments) | 92.68 / 2 | 100.00 / 3 | 100.00 / 2 | 15.8 / 10
ebedivide (Creates an array whose contents will be the element-by-element division) | 100.00 / 2 | 100.00 / 5 | 100.00 / 2 | 26.8 / 10
ebemultiply (Creates an array whose contents will be the element-by-element multiplication) | 100.00 / 2 | 100.00 / 3 | 92.68 / 2 | 15 / 10
safeNorm (Returns the Cartesian norm) | 14.78 / 1 | 98.63 / 5 | 97.08 / 4 | 0.8 / 10
scale (Create a copy of an array scaled by a value) | 48.72 / 1 | 58.97 / 3 | 53.85 / 1 | 47.8 / 10
entropy | 88.42 / 1 | 88.42 / 2 | 88.42 / 1 | 42.9 / 10
g | 93.55 / 2 | 95.16 / 2 | 93.55 / 1 | 20.9 / 10
calculateAbsoluteDifferences | 60.98 / 1 | 60.98 / 1 | 60.98 / 1 | 0 / 10
evaluateHoners | 46.03 / 1 | 79.37 / 1 | 47.62 / 1 | 80.4 / 10
evaluateInternal | 95.25 / 1 | 93.47 / 2 | 95.55 / 1 | 90.6 / 10
evaluateNewton | 80.00 / 1 | 65.71 / 1 | 64.29 / 1 | 76.8 / 10
meanDifference (Returns the mean of the (signed) differences) | 40.00 / 1 | 80.00 / 1 | 40.00 / 1 | 40 / 10
equals | 22.50 / 3 | 27.50 / 4 | 21.25 / 3 | 100 / 10
chiSquare (Implements Chi-Square test statistics) | 96.41 / 2 | 96.41 / 2 | 96.41 / 2 | 65.6 / 10
partition | 43.26 / 5 | 95.81 / 5 | 28.84 / 3 | 88.1 / 10
evaluateWeightedProduct | 30.61 / 2 | 40.82 / 2 | 42.86 / 2 | 2 / 10
autoCorrelation (Returns the auto-correlation of a data sequence) | 25.20 / 2 | 93.50 / 2 | 43.09 / 1 | 79.40 / 10
covariance (Returns the covariance of two data sequences) | 24.84 / 1 | 23.57 / 1 | 23.57 / 1 | 86.70 / 10
durbinWatson (Durbin-Watson computation) | 0.00 / 0 | 33.77 / 1 | 0.00 / 0 | 14.10 / 10
harmonicMean (Returns the harmonic mean of a data sequence) | 74.00 / 1 | 74.00 / 1 | 76.00 / 1 | 42.50 / 10
kurtosis (Returns the kurtosis (aka excess) of a data sequence) | 93.84 / 1 | 93.84 / 1 | 97.16 / 1 | 34.80 / 10
lag1 (Returns the lag-1 autocorrelation of a dataset) | 99.55 / 1 | 32.70 / 1 | 89.55 / 1 | 33.70 / 10
max (Returns the largest member of a data sequence) | 51.72 / 1 | 56.90 / 1 | 51.72 / 1 | 96.60 / 10
meanDeviation (Returns the mean deviation of a dataset) | 54.39 / 1 | 33.33 / 1 | 28.07 / 1 | 78.30 / 10
min (Returns the smallest member of a data sequence) | 67.41 / 1 | 81.03 / 2 | 70.69 / 1 | 96.60 / 10
polevl | 94.23 / 2 | 88.46 / 1 | 88.46 / 2 | 45.50 / 10
pooledMean (Returns the pooled mean of two data sequences) | 36.43 / 1 | 34.88 / 1 | 34.88 / 1 | 19.30 / 10
pooledVariance (Returns the pooled variance of two data sequences) | 43.08 / 1 | 47.83 / 1 | 47.83 / 1 | 31.10 / 10
power | 53.33 / 1 | 53.33 / 1 | 53.33 / 1 | 15.80 / 10
product (Returns the product) | 50.00 / 1 | 50.00 / 1 | 50.00 / 1 | 94.70 / 10
quantile (Returns the phi-quantile) | 40.13 / 2 | 40.76 / 2 | 32.48 / 2 | 40.00 / 10
sampleKurtosis (Returns the sample kurtosis (aka excess) of a data sequence) | 93.86 / 1 | 93.86 / 1 | 92.98 / 1 | 85.10 / 10
sampleSkew (Returns the sample skew of a data sequence) | 89.47 / 1 | 89.47 / 1 | 97.37 / 1 | 89.50 / 10
sampleVariance (Returns the sample variance of a data sequence) | 75.31 / 1 | 75.31 / 1 | 12.35 / 1 | 71.20 / 10
skew (Returns the skew of a data sequence) | 93.88 / 1 | 93.88 / 1 | 93.88 / 1 | 48.80 / 10
square | 47.37 / 1 | 47.37 / 1 | 57.89 / 1 | 5.30 / 10
standardize (Modifies a data sequence to be standardized) | 89.26 / 1 | 89.26 / 1 | 91.95 / 1 | 77.60 / 10
sumOfLogarithms (Returns the sum of logarithms of a data sequence) | 75.00 / 1 | 68.75 / 1 | 68.75 / 1 | 21.90 / 10
sumOfPowerOfDeviations | 68.75 / 1 | 52.08 / 1 | 75.00 / 1 | 64.90 / 10
weightedMean (Returns the weighted mean of a data sequence) | 77.46 / 1 | 77.46 / 1 | 77.46 / 1 | 65.00 / 10
weightedRMS (Returns the weighted RMS (Root-Mean-Square) of a data sequence) | 86.96 / 1 | 86.96 / 1 | 86.96 / 1 | 43.30 / 10
winsorizedMean (Returns the winsorized mean of a sorted data sequence) | 33.00 / 1 | 37.93 / 1 | 34.48 / 1 | 0.00 / 10
Fault Detection Eectiveness of Source Test Case Generation Strategies for Metamorphic Testing MET’18, May 27, 2018, Gothenburg, Sweden
6
MET’18, May 27, 2018, Gothenburg, Sweden
Figure 5: Venn diagram of all the combinations of source test suites that performed best for each individual method.
Table 3 shows the total percentage of mutants killed by each combined test suite. The combination of weak mutation and random test cases has a higher mutant kill rate (74.91%) than the combinations of weak mutation with line coverage (72.87%) and with branch coverage (74.6%). Combining all three coverage-based strategies slightly increases the total percentage of killed mutants (75.98%), but there are trade-offs to consider, such as the combined test suite size.
Table 3: Total % of mutants killed after combining weak mutation with line coverage, branch coverage, and random testing.

Weak Mutation + Line (%) | Weak Mutation + Branch (%) | Weak Mutation + Line + Branch (%) | Weak Mutation + Random (%)
72.87 | 74.6 | 75.98 | 74.91
RQ2: Combining weak mutation test cases with random test cases leads to detecting more faults.
4.3 Fault Finding Effectiveness of Individual MRs

To see how each source test case generation technique performs with individual MRs, Figure 6 illustrates the percentage of mutants killed by each of the six MRs separately using the weak mutation, line coverage, branch coverage and random test suites. Weak mutation has the highest percentage of killed mutants for all six MRs. Specifically, for the multiplication and invertive MRs, the weak mutation test suite surpasses the others in mutant killing rate, whereas line coverage based test suites were similar to weak mutation in killing mutants with the addition, shuffle, inclusive and exclusive MRs. For the exclusive MR, all the test suites performed almost similarly.

Figure 6: % of mutants killed by all six MRs using the 4 test suite strategies (branch coverage, line coverage, weak mutation and random).

RQ3: Weak mutation killed the highest number of mutants for all the MRs.
4.4 Impact of Source Test Suite Size

Table 4 compares the coverage criteria in terms of the total number of tests generated and the average and median test suite sizes over the individual methods. In addition, in the columns Smaller, Equal, and Larger, we report whether the weak mutation test suites are smaller than, equal to, or larger than those produced by the other source test case generation techniques. The p-value column shows the p-value computed using the paired t-test between weak mutation and line coverage, and between weak mutation and branch coverage. We do not compare the random test suites here, because we intentionally generated 10 random test cases for each method. Weak mutation leads to larger test suites than branch and line coverage: on average, the number of test cases produced for weak mutation is larger than for branch and line coverage, and the total number of test cases is also relatively larger.

Table 4: Average test suite sizes for weak mutation, line coverage, branch coverage and random testing.

Test Suites | Total Number of Test Cases | Average Size | Median Size | Std Dev | Smaller | Equal | Larger | p-value
Weak mutation | 135 | 1.75 | 1 | 1.13 | - | - | - | -
Line | 97 | 1.26 | 1 | 0.67 | 1 | 45 | 31 | 3.102e-07
Branch | 99 | 1.29 | 1 | 0.59 | 2 | 49 | 26 | 1.375e-05
Random | 770 | 10 | 10 | 0 | 77 | 0 | 0 | -

RQ4: Weak mutation generated a higher number of test cases.
5 THREATS TO VALIDITY
Threats to internal validity may result from the way the empirical study was carried out. EvoSuite and our experimental setup have been carefully tested, although testing cannot definitively prove the absence of defects.

Threats to construct validity may occur because of the third-party tools we have used. The EvoSuite tool was used to generate source test cases for the line coverage, branch coverage and weak mutation test generation techniques, and we used the μJava mutation tool to create mutants for our experiment. To minimize these threats we verified that the results produced by these tools are correct by manually inspecting randomly selected outputs produced by each tool.

Threats to external validity were minimized by employing 77 methods as the case study, collected from the classes of 4 different open source projects. This provides confidence in the possibility of generalizing our results to other open source software. We only used the EvoSuite tool to generate test cases for our main experiment, but we also used the jCUTE tool [20] to generate branch coverage based test suites for our initial case study and observed similar results.
6 RELATED WORK
Most contributions on MT use either randomly generated test data or existing test suites for the generation of source test cases; not much research has been done on systematic generation of source test cases for MT. Gotlieb and Botella [12] presented an approach called Automated Metamorphic Testing, in which they translated the code into an equivalent constraint logic program and tried to find test cases that violate the MRs. Chen et al. [8] compared the effectiveness of random testing and "special values" as source test cases for MT. Special values are inputs for which the output of a particular method is well known. Wu et al. [22] showed that random test cases are more effective than test cases derived from "special values". Segura et al. [19] also compared the effectiveness of random testing with manually generated test suites for MT. Their results showed that randomly generated test suites are more effective in detecting faults than manually designed test suites. They also observed that combining random testing with manual tests provides better fault detection ability than random testing alone.

Batra and Sengupta [5] proposed a genetic algorithm to generate test cases maximizing the paths traversed in the program under test for MT. Chen et al. [6] addressed the same problem from a different perspective: they proposed partitioning the input domain of the PUT into multiple equivalence classes for MT, along with an algorithm that generates test cases covering those equivalence classes. They were able to generate test cases that provide a high fault detection rate. Symbolic execution was used to construct MRs and their corresponding source test cases by Dong and Zhang [10]. Program paths were first analyzed to generate symbolic inputs, these symbolic inputs were then used to construct MRs, and in the final step, source test cases were generated by replacing the symbolic inputs with real values.

Barus et al. [4] applied Adaptive Random Testing (ART) instead of random testing (RT) to study the effect of source test case selection on MT. Their results showed that ART outperforms RT in enhancing the effectiveness of MT. Alatawi et al. [1] used an automated test input generation technique called dynamic symbolic execution (DSE) to generate the source test inputs for metamorphic testing. Their results showed that DSE improves the coverage and fault detection rate of metamorphic testing compared to random testing, using significantly smaller test suites. Compared to these works, we evaluate the effectiveness of four commonly used coverage criteria for automated source test case generation.
7 CONCLUSIONS & FUTURE WORK
In this study we empirically evaluated the fault finding effectiveness of four different source test case generation strategies for MT: line coverage, branch coverage, weak mutation and random testing.

Our results show that weak mutation coverage based test generation is a more effective source test case generation technique for MT than the other techniques. Our results also show that the fault finding effectiveness of MT can be improved by combining source tests generated for weak mutation coverage with randomly generated source test cases.

Further, in this paper we introduced a MT tool called "METtester". We plan to incorporate the investigated automated source test generation techniques into this tool. We also plan to extend the current case study to larger code bases and experiment with more source test generation techniques, such as adaptive random test generation and data flow based test generation. Further, we plan to analyze the impact of the coverage of follow-up test cases in our future research.
ACKNOWLEDGMENTS
This work is supported by award number 1656877 from the National Science Foundation. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation.
REFERENCES
[1] E. Alatawi, T. Miller, and H. Søndergaard. 2016. Generating Source Inputs for Metamorphic Testing Using Dynamic Symbolic Execution. In 2016 IEEE/ACM 1st International Workshop on Metamorphic Testing (MET). 19–25. https://doi.org/10.1109/MET.2016.012
[2] A. Arcuri. 2010. It Does Matter How You Normalise the Branch Distance in Search Based Software Testing. In 2010 Third International Conference on Software Testing, Verification and Validation. 205–214. https://doi.org/10.1109/ICST.2010.17
[3] E. T. Barr, M. Harman, P. McMinn, M. Shahbaz, and S. Yoo. 2015. The Oracle Problem in Software Testing: A Survey. IEEE Transactions on Software Engineering 41, 5 (May 2015), 507–525. https://doi.org/10.1109/TSE.2014.2372785
[4] A. C. Barus, T. Y. Chen, F. C. Kuo, H. Liu, and H. W. Schmidt. 2016. The Impact of Source Test Case Selection on the Effectiveness of Metamorphic Testing. In 2016 IEEE/ACM 1st International Workshop on Metamorphic Testing (MET). 5–11. https://doi.org/10.1109/MET.2016.010
[5] Gagandeep Batra and Jyotsna Sengupta. 2011. An Efficient Metamorphic Testing Technique Using Genetic Algorithm. In Information Intelligence, Systems, Technology and Management, Sumeet Dua, Sartaj Sahni, and D. P. Goyal (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 180–188.
[6] Leilei Chen, Lizhi Cai, Jiang Liu, Zhenyu Liu, Shiyan Wei, and Pan Liu. 2012. An optimized method for generating cases of metamorphic testing. In 2012 6th International Conference on New Trends in Information Science, Service Science and Data Mining (ISSDM2012). 439–443.
[7] Tsong Yueh Chen. 2015. Metamorphic Testing: A Simple Method for Alleviating the Test Oracle Problem. In Proceedings of the 10th International Workshop on Automation of Software Test (AST '15). IEEE Press, Piscataway, NJ, USA, 53–54. http://dl.acm.org/citation.cfm?id=2819261.2819278
[8] Tsong Yueh Chen, Fei-Ching Kuo, Ying Liu, and Antony Tang. 2004. Metamorphic Testing and Testing with Special Values. In 4th IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2004), 15-16 September 2004, Chicago, IL, USA. 128–134.
[9] T. Y. Chen, F. C. Kuo, D. Towey, and Z. Q. Zhou. 2012. Metamorphic Testing: Applications and Integration with Other Methods: Tutorial Synopsis. In 2012 12th International Conference on Quality Software. 285–288. https://doi.org/10.1109/QSIC.2012.21
[10] Guowei Dong, Tao Guo, and Puhan Zhang. 2013. Security assurance with program path analysis and metamorphic testing. In 2013 IEEE 4th International Conference on Software Engineering and Service Science. 193–197. https://doi.org/10.1109/ICSESS.2013.6615286
[11] Gordon Fraser and Andrea Arcuri. 2011. EvoSuite: Automatic Test Suite Generation for Object-oriented Software. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE '11). ACM, New York, NY, USA, 416–419. https://doi.org/10.1145/2025113.2025179
[12] A. Gotlieb and B. Botella. 2003. Automated metamorphic testing. In Proceedings 27th Annual International Computer Software and Applications Conference (COMPSAC 2003). 34–40. https://doi.org/10.1109/CMPSAC.2003.1245319
[13] Upulee Kanewala, James M. Bieman, and Asa Ben-Hur. 2016. Predicting metamorphic relations for testing scientific software: a machine learning approach using graph kernels. Software Testing, Verification and Reliability 26, 3 (2016), 245–269. https://doi.org/10.1002/stvr.1594
[14] B. Korel. 1990. Automated Software Test Data Generation. IEEE Trans. Softw. Eng. 16, 8 (Aug. 1990), 870–879. https://doi.org/10.1109/32.57624
[15] Yu-Seung Ma, Jeff Offutt, and Yong Rae Kwon. 2005. MuJava: An Automated Class Mutation System: Research Articles. Softw. Test. Verif. Reliab. 15, 2 (June 2005), 97–133. https://doi.org/10.1002/stvr.v15:2
[16] Phil McMinn. 2004. Search-based Software Test Data Generation: A Survey: Research Articles. Softw. Test. Verif. Reliab. 14, 2 (June 2004), 105–156. https://doi.org/10.1002/stvr.v14:2
[17] Prashanta Saha and Upulee Kanewala. 2018. MSU-STLab/METtester 1.0.0. (Jan. 2018). https://doi.org/10.5281/zenodo.1157183
[18] José Miguel Rojas, José Campos, Mattia Vivanti, Gordon Fraser, and Andrea Arcuri. 2015. Combining Multiple Coverage Criteria in Search-Based Unit Test Generation. Springer International Publishing, Cham, 93–108. https://doi.org/10.1007/978-3-319-22183-0_7
[19] Sergio Segura, Robert M. Hierons, David Benavides, and Antonio Ruiz-Cortés. 2011. Automated metamorphic testing on the analyses of feature models. Information and Software Technology 53, 3 (2011), 245–258. https://doi.org/10.1016/j.infsof.2010.11.002
[20] Koushik Sen and Gul Agha. 2006. CUTE and jCUTE: Concolic Unit Testing and Explicit Path Model-Checking Tools. In CAV, Thomas Ball and Robert B. Jones (Eds.). 419–423.
[21] Elaine J. Weyuker. 1982. On Testing Non-testable Programs. The Computer Journal 25, 4 (Nov. 1982), 465–470.
[22] Peng Wu, Xiao-Chun Shi, Jiang-Jun Tang, and Hui-Min Lin. 2005. Metamorphic Testing and Special Case Testing: A Case Study. Journal of Software 16, 7 (July 2005).
9
... Sometimes, the correctness of the program outcomes is not easy to specifythe test oracle problem. This problem was studied in [15] for scientific software and a metamorphic testing was proposed, it specifies output changes related to input changes. ...
... The heat map for alert types revealed bigger number of some critical types: unavailable sender (187 and 146 for log11 and 12, respectively), SenderCriticalError (533-2045 for logs [13][14][15], AccountJobinactive (555-1405 for logs 11 and 10), transactioBad Message (218-251 for logs 1-3), securityviolation (200-604 for logs 18,11,12), transactionResponseTimeout (100-604 for logs 13). Five logs did not comprise alerts, for 4 logs alerts constituted a fraction of percent, 3 logs with about 2-5%, the remaining ones below 2%. ...
... The heatmaps for registered exceptions (25 types) showed for most logs low values (0-10) per exception, and 10-50 exceptions per log file in total. However, two exceptions FileProcessingFileServer criticalerror and SourceDirectorynotFoundError showed very big values (533-2045 for logs [13][14][15]. They correlated with Source Scheduler (552-2047) and alert type SchedulerErrorCritical (533-2045) for the same logs. ...
Article
Full-text available
Software reliability depends on the performed tests. Bug detection and diagnosis are based on test outcome (oracle) analysis. Most practical test reports do not provide sufficient information for localizing and correcting bugs. We have found the need to extend the space of test result observation in data and time perspectives. This resulted in tracing supplementary test result features in event logs. They are explored with combined text mining and log parsing techniques. Another important point is correlating test life cycle with project development history journaled in issue tracking and software version control repositories. Dealing with the outlined problems, neglected in the literature, we have introduced original analysis schemes. They focus on assessing test coverage, reasons of low diagnosability, and test result profiles. Multidimensional investigation of test features and their management is supported with the developed test infrastructure. This assures a holistic insight into the test efficiency to identify test scheme deficiencies (e.g., functional inadequacy, aging, insufficient coverage) and possible improvements (test set updates). Our studies have been verified in relevance to a real commercial project (industrial case study) and confronted with the experience of testers engaged in other projects.
... Among many solutions to test the oracle problem, metamorphic testing (MT) is the most popular technique that tackles the oracle problem in software testing of IPAs [5]. MT was first proposed by Chen et al. in 1998 [6]. ...
... The foremost step is the generation of source test cases (original test cases). In literature, source test cases are generated either through some traditional test case generation techniques as discussed earlier or through some tool such as EvoSuite (it generates source test cases automatically through coverage criterion) [5]. Nowadays, few researches are emerged in the direction of generation and selection of source test cases that are effective in fault detection [24]. ...
Article
Full-text available
Testing an intricate plexus of advanced software system architecture is quite challenging due to the absence of test oracle. Metamorphic testing is a popular technique to alleviate the test oracle problem. The effectiveness of metamorphic testing is dependent on metamorphic relations (MRs). MRs represent the essential properties of the system under test and are evaluated by their fault detection rates. The existing techniques for the evaluation of MRs are not comprehensive, as very few mutation operators are used to generate very few mutants. In this research, we have proposed six new MRs for dilation and erosion operations. The fault detection rate of six newly proposed MRs is determined using mutation testing. We have used eight applicable mutation operators and determined their effectiveness. By using these applicable operators, we have ensured that all the possible numbers of mutants are generated, which shows that all the faults in the system under test are fully identified. Results of the evaluation of four MRs for edge detection show an improvement in all the respective MRs, especially in MR1 and MR4, with a fault detection rate of 76.54% and 69.13%, respectively, which is 32% and 24% higher than the existing technique. The fault detection rate of MR2 and MR3 is also improved by 1%. Similarly, results of dilation and erosion show that out of 8 MRs, the fault detection rates of four MRs are higher than the existing technique. In the proposed technique, MR1 is improved by 39%, MR4 is improved by 0.5%, MR6 is improved by 17%, and MR8 is improved by 29%. We have also compared the results of our proposed MRs with the existing MRs of dilation and erosion operations. Results show that the proposed MRs complement the existing MRs effectively as the new MRs can find those faults that are not identified by the existing MRs.
... In fact, most of the previous studies in MT have used randomly generated test cases or existing test cases as source test cases when conducting MT [5]- [8], [12]. Our previous work showed that the effectiveness of MT can be improved by systematically generating the source test cases based on some coverage criteria such as line, branch, and weak mutation (WM) [9]. But, it is sub-optimal to use a combination of all the coverage-based techniques to test numerical programs. ...
... For example, there exist some unique characteristics of weather systems that help testers to find the correct MRs. We can create source test cases using techniques like random testing [12], structural testing [13], or search-based testing [9]. Follow-up test cases are generated by applying the input transformation specified by the MRs. ...
Conference Paper
Full-text available
Metamorphic testing is a technique that uses metamorphic relations (i.e., necessary properties of the software under test), to construct new test cases (i.e., follow-up test cases), from existing test cases (i.e., source test cases). Metamorphic testing allows for the verification of testing results without the need of test oracles (a mechanism to detect the correctness of the outcomes of a program), and it has been widely used in many application domains to detect real-world faults. Numerous investigations have been conducted to further improve the effectiveness of metamorphic testing. Recent studies have emerged suggesting a new research direction on the generation and selection of source test cases that are effective in fault detection. Herein, we present two important findings: i) a mutant reduction strategy that is applied to increase the testing efficiency of source test cases, and ii) a test suite minimization technique to help reduce the testing costs without trading off fault-finding effectiveness. To validate our results, an empirical study was conducted to demonstrate the increase in efficiency and fault-finding effectiveness of source test cases. The results from the experiment provide evidence to support our claims.
... Previous studies have used special case [9] and random testing [10] techniques to generate source test cases. Further, previous studies have shown that using coverage-based test inputs as source inputs would improve the fault detection effectiveness of MT compared to random test inputs [11]. As shown in the above To satisfy this MR the follow-up test output should be greater than the source output. ...
... Finally, source test cases were generated by replacing the symbolic inputs with real values. Saha et al. [11] applied a coveragebased testing technique to generate test cases for MT. They compared their results with randomly generated test cases, and it outperforms the effectiveness of randomly generated test suite. ...
... [cs.SE] 30 Dec 2024 amined the characteristic of effective MRs and proposed some qualitative guidelines for selecting effective MRs [19], [20]. Apart from the identification of MRs, a variety of source test input generation approaches have been proposed [21], [22], and studies have investigated their impact on the fault detection effectiveness of MT [23], [24]. ...
Preprint
Metamorphic testing (MT) is a simple yet effective technique to alleviate the oracle problem in software testing. The underlying idea of MT is to test a software system by checking whether metamorphic relations (MRs) hold among multiple test inputs (including source and follow-up inputs) and the actual output of their executions. Since MRs and source inputs are two essential components of MT, considerable efforts have been made to examine the systematic identification of MRs and the effective generation of source inputs, which has greatly enriched the fundamental theory of MT since its invention. However, few studies have investigated the test adequacy assessment issue of MT, which hinders the objective measurement of MT's test quality as well as the effective construction of test suites. Although in the context of traditional software testing, there exist a number of test adequacy criteria that specify testing requirements to constitute an adequate test from various perspectives, they are not in line with MT's focus which is to test the software under testing (SUT) from the perspective of necessary properties. In this paper, we proposed a new set of criteria that specifies testing requirements from the perspective of necessary properties satisfied by the SUT, and designed a test adequacy measurement that evaluates the degree of adequacy based on both MRs and source inputs. The experimental results have shown that the proposed measurement can effectively indicate the fault detection effectiveness of test suites, i.e., test suites with increased test adequacy usually exhibit higher effectiveness in fault detection. Our work made an attempt to assess the test adequacy of MT from a new perspective, and our criteria and measurement provide a new approach to evaluate the test quality of MT and provide guidelines for constructing effective test suites of MT.
... (2) Metamorphic relations should not be formally described. (Tao et al., 2010;Hui and Huang, 2013;Bandaru and Albert Mayan, 2016;Saha and Kanewala, 2018;Lv et al., 2018;Segura et al., 2020;Asyrofi et al., 2021) Test Case Generation Techniques ...
Article
Software Industry is evolving at a very fast pace since last two decades. Many software developments, testing and test case generation approaches have evolved in last two decades to deliver quality products and services. Testing plays a vital role to ensure the quality and reliability of software products. In this paper authors attempted to conduct a systematic study of testing tools and techniques. Six most popular e-resources called IEEE, Springer, Association for Computing Machinery (ACM), Elsevier, Wiley and Google Scholar to download 738 manuscripts out of which 125 were selected to conduct the study. Out of 125 manuscripts selected, a good number approx. 79% are from reputed journals and around 21% are from good conference of repute. Testing tools discussed in this paper have broadly been divided into five different categories: open source, academic and research, commercial, academic and open source, and commercial & open source. The paper also discusses several benchmarked datasets viz. Evosuite 10, SF100 Corpus, Defects4J repository, Neo4j, JSON, Mocha JS, and Node JS to name a few. Aim of this paper is to make the researchers aware of the various test case generation tools and techniques introduced in the last 11 years with their salient features.
Article
Metamorphic testing (MT) is effective in detecting software failures; it detects failures by examining the metamorphic relations (MRs) among source test cases (STCs), follow‐up test cases (FTCs) and their respective outputs. The STCs together with the corresponding FTCs, considered as a whole, are called metamorphic groups (MGs). MT performance relies heavily on the MRs and MGs. Previous studies have mainly focused on improving MT performance by identifying effective MRs, or through generation of MGs with high quality, but have somewhat neglected the selection of MRs and MGs from existing ones. In this paper, we address this issue by introducing a new metric for guiding the selection of effective MR‐MG pairs from a new perspective: The MR‐MG pair is chosen such that the MR makes the current MG as far away as possible from the executed MGs. We design an MR‐MG pair selection algorithm, named metamorphic relation and group selection based on adaptive random testing (MRGS‐ART), to implement our metric. The intuition behind MRGS‐ART is that we attempt to improve MT performance by achieving an even distribution of STCs and FTCs in their corresponding input domains for all the MRs used. Experimental results indicate that MRGS‐ART can enhance MT performance. We believe that this is the first comprehensive and systematic demonstration, from the perspective of both MRs and MGs, that making STCs and FTCs evenly distributed in their corresponding input domains can improve MT performance. Finally, by analysing the experimental results, we provide guidance on how to most effectively implement MRGS‐ART.
Article
Although the security testing of Web systems can be automated by generating crafted inputs, solutions to automate the test oracle, i.e., vulnerability detection, remain difficult to apply in practice. Specifically, though previous work has demonstrated the potential of metamorphic testing-security failures can be determined by metamorphic relations that turn valid inputs into malicious inputs-metamorphic relations are typically executed on a large set of inputs, which is time-consuming and thus makes metamorphic testing impractical. We propose AIM, an approach that automatically selects inputs to reduce testing costs while preserving vulnerability detection capabilities. AIM includes a clustering-based black-box approach, to identify similar inputs based on their security properties. It also relies on a novel genetic algorithm to efficiently select diverse inputs while minimizing their total cost. Further, it contains a problem-reduction component to reduce the search space and speed up the minimization process. We evaluated the effectiveness of AIM on two well-known Web systems, Jenkins and Joomla, with documented vulnerabilities. We compared AIM's results with four baselines involving standard search approaches. Overall, AIM reduced metamorphic testing time by 84% for Jenkins and 82% for Joomla, while preserving the same level of vulnerability detection. Furthermore, AIM significantly outperformed all the considered baselines regarding vulnerability coverage.</p
Article
Full-text available
Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behavior is called the “test oracle problem”. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain specific information that provide informal oracle guidance. All forms of test oracles, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice.
Conference Paper
Full-text available
In software testing, an oracle refers to a mechanism against which testers can decide whether or not outcomes of test case executions are correct. The oracle problem refers to situations when either an oracle is not available, or it is too expensive to apply. Metamorphic testing has emerged as an effective and efficient approach to alleviating the oracle problem. This article introduces the basic concepts and procedures of metamorphic testing, and gives examples to show its applications, and integration with other methods.
Conference Paper
Metamorphic testing uses domain-specific properties about a program's intended behaviour to alleviate the oracle problem. From a given set of source test inputs, a set of follow-up test inputs is generated, each bearing some relation to the source inputs, and their outputs are compared to the outputs of the source tests using metamorphic relations. We evaluate the use of an automated test input generation technique called dynamic symbolic execution (DSE) to generate the source test inputs for metamorphic testing. We investigate whether DSE increases the source-code coverage and fault-finding effectiveness of metamorphic testing compared to random testing, and whether the use of metamorphic relations as a supportive technique improves the test inputs generated by DSE. Our results show that DSE improves the coverage and fault detection rate of metamorphic testing compared to random testing using significantly smaller test suites, and that the use of metamorphic relations increases the code coverage of both DSE and random tests considerably; however, the improvement in the fault detection rate may be marginal and depends on the metamorphic relations used.
Conference Paper
Metamorphic Testing (MT) aims to alleviate the oracle problem. In MT, testers define metamorphic relations (MRs), which are used to generate new test cases (referred to as follow-up test cases) from the available test cases (referred to as source test cases). Both source and follow-up test cases are executed and their outputs are verified against the relevant MRs, any violation of which implies that the software under test is faulty. So far, research on the effectiveness of MT has focused on the selection of better MRs (that is, MRs that are more likely to be violated). In addition to MR selection, the source and follow-up test cases may also affect the effectiveness of MT. Since follow-up test cases are derived from the source test cases and MRs, the selection of source test cases also affects the effectiveness of MT. However, in existing MT studies, random testing is commonly adopted as the selection strategy for source test cases. This study investigates the impact of source test cases on the effectiveness of MT. Since Adaptive Random Testing (ART) has been developed as an enhancement to Random Testing (RT), this study focuses on comparing the performance of RT and ART as source test case selection strategies. Experimental results show that ART outperforms RT in enhancing the effectiveness of MT.
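The sketch below shows the classic fixed-size-candidate-set flavour of ART applied to source test case generation, the same max-min intuition as the earlier sketch but restricted to source inputs: each new source test is the random candidate farthest from all previously selected ones. It assumes a one-dimensional numeric domain and is illustrative only, not the study's implementation.

import java.util.*;

public class FscsArtSketch {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<Double> selected = new ArrayList<>();
        selected.add(rnd.nextDouble() * 100);   // the first source test is purely random
        for (int i = 0; i < 9; i++) {
            double best = 0, bestMinDist = -1;
            for (int c = 0; c < 10; c++) {      // fixed-size candidate set
                double cand = rnd.nextDouble() * 100;
                double minDist = Double.MAX_VALUE;
                for (double s : selected) {
                    minDist = Math.min(minDist, Math.abs(cand - s));
                }
                if (minDist > bestMinDist) {
                    bestMinDist = minDist;
                    best = cand;
                }
            }
            selected.add(best);                 // farthest candidate becomes the next source test
        }
        System.out.println(selected);           // a roughly even spread over [0, 100)
    }
}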
Conference Paper
Automated test generation techniques typically aim at maximising coverage of well-established structural criteria such as statement or branch coverage. In practice, generating tests for only one specific criterion may not be sufficient when testing object-oriented classes, as standard structural coverage criteria do not fully capture the properties developers may desire of their unit test suites. For example, covering a large number of statements could easily be achieved by just calling the main method of a class; yet a good unit test suite would consist of smaller unit tests invoking individual methods and checking return values and states with test assertions. There are several different properties that test suites should exhibit, and a search-based test generator could easily be extended with additional fitness functions to capture them. However, does search-based testing scale to combinations of multiple criteria, and what is the effect on the size and coverage of the resulting test suites? To answer these questions, we extended the EvoSuite unit test generation tool to support combinations of multiple test criteria, defined and implemented several different criteria, and applied combinations of criteria to a sample of 650 open source Java classes. Our experiments suggest that optimising for several criteria at the same time is feasible without increasing computational costs: when combining nine different criteria, we observed an average decrease of only 0.4% in the constituent coverage criteria, while the test suites may grow by up to 70%.
Article
Comprehensive, automated software testing requires an oracle to check whether the output produced by a test case matches the expected behaviour of the program. But the challenges in creating suitable oracles limit the ability to perform automated testing in some programs, and especially in scientific software. Metamorphic testing is a method for automating the testing process for programs without test oracles. This technique operates by checking whether the program behaves according to properties called metamorphic relations. A metamorphic relation describes the change in output when the input is changed in a prescribed way. Unfortunately, finding the metamorphic relations satisfied by a program or function remains a labour-intensive task, which is generally performed by a domain expert or a programmer. In this work, we propose a machine learning approach for predicting metamorphic relations that uses a graph-based representation of a program to capture control flow and data dependency information. In earlier work, we found that simple features derived from such graphs provide good performance. An analysis of the features used in this earlier work led us to explore the effectiveness of several representations of those graphs using the machine learning framework of graph kernels, which provide various ways of measuring similarity between graphs. Our results show that a graph kernel that evaluates the contribution of all paths in the graph has the best accuracy and that control flow information is more useful than data dependency information. The data used in this study are available for download at http://www.cs.colostate.edu/saxs/MRpred/functions.tar.gz to help researchers in further development of metamorphic relation prediction methods.
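The toy sketch below conveys the flavour of a path-based graph kernel: two labelled control-flow graphs are compared by counting the label sequences of short paths they share. It is a deliberate simplification of our own, not the kernel evaluated in the article; the graphs, labels, and path bound are hypothetical.

import java.util.*;

public class PathKernelSketch {
    // Collect the label sequences of all paths containing up to maxLen nodes.
    static Set<String> pathLabels(String[] labels, int[][] adj, int maxLen) {
        Set<String> out = new HashSet<>();
        for (int v = 0; v < labels.length; v++) {
            dfs(v, labels, adj, maxLen, labels[v], out);
        }
        return out;
    }

    static void dfs(int v, String[] labels, int[][] adj, int left, String seq, Set<String> out) {
        out.add(seq);
        if (left <= 1) return;   // the bound also guarantees termination on cyclic graphs
        for (int w : adj[v]) {
            dfs(w, labels, adj, left - 1, seq + ">" + labels[w], out);
        }
    }

    public static void main(String[] args) {
        // Two tiny control-flow graphs labelled by statement kind.
        String[] l1 = {"if", "assign", "return"};
        int[][] a1 = {{1, 2}, {2}, {}};
        String[] l2 = {"if", "assign", "assign", "return"};
        int[][] a2 = {{1, 3}, {2}, {3}, {}};
        Set<String> shared = pathLabels(l1, a1, 3);
        shared.retainAll(pathLabels(l2, a2, 3));   // keep shared path-label sequences
        System.out.println("similarity = " + shared.size() + ": " + shared);
    }
}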
Article
The test oracle problem is regarded as one of the most challenging problems in software testing. Metamorphic testing has been developed to alleviate this problem; it does so by using relations involving relevant inputs and their outputs. This keynote speech provides a summary of the state of the art of metamorphic testing.
Article
Several module and class testing techniques have been applied to object-oriented (OO) programs, but researchers have only recently begun developing test criteria that evaluate the use of key OO features such as inheritance, polymorphism, and encapsulation. Mutation testing is a powerful testing technique for generating software tests and evaluating the quality of software. However, the cost of mutation testing has traditionally been so high that it cannot be applied without fully automated tool support. This paper presents a method to reduce the execution cost of mutation testing for OO programs by using two key technologies, mutant schemata generation (MSG) and bytecode translation. This method adapts the existing MSG method for mutants that change the program behaviour and uses bytecode translation for mutants that change the program structure. A key advantage is in performance: only two compilations are required, and both the compilation and the execution time are greatly reduced. A mutation tool based on the MSG/bytecode translation method has been built and used to measure the speedup over the separate compilation approach. Experimental results show that the MSG/bytecode translation method is about five times faster than separate compilation.
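The core MSG idea, compiling all behavioural mutants into a single "metamutant" and selecting the active mutant at run time, can be sketched at the source level as below. The real tool operates on bytecode, so this sketch is only illustrative; the mutant ids and the system property name are hypothetical.

public class MetamutantSketch {
    // One id per behavioural mutant; 0 selects the original program.
    static final int MUTANT = Integer.getInteger("mutant", 0);

    // Schematic version of "return a + b": the arithmetic operator is
    // chosen by the currently selected mutant instead of being fixed.
    static int add(int a, int b) {
        switch (MUTANT) {
            case 1:  return a - b;   // mutant 1: + replaced by -
            case 2:  return a * b;   // mutant 2: + replaced by *
            default: return a + b;   // original behaviour
        }
    }

    public static void main(String[] args) {
        // Run as: java -Dmutant=1 MetamutantSketch
        // One compilation serves every mutant; only the id changes per run.
        System.out.println(add(2, 3));
    }
}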
Conference Paper
The correctness of mission-critical software is an important part of information security, and the oracle problem [1] is often a major constraint on its testing. Metamorphic testing (MT) is a practical response to the oracle problem, but it calls for more executions and, in most situations, focuses only on a program's mathematical properties. This article presents the Path-Combination-Based MT method, which mines the relationships between inputs that execute different paths and their corresponding outputs, based on an analysis of the program structure, and then tests the program with these relationships. The experimental results demonstrate its efficiency.
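A minimal example of the idea, under our own simplifying assumptions: for a two-path function such as abs, the inputs x and -x exercise different branches, yet the relation abs(x) == abs(-x) ties their outputs together, giving an MR that connects distinct paths.

public class PathCombinationSketch {
    static int abs(int x) {
        if (x < 0) return -x;   // path 1: negative branch
        return x;               // path 2: non-negative branch
    }

    public static void main(String[] args) {
        int x = 5;
        // x and -x execute different paths; a violation of the relation
        // below reveals a fault on either path without an exact oracle.
        System.out.println(abs(x) == abs(-x) ? "MR satisfied" : "MR violated");
    }
}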
Conference Paper
Metamorphic testing fully exploits the information contained in successful test cases and can effectively alleviate the oracle problem in software testing. One of the key factors affecting the results of metamorphic testing is the generation of test cases. In this paper, we propose a criterion called ECCEM (Equivalence-Class Coverage for Every Metamorphic Relation), which derives test cases from equivalence classes; the criterion can generate smaller test suites with a high fault detection rate. This paper also proposes a new measure of test cases, the Test Case Rate of Utilization (TCR), which can comprehensively assess the generated test suite.
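A sketch of what ECCEM-style coverage could look like, under our own simplifying assumptions: the input domain is partitioned into equivalence classes, and the criterion is met once every MR has at least one source test case drawn from every class. The class boundaries, the MR list, and the bookkeeping are all hypothetical; the TCR measure is not sketched here since the abstract does not define it precisely.

public class EccemSketch {
    // Hypothetical equivalence classes for an integer input domain.
    static int classOf(int x) {
        if (x < 0) return 0;    // negative inputs
        if (x == 0) return 1;   // zero
        return 2;               // positive inputs
    }

    public static void main(String[] args) {
        String[] mrs = {"MR1", "MR2"};          // hypothetical relations
        int numClasses = 3;
        int[] sourceTests = {-4, 0, 7, 12};     // candidate source test cases

        // covered[m][c] is true once MR m has a source test in class c.
        boolean[][] covered = new boolean[mrs.length][numClasses];
        for (int m = 0; m < mrs.length; m++) {
            for (int t : sourceTests) {
                covered[m][classOf(t)] = true;
            }
        }

        boolean eccem = true;
        for (boolean[] row : covered) {
            for (boolean cell : row) {
                eccem &= cell;
            }
        }
        System.out.println("ECCEM satisfied: " + eccem);
    }
}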