Article

Path-directed source test case generation and prioritization in metamorphic testing


Abstract

Metamorphic testing is a technique that makes use of necessary properties of the software under test, termed metamorphic relations, to construct new test cases, namely follow-up test cases, from existing test cases, namely source test cases. Owing to its ability to verify test results without the need for test oracles, it has been widely used in many application domains and has detected many real-life faults. Numerous investigations have been conducted to further improve the effectiveness of metamorphic testing, most of which focused on the identification and selection of “good” metamorphic relations. Recently, a few studies have emerged on how to generate and select source test cases that are effective in fault detection. In this paper, we propose a novel approach to generating source test cases based on their associated path constraints, which are obtained through symbolic execution. The path distance among test cases is leveraged to guide the prioritization of source test cases, which further improves efficiency. A tool has been developed to automate the proposed approach as much as possible. Empirical studies have also been conducted to evaluate the fault-detection effectiveness of the approach. The results show that this approach enhances both the performance and the automation of metamorphic testing. The study also highlights interesting research directions for further improving metamorphic testing.
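The abstract does not say how path distance drives prioritization. As a minimal sketch, assume each source test case's execution path is recorded as a set of (branch, outcome) pairs, as symbolic execution would report them, that path distance is the Jaccard distance between those sets, and that prioritization is greedy farthest-first. The names `path_distance` and `prioritize`, the metric, and the ordering rule are hypothetical illustrations, not the authors' tool.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical path model: the set of (branch_id, outcome) pairs a test case takes.
Path = FrozenSet[Tuple[str, bool]]


def path_distance(p: Path, q: Path) -> float:
    """Jaccard distance between two paths' branch-outcome sets (an assumed metric)."""
    union = p | q
    if not union:
        return 0.0
    return 1.0 - len(p & q) / len(union)


def prioritize(paths: Dict[str, Path]) -> List[str]:
    """Greedy farthest-first ordering: each step picks the source test case whose
    minimum path distance to the already-ordered test cases is largest."""
    remaining = dict(paths)
    # Start from the test case with the most branch outcomes (ties broken by name).
    first = max(sorted(remaining), key=lambda t: len(remaining[t]))
    order = [first]
    del remaining[first]
    while remaining:
        best = max(
            sorted(remaining),
            key=lambda t: min(path_distance(remaining[t], paths[s]) for s in order),
        )
        order.append(best)
        del remaining[best]
    return order


paths = {
    "t1": frozenset({("b1", True), ("b2", True)}),
    "t2": frozenset({("b1", True), ("b2", False)}),
    "t3": frozenset({("b1", True), ("b2", True)}),  # same path as t1
    "t4": frozenset({("b1", False)}),
}
```

With these paths, `prioritize(paths)` yields `["t1", "t4", "t2", "t3"]`: the test case that repeats t1's path is pushed to the end, which is the intuition behind using path distance to order source test cases.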

... Despite the testing challenge, Metamorphic Testing (MT) is a widely used technique to alleviate the test oracle problem [6][7][8][9][10][11]. Instead of verifying the outputs of software programs, MT validates the expected relationships between inputs and outputs of multiple executions of the software and these relationships are called Metamorphic Relations (MRs). ...
... However, USDPs do not have a ground truth to refer to, which makes validating them akin to a test oracle problem. MT is a widely used approach to address the test oracle problem [6][7][8][9][10][11]. ...
... These "expected" relationships are expressed as metamorphic relations (MRs), and they are the core element of MT. If the outputs of multiple software executions violate an MR, then a fault is revealed [6,8,11,19]. So, by examining these expected relationships between inputs and outputs, which are MRs, we can validate the program under test and avoid the test oracle problem. ...
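To make the MR check concrete, here is a minimal sketch using an illustrative relation for a greatest-common-divisor function, gcd(k·a, k·b) = k·gcd(a, b) for k > 0. The functions `violates_mr`, `faulty_gcd`, and `run_mt` are hypothetical; `faulty_gcd` carries a seeded fault so that the relation can expose it without ever knowing the correct gcd value.

```python
import math
from typing import Callable, List, Tuple


def violates_mr(gcd: Callable[[int, int], int], a: int, b: int, k: int) -> bool:
    """Illustrative MR: gcd(k*a, k*b) == k * gcd(a, b) for k > 0.
    Returns True when the source and follow-up outputs break the relation."""
    source_out = gcd(a, b)          # output of the source test case (a, b)
    follow_out = gcd(k * a, k * b)  # output of the follow-up test case built by the MR
    return follow_out != k * source_out


def faulty_gcd(a: int, b: int) -> int:
    # Hypothetical seeded fault: the result is capped at 10.
    return min(math.gcd(a, b), 10)


def run_mt(gcd: Callable[[int, int], int], sources: List[Tuple[int, int]], k: int = 10):
    """Return the source test cases whose metamorphic group violates the MR."""
    return [(a, b) for a, b in sources if violates_mr(gcd, a, b, k)]


sources = [(4, 6), (3, 5), (12, 18)]
```

Here `run_mt(faulty_gcd, sources)` flags `(4, 6)` and `(12, 18)`, while `run_mt(math.gcd, sources)` flags nothing. Note that `(3, 5)` does not reveal the fault, which illustrates why the choice of source test cases matters.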
Article
In the realm of software defect prediction, unsupervised models often step in when labelled datasets are scarce, despite facing the challenge of validating models without prior knowledge of data. Addressing this, we proposed a novel approach leveraging generic metamorphic testing to validate such models effectively, bypassing the need for expert-derived metamorphic relations. Our method identifies five categories of generic metamorphic relations, further divided into twenty-one individual generic metamorphic relations, all formulated through generic Data Mutation Operators. This framework enables us to generate follow-up datasets from the source datasets, training respective software defect prediction models. By comparing predictions between the source and follow-up software defect prediction models and identifying inconsistencies, we can assess the model’s sensitivity to generic metamorphic relations as a form of validation. This approach was rigorously evaluated across twenty software defect prediction models, incorporating fourteen different machine learning algorithms and twenty high-dimensional public datasets. Remarkably, the robustness of our generic MT model was confirmed, showing substantial effectiveness in validating software defect prediction models, independent of whether they were supervised or unsupervised. Software defect prediction models, using Agglomerative clustering and Density-Based Spatial Clustering of Applications with Noise algorithms, did not violate any metamorphic relation, and nineteen software defect prediction models did not significantly violate the generic metamorphic relation "Shrinkage and Expansion". Our findings suggest that employing generic metamorphic relations, especially "Shrinkage and Expansion", can universally enhance the validation of defect prediction models. Furthermore, our model can be applied to continuously monitor software defect prediction models.
... • MathUtil (MATH). This program provides miscellaneous utility functions that address common mathematical problems such as the calculation of the greatest common divisor of two numbers, the normalization of an angle, and the distance calculation between two points [47]. ... of type conversion rules [47]. • Dnapars (DNA). ...
... For the programs in the SIR repository [46], such as Tcas and PT, we directly used the mutants released in the repository. For the other seven subject programs, we collected the mutants from previous MT-related studies [7], [19], [44], [45], [47]. Note that we eliminated mutants that could not be compiled successfully. ...
Article
Metamorphic testing, thanks to its high failure-detection effectiveness, especially in the absence of a test oracle, has been widely applied both in the traditional context of software testing and in other relevant fields such as fault localization and program repair. Its core element is a set of metamorphic relations, which are necessary properties of the target algorithm in the form of relationships among multiple inputs and their corresponding expected outputs. When a relation is violated by the outputs of a group of test cases constructed based on the relation, namely a metamorphic group, a failure is said to be revealed. Traditionally, the primary task of software testing is to reveal failures. Therefore, from the perspective of software testing, it may be unnecessary to know which test case(s) in the metamorphic group cause the violation and thus the failure. However, such information is clearly helpful for other software engineering activities, such as software debugging. The current literature on metamorphic testing lacks a systematic mechanism for identifying the actual failure-revealing test cases, which hinders its applicability and effectiveness in other relevant fields. In this paper, we propose a new technique for FAILure-revealing Test case Identification in Metamorphic testing, namely FAILTIM. The approach is based on a novel application of statistical methods. More specifically, we leverage and adapt the basic ideas of spectrum-based techniques, which were originally used in fault localization, and propose using a set of risk formulas to estimate the suspiciousness of each individual test case in metamorphic groups. Failure-revealing test cases are then suggested according to their suspiciousness. A series of experiments has been conducted to evaluate the effectiveness and efficiency of FAILTIM using nine subject programs and 30 risk formulas. 
The experimental results showed that the new approach can achieve a high accuracy in identifying the actual failure-revealing test cases in metamorphic testing. Consequently, our study will help boost the applicability and performance of metamorphic testing beyond testing to other software engineering areas. The present work also unfolds a number of research directions for further advancing the theory of metamorphic testing and more broadly, software testing.
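The spectrum-based idea can be sketched by treating each metamorphic group as an "execution" and each test case as a "program element". This minimal sketch assumes the Ochiai risk formula, one common choice from fault localization; which of the 30 formulas FAILTIM evaluates, and how it adapts them, is not stated here, so `ochiai_suspiciousness` and `rank` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Each metamorphic group: (ids of its test cases, whether the MR was violated).
Group = Tuple[Sequence[str], bool]


def ochiai_suspiciousness(groups: List[Group]) -> Dict[str, float]:
    """Ochiai score per test case: ef / sqrt((ef + nf) * (ef + ep)), where
    ef/ep count violated/non-violated groups containing the test case and
    ef + nf is the total number of violated groups."""
    total_failed = sum(1 for _, violated in groups if violated)
    tests = {t for members, _ in groups for t in members}
    scores: Dict[str, float] = {}
    for t in sorted(tests):
        ef = sum(1 for m, v in groups if v and t in m)
        ep = sum(1 for m, v in groups if not v and t in m)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[t] = ef / denom if denom else 0.0
    return scores


def rank(scores: Dict[str, float]) -> List[str]:
    """Most suspicious first; ties broken by test case id."""
    return sorted(scores, key=lambda t: (-scores[t], t))


groups = [
    (["s1", "f1"], True),   # violated
    (["s1", "f2"], True),   # violated
    (["s2", "f3"], False),
    (["s2", "f1"], False),
]
```

In this toy data, source test case `s1` appears in every violated group and in no passing group, so `rank(ochiai_suspiciousness(groups))` places it first.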
... examined the characteristics of effective MRs and proposed some qualitative guidelines for selecting effective MRs [19], [20]. Apart from the identification of MRs, a variety of source test input generation approaches have been proposed [21], [22], and studies have investigated their impact on the fault detection effectiveness of MT [23], [24]. ...
... Another important factor that strongly influences the effectiveness of MT is the source test inputs. Accordingly, researchers have proposed various techniques to generate effective source test cases: (1) constraint solving-based techniques [21], [22] aim to generate test cases that cover different program paths; (2) adaptive random testing-based techniques [23] generate source test cases with a high degree of diversity; (3) iterative metamorphic testing techniques [67] use follow-up inputs as new source inputs to iteratively expand the set of source inputs; and (4) equivalence class-based techniques [56]. Researchers have also compared the influence of different source input generation strategies on the effectiveness of MT [68]. ...
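Technique (3) can be illustrated with a short loop in which each round's follow-up inputs become the next round's source inputs. This is a minimal sketch, not the cited technique's implementation: the MR (rotating a list leaves its maximum unchanged), the helper `iterative_mt`, and the seeded fault in `faulty_max` are all hypothetical.

```python
from typing import Any, Callable, List, Tuple


def iterative_mt(
    program: Callable[[Any], Any],
    sources: List[Any],
    transform: Callable[[Any], Any],
    relation_holds: Callable[[Any, Any], bool],
    rounds: int,
) -> List[Tuple[Any, Any]]:
    """Apply the MR's input transformation repeatedly; follow-up inputs of one
    round are reused as source inputs of the next. Returns violating pairs."""
    violations = []
    current = list(sources)
    for _ in range(rounds):
        next_inputs = []
        for x in current:
            y = transform(x)  # follow-up input
            if not relation_holds(program(x), program(y)):
                violations.append((x, y))
            next_inputs.append(y)
        current = next_inputs
    return violations


def rotate(xs: List[int]) -> List[int]:
    # Illustrative MR transformation: rotate the list left by one position.
    return xs[1:] + xs[:1]


def faulty_max(xs: List[int]) -> int:
    # Hypothetical seeded fault: the last element is ignored.
    return max(xs[:-1])
```

For the source `[1, 5, 2, 3]`, one round finds nothing, but a second round, starting from the follow-up `[5, 2, 3, 1]`, exposes the fault; the correct `max` never violates the relation.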
Preprint
Metamorphic testing (MT) is a simple yet effective technique to alleviate the oracle problem in software testing. The underlying idea of MT is to test a software system by checking whether metamorphic relations (MRs) hold among multiple test inputs (including source and follow-up inputs) and the actual outputs of their executions. Since MRs and source inputs are two essential components of MT, considerable efforts have been made to examine the systematic identification of MRs and the effective generation of source inputs, which has greatly enriched the fundamental theory of MT since its invention. However, few studies have investigated the test adequacy assessment of MT, which hinders the objective measurement of MT's test quality as well as the effective construction of test suites. Although there exist a number of test adequacy criteria in traditional software testing that specify testing requirements for an adequate test from various perspectives, they are not in line with MT's focus, which is to test the software under test (SUT) from the perspective of necessary properties. In this paper, we propose a new set of criteria that specify testing requirements from the perspective of necessary properties satisfied by the SUT, and design a test adequacy measurement that evaluates the degree of adequacy based on both MRs and source inputs. The experimental results show that the proposed measurement can effectively indicate the fault detection effectiveness of test suites, i.e., test suites with higher test adequacy usually exhibit higher effectiveness in fault detection. Our work is an attempt to assess the test adequacy of MT from a new perspective, and our criteria and measurement provide a new approach to evaluating the test quality of MT as well as guidelines for constructing effective MT test suites.
... Experiments would compare the fault detection effectiveness of MR sets selected using this approach to those selected randomly or using other criteria. Sun et al. [38] proposed path-directed source test case generation and prioritization in metamorphic testing, which aims to improve the efficiency and effectiveness of metamorphic testing by leveraging path information. This approach generates source test cases by considering the program's control flow, aiming to increase code coverage. ...
... In contrast, our work prioritizes metamorphic relations based on the diversity between the source and follow-up test cases in an MR. Sun et al. [38] proposed a technique to generate good source test cases in metamorphic testing based on constraint solvers and symbolic execution techniques. In addition, a prioritization of source test cases was conducted on the basis of path distances among test cases. ...
Article
Metamorphic testing is a valuable approach to verifying machine learning programs where traditional oracles are unavailable or difficult to apply. This paper proposes a technique to prioritize metamorphic relations (MRs) in metamorphic testing for machine learning and deep learning systems, aiming to enhance early fault detection. We introduce five metrics based on the diversity of source and follow-up test cases to prioritize MRs. The effectiveness of our proposed prioritization methods is evaluated on three machine learning algorithm implementations and one deep learning algorithm implementation. We compare our approach against random-based, fault-based, and neuron activation coverage-based MR ordering. The results show that our data diversity-based prioritization performs comparably to fault-based prioritization, reducing fault detection time by up to 62% compared to random MR execution. Our proposed metrics outperformed neuron activation coverage-based prioritization, providing 5-550% higher fault detection effectiveness. Overall, our approach to prioritizing metamorphic relations leads to increased fault detection effectiveness and reduced average fault detection time. This improvement in efficiency can result in significant time and cost savings when applying metamorphic testing to machine learning and deep learning systems.
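The five diversity metrics are not defined in this abstract. As a minimal sketch, assume inputs are numeric feature vectors and take one plausible metric, the mean Euclidean distance between each source input and its follow-up input under an MR; MRs are then ordered by descending diversity. The names `mr_diversity` and `prioritize_mrs` and the example MRs are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def mr_diversity(sources: List[Vector], transform: Callable[[Vector], Vector]) -> float:
    """Assumed metric: mean Euclidean distance between source and follow-up inputs."""
    dists = [math.dist(x, transform(x)) for x in sources]
    return sum(dists) / len(dists)


def prioritize_mrs(mrs: Dict[str, Callable[[Vector], Vector]], sources: List[Vector]) -> List[str]:
    """Order MRs by descending source/follow-up diversity (ties broken by name)."""
    scores = {name: mr_diversity(sources, t) for name, t in mrs.items()}
    return sorted(scores, key=lambda n: (-scores[n], n))


mrs = {
    "scale_by_2": lambda x: [2 * v for v in x],
    "reverse": lambda x: list(reversed(x)),
    "shift_0.1": lambda x: [v + 0.1 for v in x],
}
sources = [[1.0, 2.0], [3.0, 4.0]]
```

With these inputs, `prioritize_mrs(mrs, sources)` returns `["scale_by_2", "reverse", "shift_0.1"]`, so the MR whose follow-up inputs move furthest from the source inputs is executed first.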
... targeted test input) generation is to automatically generate test inputs with a given target [19,39]. Targeted inputs are widely used in software engineering tasks such as bug reproduction (the target is the bug location) [4,10], test suite augmentation (the target is the code to cover) [40,51], and combination with other testing tools to achieve better overall performance [19,33]. Constraint-based Approaches. ...
... It is an inventive strategy for the systematic derivation of new test cases, thereby making a substantive contribution to the progression of software testing techniques. Sun et al. [36] explored this testing approach by generating source test cases from their associated path constraints, obtained through symbolic execution, and constructing follow-up test cases from these source test cases. The path distance among test cases guided the prioritization of source test cases, thereby improving testing efficiency. ...
Article
Software testing is one of the integral activities in the development of software products. The generation and selection of test cases, in either static or dynamic form, play a pivotal role in ensuring the quality of software products. There are numerous approaches in the literature for the automatic generation of test cases, and coverage criteria and fault detection rate are prominent metrics for checking effectiveness during the testing phase of software development. In the present work, a new Harmony Radial Testing (HRT) approach is proposed by combining the Harmony Search Algorithm (HSA) with a Radial Basis Function Neural Network (RBF-NN). The main objective of the proposed HRT method is to generate test cases automatically under the branch coverage criterion, improving the Maximum branch Coverage (MaxC), Average Coverage (AC), and Average Percentage of Fault Detection (APFD) rates. In this approach, harmony search is optimized over randomly selected sample test cases, and the RBF-NN is trained to simulate the fitness function. Seven Python programs have been tested with the proposed approach, and the results are compared with the Primal-Dual Genetic Algorithm (PDGA), the Simple Genetic Algorithm (SGA), and random methods. The proposed HRT algorithm is observed to consistently yield reliable results, which software industries may use in the future to enrich the software testing process.
... These programs span a diverse range of application domains, including scientific computing, aircraft conflict detection, lexical parsing, string transformation, and DNA data processing. They have been widely utilized in previous studies related to MT [9,10], vary in size, and are written in various programming languages. To evaluate the effectiveness of the MFT method in tolerating the faults in software systems, faulty versions are required for each subject program. ...
Article
We present the first comprehensive framework for implementing metamorphic fault tolerance. A key innovation in this framework is the application of risk formulas to evaluate the likelihood of inputs producing trustworthy outputs. Experiments demonstrate its effectiveness, notably without the need for test oracles.
... In the literature, source test cases are generated either through traditional test case generation techniques, as discussed earlier, or through tools such as EvoSuite (which generates source test cases automatically based on coverage criteria) [5]. Recently, a few studies have emerged on the generation and selection of source test cases that are effective in fault detection [24]. ...
Article
Testing software with an intricate, advanced system architecture is quite challenging due to the absence of a test oracle. Metamorphic testing is a popular technique to alleviate the test oracle problem. The effectiveness of metamorphic testing depends on metamorphic relations (MRs). MRs represent the essential properties of the system under test and are evaluated by their fault detection rates. The existing techniques for evaluating MRs are not comprehensive, as very few mutation operators are used to generate very few mutants. In this research, we propose six new MRs for dilation and erosion operations. The fault detection rate of the six newly proposed MRs is determined using mutation testing. We have used eight applicable mutation operators and determined their effectiveness. By using these applicable operators, we have ensured that all possible mutants are generated, which shows that all the faults in the system under test are fully identified. Results of the evaluation of four MRs for edge detection show an improvement in all the respective MRs, especially in MR1 and MR4, with fault detection rates of 76.54% and 69.13%, respectively, which are 32% and 24% higher than the existing technique. The fault detection rates of MR2 and MR3 are also improved by 1%. Similarly, results for dilation and erosion show that, out of 8 MRs, the fault detection rates of four MRs are higher than those of the existing technique. In the proposed technique, MR1 is improved by 39%, MR4 by 0.5%, MR6 by 17%, and MR8 by 29%. We have also compared the results of our proposed MRs with the existing MRs for dilation and erosion operations. Results show that the proposed MRs complement the existing MRs effectively, as the new MRs can find faults that are not identified by the existing MRs.
... Adaptive random testing is based on the observation that failure-causing inputs tend to form contiguous failure regions; therefore, non-failure-causing inputs should also form contiguous non-failure regions [68]. ...
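Because failure regions tend to be contiguous, adaptive random testing spreads test cases evenly over the input domain. A well-known variant is fixed-size-candidate-set ART (FSCS-ART): at each step, draw k random candidates and execute the one farthest from all previously executed test cases. The sketch below assumes a two-dimensional numeric input domain; the function name `fscs_art` and its defaults are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def fscs_art(
    n_tests: int,
    k: int = 10,
    bounds: Sequence[Tuple[float, float]] = ((0.0, 1.0), (0.0, 1.0)),
    seed: int = 0,
) -> List[Tuple[float, ...]]:
    """Fixed-size-candidate-set ART over a box-shaped numeric input domain."""
    rng = random.Random(seed)

    def random_point() -> Tuple[float, ...]:
        return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

    executed = [random_point()]  # the first test case is purely random
    while len(executed) < n_tests:
        candidates = [random_point() for _ in range(k)]
        # Pick the candidate whose nearest executed test case is farthest away.
        best = max(candidates, key=lambda c: min(math.dist(c, e) for e in executed))
        executed.append(best)
    return executed
```

For example, `fscs_art(20)` returns 20 points in the unit square; compared with drawing 20 points uniformly at random, the max-min selection rule avoids clusters of nearly identical test cases.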
Article
Increasing the effectiveness of programming education has emerged as an important goal in teaching programming languages in the last decade. Automatic evaluation of the correctness of the student’s source code saves teachers time and effort and allows a more comprehensive focus on the preparation of assignments with integrated feedback. The study aims to present an approach that will enable effective testing of students’ source codes within object-oriented programming courses while minimising the demands on teachers when preparing the assignment. This approach also supports variability in testing and preventing student cheating. Based on the principles of different types of testing (black-box, white-box, grey-box), an integrated solution for source code verification was designed and verified. The basic idea is to use a reference class, which is assumed to be part of every assignment, as the correct solution. This reference class is compared to the student solution using the grey-box method. Due to their identical interface (defined by assignment), comparing instance states and method outputs is a matter of basic programming language mechanisms. A significant advantage is that a random generation of test cases can be used in such a case, while the rules for their generation can be determined using simple formulas. The proposed procedure was implemented and gradually improved over 4 years on groups of bachelor students of applied informatics with a high level of acceptance.
... A considerable amount of research has been conducted to improve regression testing performance on various issues [29,62,63,64,65,66,67]. We focus on the coverage-based TCP techniques and summarize the existing work from the following categories. ...
Preprint
Test case prioritization (TCP) aims to reorder the regression test suite with the goal of increasing the fault detection rate. Various TCP techniques have been proposed based on different prioritization strategies. Among them, greedy-based techniques are the most widely used. However, existing greedy-based techniques usually reorder all candidate test cases in every prioritization iteration, resulting in both efficiency and effectiveness problems. In this paper, we propose a generic partial attention mechanism, which uses the previous priority values (i.e., the number of additionally covered code units) to avoid considering all candidate test cases. Incorporating this mechanism into the additional-greedy strategy, we implement a novel coverage-based TCP technique based on partition ordering (OCP). OCP first groups the candidate test cases into different partitions and then updates the partitions in descending order. We conduct a comprehensive experiment on 19 versions of Java programs and 30 versions of C programs to compare the effectiveness and efficiency of OCP with six state-of-the-art TCP techniques: total-greedy, additional-greedy, lexicographical-greedy, unify-greedy, art-based, and search-based. The experimental results show that OCP achieves a better fault detection rate than the state-of-the-art techniques. Moreover, OCP reduces time costs by 85%-99% compared with most state-of-the-art techniques.
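OCP builds on the additional-greedy strategy. The sketch below shows only that standard baseline, not OCP's partition ordering: each iteration picks the test case covering the most not-yet-covered code units, and coverage is reset once no remaining test case adds anything. Coverage is assumed to be given as a set of code-unit ids per test case; the function name is illustrative.

```python
from typing import Dict, List, Set


def additional_greedy(coverage: Dict[str, Set[str]]) -> List[str]:
    """Standard additional-greedy TCP (ties broken by test case name)."""
    remaining = dict(coverage)
    covered: Set[str] = set()
    order: List[str] = []
    while remaining:
        names = sorted(remaining)
        # Re-scores every remaining candidate each iteration: the cost that
        # OCP's partial attention mechanism is designed to avoid.
        gains = {t: len(remaining[t] - covered) for t in names}
        best = max(names, key=gains.get)
        if gains[best] == 0 and covered:
            covered = set()  # nothing adds coverage: reset and reconsider
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order


coverage = {
    "t1": {"s1", "s2"},
    "t2": {"s2", "s3", "s4"},
    "t3": {"s1"},
    "t4": {"s3"},
}
```

Here `additional_greedy(coverage)` yields `["t2", "t1", "t3", "t4"]`: after t2 and t1 cover every unit, the coverage set is reset so that t3 and t4 are ordered by their own coverage.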
Article
Metamorphic testing (MT) is effective in detecting software failures; it detects failures by examining the metamorphic relations (MRs) among source test cases (STCs), follow‐up test cases (FTCs) and their respective outputs. The STCs together with the corresponding FTCs, considered as a whole, are called metamorphic groups (MGs). MT performance relies heavily on the MRs and MGs. Previous studies have mainly focused on improving MT performance by identifying effective MRs, or through generation of MGs with high quality, but have somewhat neglected the selection of MRs and MGs from existing ones. In this paper, we address this issue by introducing a new metric for guiding the selection of effective MR‐MG pairs from a new perspective: The MR‐MG pair is chosen such that the MR makes the current MG as far away as possible from the executed MGs. We design an MR‐MG pair selection algorithm, named metamorphic relation and group selection based on adaptive random testing (MRGS‐ART), to implement our metric. The intuition behind MRGS‐ART is that we attempt to improve MT performance by achieving an even distribution of STCs and FTCs in their corresponding input domains for all the MRs used. Experimental results indicate that MRGS‐ART can enhance MT performance. We believe that this is the first comprehensive and systematic demonstration, from the perspective of both MRs and MGs, that making STCs and FTCs evenly distributed in their corresponding input domains can improve MT performance. Finally, by analysing the experimental results, we provide guidance on how to most effectively implement MRGS‐ART.
Article
Although the security testing of Web systems can be automated by generating crafted inputs, solutions to automate the test oracle, i.e., vulnerability detection, remain difficult to apply in practice. Specifically, though previous work has demonstrated the potential of metamorphic testing (security failures can be determined by metamorphic relations that turn valid inputs into malicious inputs), metamorphic relations are typically executed on a large set of inputs, which is time-consuming and thus makes metamorphic testing impractical. We propose AIM, an approach that automatically selects inputs to reduce testing costs while preserving vulnerability detection capabilities. AIM includes a clustering-based black-box approach to identify similar inputs based on their security properties. It also relies on a novel genetic algorithm to efficiently select diverse inputs while minimizing their total cost. Further, it contains a problem-reduction component to reduce the search space and speed up the minimization process. We evaluated the effectiveness of AIM on two well-known Web systems, Jenkins and Joomla, with documented vulnerabilities. We compared AIM's results with four baselines involving standard search approaches. Overall, AIM reduced metamorphic testing time by 84% for Jenkins and 82% for Joomla, while preserving the same level of vulnerability detection. Furthermore, AIM significantly outperformed all the considered baselines regarding vulnerability coverage.
Article
The DHR architecture provides a revolutionary security defense structure for cyberspace. The multimode ruling in DHR is expected to alleviate the oracle problem, but it still suffers from common model vulnerabilities. In this work, we design a test segmentation method to transform multimode ruling into a metamorphic testing problem. The text test input that causes inconsistency among heterogeneous executors is converted into a condition set, and we extract subsets of conditions based on its syntax tree. The original test can exploit a specific vulnerability, and the follow‐up tests are composed of different subsets of conditions within the original test. We collect the execution matrix for the follow‐up tests to analyse the impact of each subset of conditions on the ruling decision. Metamorphic relations are extracted based on the localization of independent conditions, that is, the subsets of conditions that can impact the ruling decision independently. The executors in an inconsistent ruling should be examined with metamorphic testing methods rather than the traditional majority voting mechanism. The proposed test segmentation and improved multimode ruling methods are evaluated on two DHR‐based cases: SQL injection in a cyber‐range system and a deserialization attack in ‐ project. The experimental results show that our test segmentation can help to locate malicious expressions and that the metamorphic testing‐based multimode ruling can generate more correct results than the majority voting mechanism, with an average 15.8% performance loss.
Article
Testing is one of the most time‐consuming and unpredictable processes within the software development life cycle. As a result, many test case optimization (TCO) techniques have been proposed to make this process more scalable. Object Constraint Language (OCL) was initially introduced as a constraint language to provide additional details to Unified Modeling Language models. However, as OCL continues to evolve, an increasing number of systems are being expressed by this language. Despite this growth, a noticeable research gap exists for the testing of systems whose specifications are expressed in OCL. In our previous work, we verified the effectiveness and efficiency of performing the test case prioritization (TCP) process for these systems. In this study, we extend our previous work by integrating the test case minimization (TCM) process to determine whether TCM can also benefit the testing process under the context of OCL. The evaluation of TCO approaches often relies on well‐established metrics such as the average percentage of fault detection (APFD). However, the suitability of APFD for model‐based testing (MBT) is not ideal. This paper addresses this limitation by proposing a modification to the APFD metric to enhance its viability for MBT scenarios. We conducted four case studies to evaluate the feasibility of integrating the TCM and TCP processes into our proposed approach. In these studies, we applied the multi‐objective optimization algorithm NSGA‐II and the genetic algorithm independently to the TCM and TCP processes. The objective was to assess the effectiveness and efficiency of combining TCM and TCP in enhancing the testing phase. Through experimental analysis, the results highlight the benefits of integrating TCM and TCP in the context of OCL‐based testing, providing valuable insights for practitioners and researchers aiming to optimize their testing efforts. 
Specifically, the main contributions of this work include the following: (1) We introduce the integration of the TCM process into the TCO process for systems expressed in OCL. This integration further benefits the testing process by reducing redundant test cases while ensuring sufficient coverage. (2) We comprehensively analyze the limitations of the commonly used metric, APFD, and then propose a modified version of the APFD metric to overcome these weaknesses. (3) We systematically evaluate the effectiveness and efficiency of OCL‐based TCO processes on four real‐world case studies of differing complexity.
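For reference, the standard APFD metric that this work modifies is APFD = 1 - (TF_1 + ... + TF_m)/(n·m) + 1/(2n), where n is the number of test cases, m the number of faults, and TF_i the 1-based position of the first test case that reveals fault i. The sketch below computes only this standard form; the paper's modified version is not reproduced, and the helper name and the fault-detection input format are assumptions.

```python
from typing import Dict, List, Set


def apfd(order: List[str], detects: Dict[str, Set[str]]) -> float:
    """Standard APFD for a prioritized order; detects maps test id -> faults revealed."""
    n = len(order)
    faults = set().union(*detects.values())
    m = len(faults)
    if n == 0 or m == 0:
        raise ValueError("APFD needs at least one test case and one detected fault")
    first_pos: Dict[str, int] = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, set()):
            first_pos.setdefault(fault, pos)  # keep only the first revealing position
    if len(first_pos) < m:
        raise ValueError("some faults are not revealed by any test in the order")
    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)


detects = {"t1": set(), "t2": {"f1"}, "t3": {"f1", "f2"}, "t4": set()}
```

For the order `["t1", "t2", "t3", "t4"]` the faults are first revealed at positions 2 and 3, giving APFD = 1 - 5/8 + 1/8 = 0.5; moving `t3` to the front raises it to 0.875.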
Article
Recent advances in artificial intelligence technology and perception components have promoted the rapid development of autonomous vehicles. However, as safety‐critical software, autonomous driving systems often make wrong judgments, seriously threatening human and property safety. LiDAR is one of the most critical sensors in autonomous vehicles, capable of accurately perceiving the three‐dimensional information of the environment. Nevertheless, the high cost of manually collecting and labeling point cloud data leads to a dearth of testing methods for LiDAR‐based perception modules. To bridge the critical gap, we introduce MetaLiDAR, a novel automated metamorphic testing methodology for LiDAR‐based autonomous driving systems. First, we propose three object‐level metamorphic relations for the domain characteristics of autonomous driving systems. Next, we design three transformation modules so that MetaLiDAR can generate natural‐looking follow‐up point clouds. Finally, we define corresponding evaluation metrics based on metamorphic relations. MetaLiDAR automatically determines whether source and follow‐up test cases meet the metamorphic relations based on the evaluation metrics. Our empirical research on five state‐of‐the‐art LiDAR‐based object detection models shows that MetaLiDAR can not only generate natural‐looking test point clouds to detect 181,547 inconsistent behaviors of different models but also significantly enhance the robustness of models by retraining with synthetic point clouds.
Chapter
Plagiarism is a severe issue in academia, and uncertainty in plagiarism detection systems might lead to inconsistent detections. Thus, evaluating the system is essential; however, it is also a test oracle problem as it is challenging to distinguish correct behaviour from potentially incorrect behaviour of the system. To alleviate this challenge, we develop a feasible approach by applying an uncertainty matrix to identify the uncertainty of the plagiarism detection systems and derive metamorphic relations of metamorphic testing from the identified uncertainty for validation. We experimented with three plagiarism detection systems in a classroom scenario where students were hypothesized to use tools to generate answers for assignments. These answers were fed into the systems for validation by comparing the systems’ similarity scores of the tool-generated answers. Results showed that the proposed approach can effectively validate plagiarism detection systems. Future studies can apply this approach to locate uncertainties to enhance systems’ robustness.
Article
Metamorphic testing (MT) is an effective technique to alleviate the test oracle problem. The principle of MT is to detect failures by checking whether some necessary properties, commonly known as metamorphic relations (MRs), of software under test (SUT) hold among multiple executions of source and follow‐up test cases. Since both the generation of follow‐up test cases and test result verification depend on MRs, the identification of MRs plays a key role in MT; it is an important yet difficult task requiring deep domain knowledge of the SUT. Accordingly, techniques that can direct a tester to identify MRs effectively are desirable. In this paper, we propose μMT, a data-mutation-directed approach to identifying MRs. μMT guides a tester to identify MRs by providing a set of data mutation operators and template‐style mapping rules, which not only alleviates the difficulties faced in the process of MR identification but also improves identification effectiveness. We have further developed a tool to implement the proposed approach and conducted an empirical study to evaluate the MR identification effectiveness of μMT and the performance of MRs identified by μMT with respect to fault detection capability and statement coverage. The empirical results show that μMT is able to identify MRs for numeric programs effectively, and the identified MRs have high fault detection capability and statement coverage. The work presented in this paper advances the field of MT by providing a simple yet practical approach to the MR identification problem.
Article
Test case prioritization (TCP) aims to reorder a regression test suite with the goal of increasing the fault detection rate. Various TCP techniques have been proposed based on different prioritization strategies. Among them, greedy-based techniques are the most widely used. However, existing greedy-based techniques usually reorder all candidate test cases in each prioritization iteration, resulting in both efficiency and effectiveness problems. In this paper, we propose a generic partial attention mechanism, which uses the previous priority values (i.e., the number of additionally covered code units) to avoid considering all candidate test cases. Incorporating the mechanism into the additional-greedy strategy, we implement a novel coverage-based TCP technique based on partition ordering (OCP). OCP first groups the candidate test cases into different partitions and then updates the partitions in descending order. We conduct a comprehensive experiment on 19 versions of Java programs and 30 versions of C programs to compare the effectiveness and efficiency of OCP with six state-of-the-art TCP techniques: total-greedy, additional-greedy, lexicographical-greedy, unify-greedy, art-based, and search-based. The experimental results show that OCP achieves a better fault detection rate than the state-of-the-art techniques. Moreover, OCP's time cost shows an 85%–99% improvement over most of these techniques.
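For readers unfamiliar with the additional-greedy strategy that OCP extends, a minimal Python sketch follows. It is not OCP: the partial attention mechanism and partition ordering are omitted, the `coverage` mapping (test name to set of covered code units) is an assumed input format, and resetting the covered set once no remaining test adds new coverage is one common variant, assumed here.

```python
def additional_greedy(coverage):
    """Additional-greedy TCP: repeatedly pick the test that covers the most
    code units not yet covered by the tests already prioritized."""
    remaining = dict(coverage)
    covered = set()
    order = []
    while remaining:
        # Recompute the gain of every remaining test (the cost OCP tries to avoid).
        gains = {t: len(units - covered) for t, units in remaining.items()}
        # max() keeps the first maximum, so ties go to the lexicographically smallest name.
        best = max(sorted(gains), key=gains.get)
        if gains[best] == 0:
            if covered:
                covered = set()  # saturation: reset coverage and rank the rest again
                continue
            order.extend(sorted(remaining))  # leftover tests cover nothing at all
            break
        order.append(best)
        covered |= remaining.pop(best)
    return order


suite = {"t1": {"a", "b"}, "t2": {"b", "c", "d"}, "t3": {"a"}, "t4": {"e"}}
print(additional_greedy(suite))  # ['t2', 't1', 't4', 't3']
```

Here `t3` is ranked last because its only unit, `a`, is already covered by `t1`; it is picked up only after the coverage reset. Note that `gains` is recomputed for every remaining test on every iteration, which is exactly the full reordering that the partial attention mechanism is meant to avoid.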
Article
Metamorphic testing is well known for its ability to alleviate the oracle problem in software testing. The main idea of metamorphic testing is to test a software system by checking whether each identified metamorphic relation (MR) holds among several executions. In this regard, identifying MRs is an essential task in metamorphic testing. In view of the importance of this identification task, METRIC (METamorphic Relation Identification based on Category-choice framework) was developed to help software testers identify MRs from a given set of complete test frames. However, during MR identification, METRIC primarily focuses on the input domain without sufficient attention given to the output domain, thereby hindering the effectiveness of METRIC. Inspired by this problem, we have extended METRIC into METRIC+ by incorporating the information derived from the output domain for MR identification. A tool implementing METRIC+ has also been developed. Two rounds of experiments, involving four real-life specifications, have been conducted to evaluate the effectiveness and efficiency of METRIC+. The results have confirmed that METRIC+ is highly effective and efficient in MR identification. Additional experiments have been performed to compare the fault detection capability of the MRs generated by METRIC+ and those generated by μMT (another MR identification technique). The comparison results have confirmed that the MRs generated by METRIC+ are highly effective in fault detection.
Conference Paper
Searching and displaying data based on user queries is a key feature of most software applications such as information systems, web portals, web APIs, and data analytic platforms. The large volume of data managed by these types of systems, henceforth called query-based systems (QBS), makes them extremely hard to test due to the difficulty of assessing whether the output of a query is correct, the so-called oracle problem. Metamorphic testing has proved to be a very effective approach to alleviate the oracle problem in QBS, enabling the detection of bugs in data repositories, large e-commerce sites, and some of the most used software applications worldwide such as Google Search and YouTube. We have observed, however, that the metamorphic relations used to test different types of QBS are very similar regardless of their domain, since all of them exploit standard query features such as filtering and ordering. Inspired by this finding, in this paper we present a catalogue of metamorphic relation patterns to assist testers in the identification and inference of metamorphic relations in QBS. For the definition of the patterns we resorted to the root of most query languages: relational algebra. We show how the proposed patterns help in the identification of metamorphic relations in the e-commerce platform PrestaShop, the email service Gmail, and HBO's video-streaming mobile application.
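As a hedged illustration of the filtering idea behind such patterns (relational-algebra selection), the sketch below checks one metamorphic relation: adding a filter to a query must return a subset of the unfiltered results, and every returned item must satisfy the filter. The `search` function, its `max_price` parameter, and the seeded fault in `faulty_search` are hypothetical stand-ins for a real QBS, not part of the catalogue itself.

```python
def search(records, keyword, max_price=None):
    """Hypothetical system under test: keyword search with an optional price filter."""
    hits = [r for r in records if keyword.lower() in r["title"].lower()]
    if max_price is not None:
        hits = [r for r in hits if r["price"] <= max_price]
    return hits


def faulty_search(records, keyword, max_price=None):
    """Same interface with a seeded fault: the keyword is ignored when a filter is set."""
    if max_price is not None:
        return [r for r in records if r["price"] <= max_price]
    return search(records, keyword)


def filter_mr_holds(search_fn, records, keyword, max_price):
    """Source query: keyword only. Follow-up query: keyword plus price filter.
    MR: follow-up results are a subset of source results and all satisfy the filter."""
    source_ids = {r["id"] for r in search_fn(records, keyword)}
    follow_up = search_fn(records, keyword, max_price=max_price)
    return all(r["id"] in source_ids and r["price"] <= max_price for r in follow_up)


catalogue = [
    {"id": 1, "title": "Red Mug", "price": 8.0},
    {"id": 2, "title": "Blue Mug", "price": 15.0},
    {"id": 3, "title": "Red Scarf", "price": 5.0},
]
print(filter_mr_holds(search, catalogue, "mug", 10.0))         # True
print(filter_mr_holds(faulty_search, catalogue, "mug", 10.0))  # False
```

Neither check needs to know which products should match the query "mug"; only the relation between the source and follow-up executions is verified, which is what makes the approach usable without a test oracle.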
Article
Metamorphic testing can test untestable software, detecting fatal errors in autonomous vehicles' onboard computer systems.
Conference Paper
Metamorphic testing is a well-known approach to tackle the oracle problem in software testing. This technique requires the use of source test cases that serve as seeds for the generation of follow-up test cases. Systematic design of test cases is crucial for test quality. Thus, the source test case generation strategy can have a big impact on the fault detection effectiveness of metamorphic testing. Most previous studies on metamorphic testing have used either random test data or existing test cases as source test cases, and there has been limited research on systematic source test case generation for metamorphic testing. This paper provides a comprehensive evaluation of the impact of source test case generation techniques on the fault finding effectiveness of metamorphic testing. We evaluated the effectiveness of line coverage, branch coverage, weak mutation and random test generation strategies for source test case generation. The experiments are conducted with 77 methods from 4 open source code repositories. Our results show that by systematically creating source test cases, we can significantly increase the fault finding effectiveness of metamorphic testing. Further, in this paper we introduce a simple metamorphic testing tool called "METtester" that we use to conduct metamorphic testing on these methods.
Article
Metamorphic testing is an approach to both test case generation and test result verification. A central element is a set of metamorphic relations, which are necessary properties of the target function or algorithm in relation to multiple inputs and their expected outputs. Since its first publication, we have witnessed a rapidly increasing body of work examining metamorphic testing from various perspectives, including metamorphic relation identification, test case generation, integration with other software engineering techniques, and the validation and evaluation of software systems. In this article, we review the current research of metamorphic testing and discuss the challenges yet to be addressed. We also present visions for further improvement of metamorphic testing and highlight opportunities for new research.
Article
Web Application Programming Interfaces (APIs) allow systems to interact with each other over the network. Modern Web APIs often adhere to the REST architectural style, being referred to as RESTful Web APIs. RESTful Web APIs are decomposed into multiple resources (e.g., a video in the YouTube API) that clients can manipulate through HTTP interactions. Testing Web APIs is critical but challenging due to the difficulty of assessing the correctness of API responses, i.e., the oracle problem. Metamorphic testing alleviates the oracle problem by exploiting relations (so-called metamorphic relations) among multiple executions of the program under test. In this paper, we present a metamorphic testing approach for the detection of faults in RESTful Web APIs. We first propose six abstract relations that capture the shape of many of the metamorphic relations found in RESTful Web APIs; we call these Metamorphic Relation Output Patterns (MROPs). Each MROP can then be instantiated into one or more concrete metamorphic relations. The approach was evaluated using both automatically seeded and real faults in six subject Web APIs. Among other results, we identified 60 metamorphic relations (instances of the proposed MROPs) in the Web APIs of Spotify and YouTube. Each metamorphic relation was implemented using both random and manual test data, running over 4.7K automated tests. As a result, 11 issues were detected (3 in Spotify and 8 in YouTube), 10 of them confirmed by the API developers or reproduced by other users, supporting the effectiveness of the approach.
Article
Diversity has been widely studied in software testing as a guidance towards effective sampling of test inputs in the vast space of possible program behaviors. However, diversity has received relatively little attention in mutation testing. The traditional mutation adequacy criterion is a one-dimensional measure of the total number of killed mutants. We propose a novel, diversity-aware mutation adequacy criterion called the distinguishing mutation adequacy criterion, which is fully satisfied when each of the considered mutants can be identified by the set of tests that kill it, thereby encouraging the inclusion of a more diverse range of tests. This paper presents the formal definition of distinguishing mutation adequacy and its score. Subsequently, an empirical study investigates the relationship among distinguishing mutation score, fault detection capability, and test suite size. The results show that the distinguishing mutation adequacy criterion detects 1.33 times more unseen faults than the traditional mutation adequacy criterion, at the cost of a 1.56 times increase in test suite size, for adequate test suites that fully satisfy the criteria. The results show a better picture for inadequate test suites; on average, 8.63 times more unseen faults are detected at the cost of a 3.14 times increase in test suite size.
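The difference between the two criteria can be made concrete with a small kill matrix. The sketch below is a simplification under stated assumptions: the distinguishing score is taken as the number of distinct kill sets divided by the number of programs, where the original program is counted as one extra entry killed by no test, so that live mutants are treated as indistinguishable from it. The paper's formal definition may differ in detail.

```python
def traditional_score(kills):
    """kills maps each mutant to the set of tests that kill it."""
    return sum(1 for tests in kills.values() if tests) / len(kills)


def distinguishing_score(kills):
    # Assumption: the original program behaves like a mutant killed by no test,
    # so a live mutant (empty kill set) is indistinguishable from it.
    classes = {frozenset(tests) for tests in kills.values()} | {frozenset()}
    return len(classes) / (len(kills) + 1)


kills = {"m1": {"t1"}, "m2": {"t1"}, "m3": {"t1", "t2"}}
print(traditional_score(kills), distinguishing_score(kills))  # 1.0 0.75

# A new test t3 that kills only m2 leaves the traditional score unchanged
# but makes m1 and m2 distinguishable.
kills["m2"] = {"t1", "t3"}
print(traditional_score(kills), distinguishing_score(kills))  # 1.0 1.0
```

The traditional score is already saturated by `t1` and `t2`, so it gives no reason to add `t3`; the distinguishing score does, which is how the criterion encourages a more diverse test suite.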
Article
Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than taking on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience. The survey has been accepted for publication at ACM Computing Surveys, and this is the authors' preprint copy. If you are considering citing this survey, we would appreciate it if you could use the following BibTeX entry: http://goo.gl/Hf5Fvc
Article
A test oracle determines whether a test execution reveals a fault, often by comparing the observed program output to the expected output. This is not always practical, for example when a program's input-output relation is complex and difficult to capture formally. Metamorphic testing provides an alternative, where correctness is not determined by checking an individual concrete output, but by applying a transformation to a test input and observing how the program output 'morphs' into a different one as a result. Since the introduction of such metamorphic relations in 1998, many contributions on metamorphic testing have been made, and the technique has seen successful applications in a variety of domains, ranging from web services to computer graphics. This article provides a comprehensive survey on metamorphic testing: It summarises the research results and application areas, and analyses common practice in empirical studies of metamorphic testing as well as the main open challenges.
Article
Metamorphic testing is a promising technique for testing software systems when the oracle problem exists, and has been successfully applied to various application domains and paradigms. An important and essential task in metamorphic testing is the identification of metamorphic relations, which, due to the absence of a systematic and specification-based methodology, has often been done in an ad hoc manner, something which has hindered the applicability and effectiveness of metamorphic testing. To address this, a systematic methodology for identifying metamorphic relations based on the category-choice framework, called METRIC, is introduced in this paper. A tool implementing this methodology has been developed and examined in an experiment to determine the viability and effectiveness of METRIC, with the results of the experiment confirming that METRIC is both effective and efficient at identifying metamorphic relations.
Article
There are two fundamental limitations in software testing, known as the reliable test set problem and the oracle problem. Fault-based testing is an attempt by Morell to alleviate the reliable test set problem. In this paper, we propose to enhance fault-based testing to alleviate the oracle problem as well. We present an integrated method that combines metamorphic testing with fault-based testing using real and symbolic inputs.
Conference Paper
Mutation testing is a valuable experimental research technique that has been used in many studies. It has been experimentally compared with other test criteria, and also used to support experimental comparisons of other test criteria, by using mutants as a method to create faults. In effect, mutation is often used as a "gold standard" for experimental evaluations of test methods. Although mutation testing is powerful, it is a complicated and computationally expensive testing method. Therefore, automated tool support is indispensable for conducting mutation testing. This demo presents a publicly available mutation system for Java that supports both method-level mutants and class-level mutants. MuJava can be freely downloaded and installed with relative ease under both Unix and Windows. MuJava is offered as a free service to the community and we hope that it will promote the use of mutation analysis for experimental research in software testing.
Conference Paper
Empirical studies in software testing research may not be comparable, reproducible, or characteristic of practice. One reason is that real bugs are too infrequently used in software testing research. Extracting and reproducing real bugs is challenging and as a result hand-seeded faults or mutants are commonly used as a substitute. This paper presents Defects4J, a database and extensible framework providing real bugs to enable reproducible studies in software testing research. The initial version of Defects4J contains 357 real bugs from 5 real-world open source programs. Each real bug is accompanied by a comprehensive test suite that can expose (demonstrate) that bug. Defects4J is extensible and builds on top of each program’s version control system. Once a program is configured in Defects4J, new bugs can be added to the database with little or no effort. Defects4J features a framework to easily access faulty and fixed program versions and corresponding test suites. This framework also provides a high-level interface to common tasks in software testing research, making it easy to conduct and reproduce empirical studies. Defects4J is publicly available at http://defects4j.org.
Conference Paper
Metamorphic testing (MT) is a property-based automated software testing method. It alleviates the oracle problem by testing programs against metamorphic relations (MRs), which are necessary properties among multiple executions of the target program. For a given problem, usually more than one MR can be identified. It is therefore of practical importance for testers to know the nature of good MRs, that is, which MRs are likely to have higher chances of revealing failures. To address this issue we investigate the correlation between the fault-detection effectiveness of MRs and the dissimilarity (distance) of test case execution profiles. Empirical study results reveal that there is a strong and statistically significant positive correlation between the fault-detection effectiveness and the distance. The findings of this research can help to develop automated means of selecting/prioritizing MRs for cost-effective metamorphic testing.
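One way to compute such a distance is sketched below, assuming execution profiles are sets of covered branch identifiers and using Jaccard distance; both choices are illustrative assumptions, and the study may use other profile types and distance measures. MRs are then ranked by the mean distance over their source/follow-up pairs, in line with the suggested use of the finding for selecting or prioritizing MRs.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two execution profiles (sets of covered branches)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rank_mrs_by_distance(profiles):
    """profiles maps an MR name to a list of (source_profile, follow_up_profile) pairs.
    Returns MR names ordered from most to least dissimilar executions."""
    mean = {
        mr: sum(jaccard_distance(s, f) for s, f in pairs) / len(pairs)
        for mr, pairs in profiles.items()
    }
    return sorted(mean, key=lambda mr: (-mean[mr], mr))


profiles = {
    "MR1": [({"b1", "b2"}, {"b1", "b2"}), ({"b1", "b3"}, {"b1", "b3"})],
    "MR2": [({"b1", "b2"}, {"b2", "b3"})],
    "MR3": [({"b1"}, {"b4", "b5"})],
}
print(rank_mrs_by_distance(profiles))  # ['MR3', 'MR2', 'MR1']
```

Following the reported positive correlation, MRs whose source and follow-up executions exercise more different branches are placed first; this is a heuristic ordering, not a guarantee that the top-ranked MR reveals a fault.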
Article
Symbolic PathFinder (SPF) is a software analysis tool that combines symbolic execution with model checking for automated test case generation and error detection in Java bytecode programs. In SPF, programs are executed on symbolic inputs representing multiple concrete inputs and the values of program variables are represented by expressions over those symbolic inputs. Constraints over these expressions are generated from the analysis of different paths through the program. The constraints are solved with off-the-shelf solvers to determine path feasibility and to generate test inputs. Model checking is used to explore different symbolic program executions, to systematically handle aliasing in the input data structures, and to analyze the multithreading present in the code. SPF incorporates techniques for handling input data structures, strings, and native calls to external libraries, as well as for solving complex mathematical constraints. We describe the tool and its application at NASA, in academia, and in industry.
Article
This paper presents an integrated metamorphic testing environment MTest and reports an experimental analysis of the effectiveness of metamorphic testing, which is carried out using MTest with a real program of sparse matrix multiplication. Quantitative evaluation and comparison of special case testing, metamorphic testing with special and random test cases are illustrated with two measurements: mutation score and fault detection ratio. The case study shows that metamorphic testing and special case testing are complementary to each other, and with
Article
We present an integrated method for program proving, testing, and debugging. Using the concept of metamorphic relations, we select necessary properties for target programs. For programs where global symbolic evaluation can be conducted and the constraint expressions involved can be solved, we can either prove that these necessary conditions for program correctness are satisfied or identify all inputs that violate the conditions. For other programs, our method can be converted into a symbolic-testing approach. Our method extrapolates from the correctness of a program for tested inputs to the correctness of the program for related untested inputs. The method supports automatic debugging through the identification of constraint expressions that reveal failures.
Conference Paper
CUTE, a Concolic Unit Testing Engine for C and Java, is a tool to systematically and automatically test sequential C programs (including pointers) and concurrent Java programs. CUTE combines concrete and symbolic execution in a way that avoids redundant test cases as well as false warnings. The tool also introduces a race-flipping technique to efficiently test and model check concurrent programs with data inputs.
Conference Paper
The problem of testing programs without test oracles is well known. A commonly used approach is to use special values in testing but this is often insufficient to ensure program correctness. This paper demonstrates the use of metamorphic testing to uncover faults in programs, which could not be detected by special test values. Metamorphic testing can be used as a complementary test method to special value testing. In this paper, the sine function and a search function are used as examples to demonstrate the usefulness of metamorphic testing. This paper also examines metamorphic relationships and the extent of their usefulness in program testing.
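A minimal Python sketch of the idea, using a hypothetical faulty sine whose range reduction is wrong. It returns the correct value at the special points 0, pi/2 and pi, so special value testing passes, but the identity sin(-x) = -sin(x), chosen here as an illustrative metamorphic relation (the paper's exact relations are not reproduced), exposes the fault without knowing the true value of sin(1).

```python
import math


def faulty_sin(x):
    # Hypothetical seeded fault: reduces x modulo pi instead of 2*pi,
    # which happens to be correct on [0, pi] but wrong for negative inputs.
    return math.sin(x % math.pi)


SPECIAL_VALUES = [(0.0, 0.0), (math.pi / 2, 1.0), (math.pi, 0.0)]


def passes_special_values(f, tol=1e-9):
    return all(abs(f(x) - expected) <= tol for x, expected in SPECIAL_VALUES)


def odd_symmetry_mr_holds(f, x, tol=1e-9):
    # MR: sin(-x) = -sin(x); no expected value for sin(x) itself is needed.
    return abs(f(-x) + f(x)) <= tol


print(passes_special_values(faulty_sin))       # True
print(odd_symmetry_mr_holds(math.sin, 1.0))    # True
print(odd_symmetry_mr_holds(faulty_sin, 1.0))  # False
```

This mirrors the complementary role described above: the special values check the few points where the expected output is known, while the metamorphic relation checks a property that must hold for any input.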
Conference Paper
Symbolic Pathfinder (SPF) combines symbolic execution with model checking and constraint solving for automated test case generation and error detection in Java programs with unspecified inputs. In this tool, programs are executed on symbolic inputs representing multiple concrete inputs. Values of variables are represented as constraints generated from the analysis of Java bytecode. The constraints are solved using off-the-shelf solvers to generate test inputs guaranteed to achieve complex coverage criteria. SPF has been used successfully at NASA, in academia, and in industry.
Conference Paper
We present a new tool, named DART, for automatically testing software that combines three main techniques: (1) automated extraction of the interface of a program with its external environment using static source-code parsing; (2) automatic generation of a test driver for this interface that performs random testing to simulate the most general environment the program can operate in; and (3) dynamic analysis of how the program behaves under random testing and automatic generation of new test inputs to systematically direct the execution along alternative program paths. Together, these three techniques constitute Directed Automated Random Testing, or DART for short. The main strength of DART is thus that testing can be performed completely automatically on any program that compiles; there is no need to write any test driver or harness code. During testing, DART detects standard errors such as program crashes, assertion violations, and non-termination. Preliminary experiments to unit test several examples of C programs are very encouraging.
Article
A method for creating functional test suites has been developed in which a test engineer analyzes the system specification, writes a series of formal test specifications, and then uses a generator tool to produce test descriptions from which test scripts are written. The advantages of this method are that the tester can easily modify the test specification when necessary, and can control the complexity and number of the tests by annotating the tests specification with constraints.
Article
Choco is a Java library for constraint satisfaction problems (CSP), constraint programming (CP) and explanation-based constraint solving (e-CP). It is built on an event-based propagation mechanism with backtrackable structures.
Article
Recent advances in experimental and computational technologies have fueled the development of many sophisticated bioinformatics programs. The correctness of such programs is crucial, as incorrectly computed results may lead to wrong biological conclusions or misguided downstream experimentation. Common software testing procedures involve executing the target program with a set of test inputs and then verifying the correctness of the test outputs. However, due to the complexity of many bioinformatics programs, it is often difficult to verify the correctness of the test outputs. Therefore our ability to perform systematic software testing is greatly hindered. We propose to use a novel software testing technique, metamorphic testing (MT), to test a range of bioinformatics programs. Instead of requiring a mechanism to verify whether an individual test output is correct, the MT technique verifies whether a pair of test outputs conforms to a set of domain-specific properties, called metamorphic relations (MRs), thus greatly increasing the number and variety of test cases that can be applied. To demonstrate how MT is used in practice, we applied MT to test two open-source bioinformatics programs, namely GNLab and SeqMap. In particular, we show that MT is simple to implement and is effective in detecting faults in a real-life program and some artificially fault-seeded programs. Further, we discuss how MT can be applied to test programs from various domains of bioinformatics. This paper describes the application of a simple, effective and automated technique to systematically test a range of bioinformatics programs. We show how MT can be implemented in practice through two real-life case studies. Since many bioinformatics programs, particularly those for large-scale simulation and data analysis, are hard to test systematically, their developers may benefit from using MT as part of the testing strategy.
Therefore our work represents a significant step towards software reliability in bioinformatics.
Conference Paper
An enhanced version of metamorphic testing, namely n-iterative metamorphic testing, is proposed to systematically exploit more information out of metamorphic tests by applying metamorphic relations in a chain style. A contrastive case study, conducted within an integrated testing environment MTest, shows that n-iterative metamorphic testing exceeds metamorphic testing and special case testing in terms of their fault detection capabilities. Another advantage of n-iterative metamorphic testing is its high efficiency in test case generation.
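The chaining idea can be sketched as follows, assuming the relation sin(x + pi) = -sin(x) and a hypothetical faulty sine that is wrong only for |x| >= 10. A single application of the MR from x = 0.5 stays in the correct region and passes, whereas chaining it (each follow-up input becoming the next source input) eventually reaches the faulty region. The function and parameter names are illustrative and are not MTest's interface.

```python
import math


def n_iterative_mt(f, x0, n, transform, relation):
    """Apply an MR n times in a chain; each follow-up input becomes the next
    source input. Returns the 1-based step of the first violation, or None."""
    x = x0
    for step in range(1, n + 1):
        x_next = transform(x)
        if not relation(f(x), f(x_next)):
            return step
        x = x_next
    return None


def shift_by_pi(x):
    return x + math.pi


def negates(y_source, y_follow_up, tol=1e-9):
    # MR output relation: sin(x + pi) = -sin(x)
    return abs(y_follow_up + y_source) <= tol


def faulty_sin_large(x):
    # Hypothetical seeded fault that only manifests for |x| >= 10.
    return math.sin(x) if abs(x) < 10 else math.sin(x) + 1e-3


print(n_iterative_mt(faulty_sin_large, 0.5, 1, shift_by_pi, negates))  # None
print(n_iterative_mt(faulty_sin_large, 0.5, 5, shift_by_pi, negates))  # 4
print(n_iterative_mt(math.sin, 0.5, 10, shift_by_pi, negates))         # None
```

The fourth step maps 0.5 + 3*pi (about 9.92) to 0.5 + 4*pi (about 13.07), crossing into the faulty region, which a one-step application from the same source input never reaches. Generating the follow-up chain also reuses each output as a new source, which is one way to read the efficiency claim above.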
Conference Paper
Recent advances in Deep Neural Networks (DNNs) have led to the development of DNN-driven autonomous cars that, using sensors like camera, LiDAR, etc., can drive without any human intervention. Most major manufacturers including Tesla, GM, Ford, BMW, and Waymo/Google are working on building and testing different types of autonomous vehicles. The lawmakers of several US states including California, Texas, and New York have passed new legislation to fast-track the process of testing and deployment of autonomous vehicles on their roads. However, despite their spectacular progress, DNNs, just like traditional software, often demonstrate incorrect or unexpected corner-case behaviors that can lead to potentially fatal collisions. Several such real-world accidents involving autonomous cars have already happened, including one which resulted in a fatality. Most existing testing techniques for DNN-driven vehicles are heavily dependent on the manual collection of test data under different driving conditions, which becomes prohibitively expensive as the number of test conditions increases. In this paper, we design, implement, and evaluate DeepTest, a systematic testing tool for automatically detecting erroneous behaviors of DNN-driven vehicles that can potentially lead to fatal crashes. First, our tool is designed to automatically generate test cases leveraging real-world changes in driving conditions like rain, fog, lighting conditions, etc. DeepTest systematically explores different parts of the DNN logic by generating test inputs that maximize the number of activated neurons. DeepTest found thousands of erroneous behaviors under different realistic driving conditions (e.g., blurring, rain, fog, etc.), many of which lead to potentially fatal crashes, in three top-performing DNNs in the Udacity self-driving car challenge.
Article
Random testing and partition testing are two major families of software testing techniques. They have been compared both theoretically and empirically in numerous studies for decades, and it is acknowledged that each has its own advantages and disadvantages and that their innate characteristics are fairly complementary. In this paper, we propose a new testing approach, adaptive partition testing, in which test cases are randomly selected from a partition whose probability of being selected is adaptively adjusted during testing. In particular, we develop two algorithms: Markov-chain-based adaptive partition testing and reward-punishment-based adaptive partition testing. The former uses a Markov matrix to dynamically adjust the probability of each partition being selected for testing, while the latter is based on a reward and punishment mechanism. We conduct empirical studies to evaluate the proposed algorithms using ten faulty versions of three large-scale open-source programs. Our experimental results show that, compared with two baseline techniques, namely random partition testing (RPT) and dynamic random testing (DRT), our algorithms deliver higher fault-detection effectiveness with lower test case selection overhead. These results demonstrate that adaptive partition testing is an effective testing approach.
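The reward-punishment variant can be sketched as follows. The exact update rules and parameters are not given in this abstract, so the additive reward, subtractive penalty, probability floor, and the function name below are assumptions for illustration only. Partitions are represented here as plain `range` objects of integer test inputs.

```python
import random
from typing import Callable, Sequence

def rp_adaptive_partition_testing(partitions: Sequence[Sequence[int]],
                                  fails: Callable[[int], bool], budget: int,
                                  reward: float = 0.1, penalty: float = 0.05,
                                  seed: int = 0) -> list[int]:
    """Adaptive partition testing with a simplified (hypothetical) reward/punishment update.

    Returns the failure-causing inputs found within `budget` test executions.
    """
    rng = random.Random(seed)
    probs = [1.0 / len(partitions)] * len(partitions)
    failures = []
    for _ in range(budget):
        i = rng.choices(range(len(partitions)), weights=probs)[0]  # pick a partition
        t = rng.choice(partitions[i])                               # random test inside it
        if fails(t):
            failures.append(t)
            probs[i] += reward                        # reward a failure-revealing partition
        else:
            probs[i] = max(probs[i] - penalty, 1e-3)  # punish, but never down to zero
        total = sum(probs)
        probs = [p / total for p in probs]            # renormalise to a distribution
    return failures

found = rp_adaptive_partition_testing([range(0, 100), range(100, 200)],
                                      fails=lambda t: t >= 150, budget=200)
print(len(found), all(t >= 150 for t in found))
```

The floor keeps every partition selectable, so a partition that has been punished repeatedly can still be revisited later in the testing process.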
Conference Paper
Metamorphic testing uses domain-specific properties of a program's intended behaviour to alleviate the oracle problem. From a given set of source test inputs, a set of follow-up test inputs that bear some relation to the source inputs is generated, and their outputs are compared with the outputs of the source tests using metamorphic relations. We evaluate the use of an automated test input generation technique called dynamic symbolic execution (DSE) to generate the source test inputs for metamorphic testing. We investigate whether DSE increases the source-code coverage and fault-finding effectiveness of metamorphic testing compared with random testing, and whether using metamorphic relations as a supportive technique improves the test inputs generated by DSE. Our results show that DSE improves the coverage and fault detection rate of metamorphic testing compared with random testing while using significantly smaller test suites, and that metamorphic relations considerably increase the code coverage of both DSE and random tests, although the improvement in fault detection rate may be marginal and depends on the metamorphic relations used.
Conference Paper
Metamorphic Testing (MT) aims to alleviate the oracle problem. In MT, testers define metamorphic relations (MRs), which are used to generate new test cases (referred to as follow-up test cases) from the available test cases (referred to as source test cases). Both source and follow-up test cases are executed and their outputs are verified against the relevant MRs; any violation implies that the software under test is faulty. So far, research on the effectiveness of MT has focused on the selection of better MRs (that is, MRs that are more likely to be violated). In addition to MR selection, the source and follow-up test cases may also affect the effectiveness of MT. Since follow-up test cases are determined by the source test cases and MRs, the selection of source test cases also affects the effectiveness of MT. However, existing MT studies commonly adopt random testing as the selection strategy for source test cases. This study investigates the impact of source test cases on the effectiveness of MT. Since Adaptive Random Testing (ART) has been developed as an enhancement to Random Testing (RT), this study focuses on comparing RT and ART as source test case selection strategies with respect to the effectiveness of MT. Experimental results show that ART outperforms RT in enhancing the effectiveness of MT.
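A minimal sketch of ART as a source test case selection strategy follows. It assumes a one-dimensional numeric input domain, the common fixed-size-candidate-set (FSCS) variant of ART with absolute distance, and the MR sin(π − x) = sin(x). The study's subject programs, MRs, and parameters are not reproduced here, and the helper names are hypothetical.

```python
import math
import random
from typing import Callable

def fscs_art(n: int, lo: float, hi: float, k: int = 10, seed: int = 0) -> list[float]:
    """Fixed-size-candidate-set ART over the numeric interval [lo, hi]."""
    rng = random.Random(seed)
    selected = [rng.uniform(lo, hi)]
    while len(selected) < n:
        candidates = [rng.uniform(lo, hi) for _ in range(k)]
        # Keep the candidate whose nearest already-selected test case is farthest away.
        selected.append(max(candidates, key=lambda c: min(abs(c - s) for s in selected)))
    return selected

def run_mt(program: Callable[[float], float], sources: list[float],
           transform: Callable[[float], float],
           relation: Callable[[float, float], bool]) -> list[float]:
    """Return the source test cases whose (source, follow-up) pair violates the MR."""
    return [x for x in sources if not relation(program(x), program(transform(x)))]

sources = fscs_art(20, 0.0, 2 * math.pi)
same = lambda a, b: math.isclose(a, b, abs_tol=1e-9)   # MR: sin(pi - x) == sin(x)
print(run_mt(math.sin, sources, lambda x: math.pi - x, same))          # -> []
print(len(run_mt(lambda x: x, sources, lambda x: math.pi - x, same)))  # faulty stand-in
```

Swapping `fscs_art` for plain `rng.uniform` sampling gives the RT baseline, so both selection strategies can feed the same `run_mt` check.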
Conference Paper
When it is difficult to determine the expected output of each test case, metamorphic testing can be applied to alleviate the problem. A key challenge is deriving metamorphic relations for the program under test. This paper proposes a data-mutation-directed metamorphic relation acquisition methodology called μMT. Experimental results on three case studies show that μMT is feasible for deriving metamorphic relations for numeric applications and that the derived metamorphic relations show reasonable fault-detection effectiveness.
Article
Random testing (RT) has been widely used in the testing of various software and hardware systems. Adaptive random testing (ART) is a family of random testing techniques that aim to enhance the failure-detection effectiveness of RT by spreading random test cases evenly throughout the input domain. ART has been empirically shown to be effective on software with numeric inputs. However, two aspects of ART need to be addressed to make its adoption more widespread: applicability to programs with non-numeric inputs, and the high computational overhead of many ART algorithms. We present a linear-order ART algorithm for software with non-numeric inputs. The key requirement for using ART with non-numeric inputs is an appropriate "distance" measure. We use the concepts of categories and choices from category-partition testing to formulate such a measure. We investigate the failure-detection effectiveness of our technique by performing an empirical study on 14 object programs, using two standard metrics: F-measure and P-measure. Our ART algorithm statistically significantly outperforms RT on 10 of the 14 programs studied, and exhibits performance similar to RT on three of the four remaining programs. The selection overhead of our ART algorithm is close to that of RT.
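The role of a category-choice "distance" can be sketched as follows. The exact measure used by the authors is not stated in this abstract; the sketch assumes the simplest option, counting categories whose choices differ, and plugs it into a plain FSCS-style selection loop rather than the paper's linear-order algorithm. The specification and function names are hypothetical.

```python
import random

Frame = dict[str, str]  # category -> selected choice

def frame_distance(a: Frame, b: Frame) -> int:
    """Assumed distance: number of categories in which the two inputs pick different choices."""
    return sum(1 for cat in a.keys() | b.keys() if a.get(cat) != b.get(cat))

def random_frame(spec: dict[str, list[str]], rng: random.Random) -> Frame:
    """Draw one choice at random for every category."""
    return {cat: rng.choice(choices) for cat, choices in spec.items()}

def select_diverse(spec: dict[str, list[str]], n: int, k: int = 8, seed: int = 0) -> list[Frame]:
    """FSCS-style selection with frame_distance (not the paper's linear-order algorithm)."""
    rng = random.Random(seed)
    chosen = [random_frame(spec, rng)]
    while len(chosen) < n:
        candidates = [random_frame(spec, rng) for _ in range(k)]
        chosen.append(max(candidates, key=lambda c: min(frame_distance(c, s) for s in chosen)))
    return chosen

# Hypothetical category-partition specification for a text-search utility.
SPEC = {"pattern": ["empty", "literal", "regex"],
        "file": ["missing", "empty", "large"],
        "case": ["sensitive", "insensitive"]}
for frame in select_diverse(SPEC, 4):
    print(frame)
```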
Article
The use of web services has been growing significantly, with increasingly large numbers of applications being implemented through the web. A difficulty associated with this development is the quality assurance of these services, specifically the challenges encountered when testing the applications: amongst other things, testers may not have access to the source code, and the correctness of the output may not be easily ascertained (known as the oracle problem). Metamorphic testing (MT) has been introduced as a technique to alleviate the oracle problem. MT makes use of properties of the software under test, known as metamorphic relations, and checks whether or not these relations are violated. Since MT does not require source code to generate the metamorphic relations, it is suitable for testing web services-based applications. We have designed an XML-based language representation to facilitate the formalisation of metamorphic relations, the generation of (follow-up) test cases, and the verification of the test results. Based on this, we have also developed a tool to support the automation of MT for web service applications. This tool has been used in an experiment to test web services, the evaluation of which is reported in this paper.
Article
We introduce equivalence modulo inputs (EMI), a simple, widely applicable methodology for validating optimizing compilers. Our key insight is to exploit the close interplay between (1) dynamically executing a program on some test inputs and (2) statically compiling the program to work on all possible inputs. Indeed, the test inputs induce a natural collection of the original program's EMI variants, which can help differentially test any compiler and specifically target the difficult-to-find miscompilations. To create a practical implementation of EMI for validating C compilers, we profile a program's test executions and stochastically prune its unexecuted code. Our extensive testing over eleven months has led to 147 confirmed, unique bug reports for GCC and LLVM alone. The majority of those bugs are miscompilations, and more than 100 have already been fixed. Beyond testing compilers, EMI can be adapted to validate program transformation and analysis systems in general. This work opens up an exciting new direction.
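A toy version of the EMI idea can be sketched as follows. Real EMI profiles C programs and recompiles pruned variants with GCC or LLVM; here, as an assumption for illustration, a "program" is a list of guarded Python statements with side-effect-free guards, and the hypothetical `run` function plays the role of the compiler-plus-execution under test. Because the pruned statements' bodies never executed on the profiled input, every variant must produce the same output on that input; any mismatch would point to a bug in `run`.

```python
import random
from typing import Callable

Env = dict[str, int]
Stmt = tuple[Callable[[Env], bool], Callable[[Env], None]]  # (side-effect-free guard, body)

def run(program: list[Stmt], env: Env) -> tuple[Env, set[int]]:
    """Stand-in for 'compile and execute': returns the final state and executed statement indices."""
    env = dict(env)
    executed = set()
    for idx, (guard, body) in enumerate(program):
        if guard(env):
            body(env)
            executed.add(idx)
    return env, executed

def emi_variant(program: list[Stmt], executed: set[int], rng: random.Random,
                p: float = 0.5) -> list[Stmt]:
    """Randomly prune statements whose bodies were not executed on the profiled input."""
    return [s for i, s in enumerate(program) if i in executed or rng.random() >= p]

def emi_check(program: list[Stmt], inp: Env, n_variants: int = 10, seed: int = 0) -> bool:
    """Every variant must behave like the original on `inp`."""
    rng = random.Random(seed)
    expected, executed = run(program, inp)
    return all(run(emi_variant(program, executed, rng), inp)[0] == expected
               for _ in range(n_variants))

PROGRAM: list[Stmt] = [
    (lambda e: e["x"] > 0,   lambda e: e.update(y=e["x"] * 2)),
    (lambda e: e["x"] <= 0,  lambda e: e.update(y=-1)),
    (lambda e: e["x"] > 100, lambda e: e.update(y=0)),
]
print(emi_check(PROGRAM, {"x": 5, "y": 0}))  # -> True
```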
Article
In software testing, something which can verify the correctness of test case execution results is called an oracle. The oracle problem occurs when either an oracle does not exist, or exists but is too expensive to be used. Metamorphic testing is a testing approach which uses metamorphic relations, properties of the software under test represented in the form of relations among inputs and outputs of multiple executions, to help verify the correctness of a program. This paper presents new empirical evidence to support this approach, which has been used to alleviate the oracle problem in various applications and to enhance several software analysis and testing techniques. It has been observed that identification of a sufficient number of appropriate metamorphic relations for testing, even by inexperienced testers, was possible with a very small amount of training. Furthermore, the cost-effectiveness of the approach could be enhanced through the use of more diverse metamorphic relations. The empirical studies presented in this paper clearly show that a small number of diverse metamorphic relations, even those identified in an ad hoc manner, had a similar fault-detection capability to a test oracle, and could thus effectively help alleviate the oracle problem.
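The benefit of diverse MRs can be illustrated with a small, hypothetical example: three MRs for a function that computes the arithmetic mean, applied to a correct implementation and to one with a seeded fault. On this input only the permutation MR exposes the fault, which is one way to see why a set of diverse MRs tends to detect more faults than any single relation. The MRs, input, and function names are assumptions for illustration, not taken from the study.

```python
import math
from typing import Callable

# Each MR: (name, input transformation, relation between source and follow-up outputs).
MRS: list[tuple[str, Callable, Callable]] = [
    ("permute", lambda xs: xs[::-1],            lambda src, fup: math.isclose(fup, src)),
    ("shift+5", lambda xs: [x + 5 for x in xs], lambda src, fup: math.isclose(fup, src + 5)),
    ("scale*2", lambda xs: [2 * x for x in xs], lambda src, fup: math.isclose(fup, 2 * src)),
]

def violated_mrs(program: Callable[[list[float]], float], source: list[float]) -> list[str]:
    """Names of the MRs violated by the (source, follow-up) pair."""
    out_src = program(source)
    return [name for name, transform, relation in MRS
            if not relation(out_src, program(transform(source)))]

def correct_mean(xs):
    return sum(xs) / len(xs)

def faulty_mean(xs):
    return sum(xs[1:]) / (len(xs) - 1)  # seeded fault: ignores the first element

print(violated_mrs(correct_mean, [1, 2, 3, 10]))  # -> []
print(violated_mrs(faulty_mean, [1, 2, 3, 10]))   # -> ['permute']
```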
Article
We show how model checking and symbolic execution can be used to generate test inputs to achieve structural coverage of code that manipulates complex data structures. We focus on obtaining branch-coverage during unit testing of some of the core methods of the red-black tree implementation in the Java TreeMap library, using the Java PathFinder model checker. Three different test generation techniques will be introduced and compared, namely, straight model checking of the code, model checking used in a black-box fashion to generate all inputs up to a fixed size, and lastly, model checking used during white-box test input generation. The main contribution of this work is to show how efficient white-box test input generation can be done for code manipulating complex data, taking into account complex method preconditions.
Article
Regression testing is a testing activity that is performed to provide confidence that changes do not harm the existing behaviour of the software. Test suites tend to grow in size as software evolves, often making it too costly to execute entire test suites. A number of different approaches have been studied to maximise the value of the accrued test suite: minimisation, selection and prioritisation. Test suite minimisation seeks to eliminate redundant test cases in order to reduce the number of tests to run. Test case selection seeks to identify the test cases that are relevant to some set of recent changes. Test case prioritisation seeks to order test cases in such a way that early fault detection is maximised. This paper surveys minimisation, selection and prioritisation techniques and discusses open problems and potential directions for future research.
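Coverage-based prioritisation, one of the technique families covered by such surveys, can be sketched with the classic "additional" greedy strategy: repeatedly schedule the test that covers the most not-yet-covered elements, and reset once everything has been covered. The coverage data and test names below are hypothetical, and ties are broken alphabetically for determinism.

```python
def additional_greedy(coverage: dict[str, set[str]]) -> list[str]:
    """Order tests by additional coverage, resetting when all elements are covered."""
    remaining = dict(coverage)
    order: list[str] = []
    uncovered: set[str] = set().union(*remaining.values())
    while remaining:
        if not uncovered:
            # Everything has been covered once: reset and reprioritise the rest.
            uncovered = set().union(*remaining.values())
            if not uncovered:
                order.extend(sorted(remaining))  # tests that cover nothing go last
                break
        best = max(sorted(remaining), key=lambda t: len(remaining[t] & uncovered))
        order.append(best)
        uncovered -= remaining.pop(best)
    return order

COVERAGE = {"t1": {"a", "b"}, "t2": {"a", "b", "c"}, "t3": {"d"}, "t4": set()}
print(additional_greedy(COVERAGE))  # -> ['t2', 't3', 't1', 't4']
```

Note that `t3` is scheduled before `t1` even though it covers fewer elements overall, because it is the only test that adds new coverage after `t2`.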
Conference Paper
The correctness of mission-critical software is an important part of information security, but the oracle problem and test data generation remain obstacles for some programs. Metamorphic testing (MT) is practical for programs with the oracle problem, and evolutionary testing (ET) is a good application of genetic algorithms (GA) to automatic test data generation; however, the fitness functions used in ET are not always effective at guiding the search toward its target. This article proposes a method for improving ET's efficiency by taking metamorphic relations (MRs) into account when constructing the fitness function, and presents some conclusions.
Conference Paper
Full software test automation requires automated test input generation, execution, and output evaluation. The latter task is non-trivial and is usually referred to as the oracle problem in software testing. The present paper describes an empirical study on metamorphic testing, an approach to the oracle problem. This study was conducted with common Java implementations of determinant computation in order to evaluate the usefulness of the metamorphic testing approach and to establish general criteria that can be used to quickly assess metamorphic relations with respect to their suitability. The latter is very important, since metamorphic testing is based on so-called metamorphic relations on input-output tuples, which can easily be found. It is, however, crucial to evaluate these relations according to their usefulness. The empirical study enables us to derive general rules that can be used to quickly assess metamorphic relations and identify those that should be considered and studied in more detail with other methods (e.g. with mutation analysis).
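Determinant MRs follow from standard algebraic identities, and three of them can be checked as follows. The study used Java implementations; this sketch uses Python with exact rational arithmetic (`fractions.Fraction`) so that the output relations can be compared with `==` instead of a floating-point tolerance. The implementation and the particular MRs chosen (transposition, row swap, row scaling) are illustrative assumptions, not the paper's list.

```python
from fractions import Fraction

def det(m: list[list[int]]) -> Fraction:
    """Determinant by Gaussian elimination with exact rationals (the program under test)."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)                  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result                    # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def check_mrs(m: list[list[int]], k: int = 3) -> dict[str, bool]:
    """Check three MRs on a square matrix with at least two rows."""
    d = det(m)
    return {
        "transpose": det([list(col) for col in zip(*m)]) == d,       # det(A^T) = det(A)
        "swap_rows": det([m[1], m[0]] + m[2:]) == -d,                # swapping rows negates det
        "scale_row": det([[k * x for x in m[0]]] + m[1:]) == k * d,  # scaling a row by k scales det
    }

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
print(det(A))        # -> 18
print(check_mrs(A))  # all three relations hold
```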
Conference Paper
Symbolic execution is a powerful static program analysis technique that has been used for the automated generation of test inputs. Directed Automated Random Testing (DART) is a dynamic variant of symbolic execution that initially uses random values to execute a program and collects symbolic path conditions during the execution. These conditions are then used to produce new inputs to execute the program along different paths. It has been argued that DART can handle situations where classical static symbolic execution fails due to incompleteness in decision procedures and its inability to handle external library calls. We propose here a technique that mitigates these previous limitations of classical symbolic execution. The proposed technique splits the generated path conditions into (a) constraints that can be solved by a decision procedure and (b) complex non-linear constraints with uninterpreted functions to represent external library calls. The solutions generated from the decision procedure are used to simplify the complex constraints and the resulting path conditions are checked again for satisfiability. We also present heuristics that can further improve our technique. We show how our technique can enable classical symbolic execution to cover paths that other dynamic symbolic execution approaches cannot cover. Our method has been implemented within the Symbolic PathFinder tool and has been applied to several examples, including two from the NASA domain.
Article
Machine learning algorithms have provided core functionality to many application domains, such as bioinformatics and computational linguistics. However, it is difficult to detect faults in such applications because there is often no "test oracle" to verify the correctness of the computed outputs. To help address software quality, in this paper we present a technique for testing the implementations of machine learning classification algorithms that support such applications. Our approach is based on metamorphic testing, which has been shown to be effective in alleviating the oracle problem. We also present a case study on a real-world machine learning application framework, and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method is highly effective at killing mutants, and that observing the expected cross-validation result alone is not sufficient to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program.
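Two simple MRs for a k-nearest-neighbour classifier can be sketched as follows: permuting the order of the training data, and adding the same constant to every feature of both the training and test data, should leave predictions unchanged because Euclidean distances are preserved. The classifier, data, and names are hypothetical, the sketch assumes no distance ties that change the set of k nearest neighbours, and the specific MRs evaluated in the cited study are not listed in this abstract.

```python
import math
from collections import Counter

Point = tuple[float, ...]
Dataset = list[tuple[Point, str]]

def knn_predict(train: Dataset, x: Point, k: int = 3) -> str:
    """Majority label among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def mr_permute(train: Dataset, x: Point) -> bool:
    """MR1: the order of training samples must not affect the prediction."""
    return knn_predict(list(reversed(train)), x) == knn_predict(train, x)

def mr_translate(train: Dataset, x: Point, c: float = 10.0) -> bool:
    """MR2: shifting every feature of training and test data by c must not affect it."""
    shift = lambda p: tuple(v + c for v in p)
    moved = [(shift(p), label) for p, label in train]
    return knn_predict(moved, shift(x)) == knn_predict(train, x)

TRAIN: Dataset = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
                  ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
for query in [(1, 1), (5, 4)]:
    print(query, knn_predict(TRAIN, query), mr_permute(TRAIN, query), mr_translate(TRAIN, query))
```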