Conference Paper

Fault Detection Effectiveness of Metamorphic Relations Developed for Testing Supervised Classifiers

Authors:

... For instance, Tuncali and Fainekos (2019) investigate similar scenarios that trigger different behaviours of autonomous vehicles in safety critical settings, e.g., nearly avoidable vehicle collisions. Oracle. Overall, we found 13 papers in our pool that tackle the oracle problem for MLSs (Zheng et al. 2019; Xie et al. 2011; Nakajima and Bui 2016; Qin et al. 2018; Cheng et al. 2018b; Ding et al. 2017; Gopinath et al. 2018; Murphy et al. 2007a; Saha and Kanewala 2019; Xie et al. 2018). The challenge is to assess the correctness of MLSs' behaviour, which is possibly stochastic, due to the non-deterministic nature of training (e.g., because of the random initialisation of weights or the use of stochastic optimisers) and which depends on the choice of the training set. ...
... The challenge is to assess the correctness of MLSs' behaviour, which is possibly stochastic, due to the non-deterministic nature of training (e.g., because of the random initialisation of weights or the use of stochastic optimisers) and which depends on the choice of the training set. The vast majority of the proposed oracles leverage metamorphic relations among input data as a way to decide if the execution with new inputs is a pass or a fail, under the assumption that such new inputs share similarities with inputs having known labels (Xie et al. 2011; Cheng et al. 2018b; Ding et al. 2017; Saha and Kanewala 2019). ...
... This result indicates that ML models are mostly tested "in isolation", whereas it would be also important to investigate how failures of these components affect the behaviour of the whole MLS (i.e., whether model level faults propagate to the system level). i.e., in principle it may be applicable to any MLS (Aniculaesei et al. 2018; Byun et al. 2019; Cheng et al. 2018a, b; Du et al. 2019; Eniser et al. 2019; Guo et al. 2018; Henriksson et al. 2019; Kim et al. 2019; Li et al. 2018; Ma et al. 2018b, c, d, 2019; Murphy et al. 2007a, b, 2008a, b, 2009; Nakajima and Bui 2016; Odena et al. 2019; Pei et al. 2017; Saha and Kanewala 2019; Sekhon and Fleming 2019; Shen et al. 2018; Shi et al. 2019; Sun et al. 2018a, b; Tian et al. 2018; Udeshi and Chattopadhyay 2019; Uesato et al. 2019; Xie et al. 2018, 2019, 2011; Zhang et al. 2018a, 2019; Zhao and Gao 2018). Around 30% of the proposed approaches are designed for autonomous systems (Abeysirigoonawardena et al. 2019; Beglerovic et al. 2017; Bühler and Wegener 2004; Klueck et al. 2018; Li et al. 2016; Mullins et al. 2018; de Oliveira Neves et al. 2016; Patel et al. 2018; Strickland et al. 2018; Wolschke et al. 2017; Fremont et al. 2019), among which self-driving cars (Majumdar et al. 2019; Rubaiyat et al. 2018; Wolschke et al. 2018; Zhang et al. 2018b) or ADAS (Tuncali et al. 2018, 2019; Abdessalem et al. 2016, 2018a). ...
Article
Full-text available
Context: A Machine Learning based System (MLS) is a software system including one or more components that learn how to perform a task from a given data set. The increasing adoption of MLSs in safety critical domains such as autonomous driving, healthcare, and finance has fostered much attention towards the quality assurance of such systems. Despite the advances in software testing, MLSs bring novel and unprecedented challenges, since their behaviour is defined jointly by the code that implements them and the data used for training them. Objective: To identify the existing solutions for functional testing of MLSs, and classify them from three different perspectives: (1) the context of the problem they address, (2) their features, and (3) their empirical evaluation. To report demographic information about the ongoing research. To identify open challenges for future research. Method: We conducted a systematic mapping study about testing techniques for MLSs driven by 33 research questions. We followed existing guidelines when defining our research protocol so as to increase the repeatability and reliability of our results. Results: We identified 70 relevant primary studies, mostly published in the last years. We identified 11 problems addressed in the literature. We investigated multiple aspects of the testing approaches, such as the used/proposed adequacy criteria, the algorithms for test input generation, and the test oracles. Conclusions: The most active research areas in MLS testing address automated scenario/input generation and test oracle creation. MLS testing is a rapidly growing and developing research area, with many open challenges, such as the generation of realistic inputs and the definition of reliable evaluation metrics and benchmarks.
... Metamorphic relations (MR) are the properties of the functionality of the system under test (SUT) [12]. The key role of MR is the generation of follow-up test cases as well as verification of test results in the absence of a test oracle [13]. In terms of precision, MR differs from other types of properties as it is the relationship among multiple executions of the SUT. ...
... In MT, testers define MRs which are used to generate new test cases (referred to as follow-up test cases) from the available test cases (referred to as original/source test cases) [44]. The key role of MR is to generate new test cases and to verify test results in the absence of a test oracle [13]. For verification of test results, there are only two possible outcomes: a high fault detection capability or a low fault detection capability. ...
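The source/follow-up mechanism described in this excerpt can be sketched with a textbook relation, sin(x) = sin(π − x). This is a minimal illustration, not taken from any of the cited papers; the function names (`source_to_followup`, `relation_holds`, `buggy_sin`) and the tolerance are assumptions introduced here.

```python
import math

def source_to_followup(x):
    """MR transformation: derive the follow-up input pi - x from source input x."""
    return math.pi - x

def relation_holds(f, x, tol=1e-9):
    """Check the MR f(x) == f(pi - x) on one source/follow-up pair.

    No expected output for f(x) is needed; only the two runs are compared.
    """
    return abs(f(x) - f(source_to_followup(x))) <= tol

def buggy_sin(x):
    """Hypothetical faulty implementation: a truncated Taylor series."""
    return x - x ** 3 / 6

# Source test cases (assumed values, chosen only for illustration).
source_tests = [0.1, 0.5, 1.0, 2.0]

violations_correct = [x for x in source_tests if not relation_holds(math.sin, x)]
violations_buggy = [x for x in source_tests if not relation_holds(buggy_sin, x)]

print("violations for math.sin:", violations_correct)
print("violations for buggy_sin:", violations_buggy)
```

A violated relation signals a fault, whereas a satisfied relation does not prove correctness, which is one reason different MRs can differ in fault detection capability.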
Article
Full-text available
Magnetic resonance imaging (MRI) is an information-rich research tool used in diagnostics using image processing applications (IPAs), and the results are utilized in machine learning. Therefore, testing of IPAs for credible results is vital. A deficient IPA would cause the related taxonomies of the machine learning to be defective as well and diagnosis will not be perfect. Accurate disease detection by IPA, without surgical intervention, leads to improved quality of treatment. Current challenges for testing of IPAs include the absence of a test oracle. One way to alleviate the test oracle problem is metamorphic testing, which identifies specific properties of the system under test called metamorphic relations. Previously, metamorphic testing approaches have been applied and evaluated on IPAs, but there is no previous work on the evaluation of metamorphic testing on MRI images. In this work, we have evaluated the effectiveness of metamorphic testing on edge detection of MRI images. The aim of this study is to determine which metamorphic relations are more effective for metamorphic testing of edge detection in MRI images such as T1, T2 and FLAIR images. Our results show that the fault detection rate of MR4 is the highest and that of MR2 is the lowest among all types of MRI images at the threshold of 0.95.
... These properties are expressed in the form of relations among software inputs and outputs, formally known as metamorphic relations (MRs). Since its first publications in 1998 [4], [5], MT has been repeatedly found to be effective at alleviating the oracle problem in software testing across many different application domains and platforms, including biomedical applications [6], [7], [8], web services [9], [10], embedded systems [11], [12], component-based software [13], compilers [14], [15], [16], machine learning classifiers [17], [18], [19], [20], [21], online search engines [22], [23], [24], image processing [25], artificial intelligence (AI) systems [26], and autonomous car systems [27], [28], [29]. Although MT has demonstrated success in alleviating the oracle problem, its application consumes additional testing resources because it involves multiple program executions. ...
... It then follows from Definition 2 that MR 1 and MR 2 are composable with each other. Therefore, MR 12 and MR 21 were formed and tested in our study. ...
Article
Metamorphic Relations (MRs) play a key role in determining the fault detection capability of Metamorphic Testing (MT). As human judgement is required for MR identification, systematic MR generation has long been an important research area in MT. Additionally, due to the extra program executions required for follow-up test cases, some concerns have been raised about MT cost-effectiveness. Consequently, the reduction in testing costs associated with MT has become another important issue to be addressed. MR composition can address both of these problems. This technique can automatically generate new MRs by composing existing ones, thereby reducing the number of follow-up test cases. Despite this advantage, previous studies on MR composition have empirically shown that some composite MRs have lower fault detection capability than their corresponding component MRs. To investigate this issue, we performed theoretical and empirical analyses to identify what characteristics component MRs should possess so that their corresponding composite MR has at least the same fault detection capability as the component MRs do. We have also derived a convenient, but effective guideline so that the fault detection capability of MT will most likely not be reduced after composition.
... These properties are expressed in the form of relations among software inputs and outputs, formally known as metamorphic relations (MRs). Since its first publications in 1998 [4], [5], MT has been repeatedly found to be effective at alleviating the oracle problem in software testing across many different application domains and platforms, including biomedical applications [6], [7], [8], web services [9], [10], embedded systems [11], [12], component-based software [13], compilers [14], [15], [16], machine learning classifiers [17], [18], [19], [20], [21], online search engines [22], [23], [24], image processing [25], artificial intelligence (AI) systems [26], and autonomous car systems [27], [28], [29]. Although MT has demonstrated success in alleviating the oracle problem, its application consumes additional testing resources because it involves multiple program executions. ...
... We also noted that the applicability of our guideline varies across different application domains. More specifically, according to the results in 6, the guideline is largely applicable to compilers, numeric and scientific programs, and AI systems (e.g., image processing and autonomous car systems), and relatively less applicable to biomedical applications, web services, embedded systems, and online
[17], [18], [19], [20], [21] | 16 | 7 | 7 | 43.75 | 100.00
Online search engines [22], [23], [24] | 9 | 0 | 0 | 0.00 | 0.00
Assorted computer science algorithms [33], [34], [35] | 27 | 16 | 11 | 59.26 | 68.75 ...
... Zhang et al.'s [10] study successfully applied MT to validate systems implementing "class integration test order generation". MT has been applied in many application domains, as well as to compilers, to alleviate test oracle problems [21], such as bioinformatics software [22], machine learning classifiers [18, 23-27], cryptographic software [28], and online search engines [19]. ...
Article
Full-text available
In the realm of software defect prediction, unsupervised models often step in when labelled datasets are scarce, despite facing the challenge of validating models without prior knowledge of data. Addressing this, we proposed a novel approach leveraging generic metamorphic testing to validate such models effectively, bypassing the need for expert-derived metamorphic relations. Our method identifies five categories of generic metamorphic relations, further divided into twenty-one individual generic metamorphic relations, all formulated through generic Data Mutation Operators. This framework enables us to generate follow-up datasets from the source datasets, training respective software defect prediction models. By comparing predictions between the source and follow-up software defect prediction models and identifying inconsistencies, we can assess the model’s sensitivity to generic metamorphic relations as a form of validation. This approach was rigorously evaluated across twenty software defect prediction models, incorporating fourteen different machine learning algorithms and twenty high-dimensional public datasets. Remarkably, the robustness of our generic MT model was confirmed, showing substantial effectiveness in validating software defect prediction models, independent of whether they were supervised or unsupervised. Software defect prediction models, using Agglomerative clustering and Density-Based Spatial Clustering of Applications with Noise algorithms, did not violate any metamorphic relation, and nineteen software defect prediction models did not significantly violate the generic metamorphic relation "Shrinkage and Expansion". Our findings suggest that employing generic metamorphic relations, especially "Shrinkage and Expansion", can universally enhance the validation of defect prediction models. Furthermore, our model can be applied to continuously monitor software defect prediction models.
... The biggest challenge in metamorphic testing is the definition of the metamorphic relations suitable for the software under investigation. Other research also examined the use of metamorphic relations to ensure the soundness of machine-learning classifiers [19][20][21]. ...
Article
Full-text available
Organizations employ data mining to discover patterns in historic data in order to learn predictive models. Depending on the predictive model the predictions may be more or less accurate, raising the question about the reliability of individual predictions. This paper proposes a reference process aligned with the CRISP-DM to enable the assessment of reliability of individual predictions obtained from a predictive model. The reference process describes activities along the different stages of the development process required to establish a reliability assessment approach for a predictive model. The paper then presents in more detail two specific approaches for reliability assessment: perturbation of input cases and local quality measures. Furthermore, this paper describes elements of a knowledge graph to capture important metadata about the development process and training data. The knowledge graph serves to properly configure and employ the reliability assessment approaches.
... MA involves making small modifications to a program's source code to see if test cases can detect these changes [10]. MA injects defects into the SUT to evaluate the testing techniques [17]. One method inspired by MA is called Data Mutation [18]. ...
... Murphy et al., [29] proposed metamorphic testing to test different ML algorithms (e.g., Support Vector Machines). Saha and Kanewala compared multiple MRs from previous studies to test supervised classifiers [38]. Dwarakanath et al., proposed a set of MRs, which included rotation of images, to test image classifiers [13]. ...
... Metamorphic testing effectively alleviates the oracle problem. It has been widely used in many fields, such as bioinformatics [2], network simulation [3], and machine learning [4]. Metamorphic relations are the key to metamorphic testing, but identifying them is laborious and difficult, and testers must identify them manually based on their knowledge of the software. ...
Article
Full-text available
The classification model of the metamorphic relations of scientific computing programs has four categories, of which the code model occupies an important position. We define a new kind of metamorphic relation, the single-line metamorphic relation, which exists in nuclear power programs. Compared with other metamorphic relations, it reflects the internal logic of the program more comprehensively, while node features and path features best capture the logical structure of the program. On this basis, we study metamorphic relations starting from the code layer, using the control flow graph as the bridge between code and metamorphic relations: we extract node features and path features from the control flow graph and transform the problem of identifying metamorphic relations into a predictive-model classification problem. We apply the identified single-line metamorphic relations to the nuclear power field and evaluate their effectiveness on software programs. The results show that the method enriches the code-level study of metamorphic relations and further improves the degree of automation of metamorphic testing.
... Metamorphic testing has also been used in the domain of predictive analytics. There are multiple works which examine the use of metamorphic relations to ensure the soundness of machine-learning classifiers (Xie et al., 2011;Saha and Kanewala, 2019;Moreira et al., 2020). ...
... The effectiveness of metamorphic testing is dependent on the number of metamorphic relationships tested. Unique metamorphic relationships will typically catch different defects and have a different rate of defect detection [24]. This means metamorphic testing is generally more effective when conducted with a larger set of metamorphic relationships. ...
Article
Fourier-transform infrared spectroscopy (FTIR) is one of the commonly used techniques in chemical analysis. The chemical bonds that are present in samples absorb infrared light at various wavelengths based on the properties of chemical bonds between sets of atoms bonded together. By extracting these absorbance patterns, we aim to predict the presence or absence of various substructures within a compound based on its FTIR spectrum. Hypothetically, a powerful machine learning method with enough examples of a substructure should be able to identify the structure of an unknown compound by analysing its FTIR spectrum. To this extent we developed a novel system that trains neural networks to predict the presence of various substructures within a compound. We then propose to apply metamorphic testing to verify the network training process. Experimental results exhibit that metamorphic testing helps to develop a more effective training process for classifier neural networks.
... Finally, they claim that their MT-based approach can be used against any SUT that uses ML. This work is evaluated further in [15]. ...
Conference Paper
Modern-day demands for services often require availability on a 24/7 basis as well as online accessibility around the globe. For this purpose, personalized software systems called chatbots are applied. Chatbots offer services, goods or information in natural language. These programs respond to the user in real-time and offer an intuitive and simple interface to interact with. Advantages like these make them increasingly popular. Chatbots can even act as substitutes for humans for specific purposes. Since the chatbot market is growing, chatbots might outperform and replace classical web applications in the future. For this reason, ensuring correct functionality of chatbots is of high and increasing importance. However, since different implementations and user behavior lead to unpredictable results, the chatbot's output is difficult to predict and classify as well. In fact, testing of chatbots represents a challenge because of the unavailability of a test oracle. In this paper, we introduce a metamorphic testing approach for chatbots. In general, metamorphic testing can be applied to situations where no expected values are available. In addition, we discuss how to obtain test cases for chatbots, i.e. sequences of interactions with a chatbot, accordingly. We demonstrate our approach using a hotel booking system and discuss first experimental results.
... [37]. However, Saha and Kanewala [38] recently evaluated the effectiveness of MRs for testing supervised classifiers based on mutations of the system-under-test. They found that the detection rates for the used MRs of previous studies are limited when generating a large set of mutants. ...
Preprint
Metamorphic Testing is a software testing paradigm which aims at using necessary properties of a system-under-test, called metamorphic relations, to either check its expected outputs, or to generate new test cases. Metamorphic Testing has been successful to test programs for which a full oracle is not available or to test programs for which there are uncertainties on expected outputs such as learning systems. In this article, we propose Adaptive Metamorphic Testing as a generalization of a simple yet powerful reinforcement learning technique, namely contextual bandits, to select one of the multiple metamorphic relations available for a program. By using contextual bandits, Adaptive Metamorphic Testing learns which metamorphic relations are likely to transform a source test case, such that it has higher chance to discover faults. We present experimental results over two major case studies in machine learning, namely image classification and object detection, and identify weaknesses and robustness boundaries. Adaptive Metamorphic Testing efficiently identifies weaknesses of the tested systems in context of the source test case.
... To this aim, testing DNNs is a fervid and growing research area. Researchers have proposed methods for testing DNNs [17,24], testing of DNN-based autonomous vehicles [25,27], and fault localization for DNNs [6,20]. The Need for Explainable AI. ...
Conference Paper
The Artificial Intelligence (AI) revolution in software development is just around the corner. With the rise of AI, developers are expected to play a different role from the traditional role of programmers, as they will need to adapt their know-how and skillsets to complement and apply AI-based tools and techniques into their traditional web development workflow. In this extended abstract, some of the current trends on how AI is being leveraged to enhance web development and testing are discussed, along with some of the main opportunities and challenges for researchers.
Chapter
Plagiarism is a severe issue in academia, and uncertainty in plagiarism detection systems might lead to inconsistent detections. Thus, evaluating the system is essential; however, it is also a test oracle problem as it is challenging to distinguish correct behaviour from potentially incorrect behaviour of the system. To alleviate this challenge, we develop a feasible approach by applying an uncertainty matrix to identify the uncertainty of the plagiarism detection systems and derive metamorphic relations of metamorphic testing from the identified uncertainty for validation. We experimented with three plagiarism detection systems in a classroom scenario where students were hypothesized to use tools to generate answers for assignments. These answers were fed into the systems for validation by comparing the systems’ similarity scores of the tool-generated answers. Results showed that the proposed approach can effectively validate plagiarism detection systems. Future studies can apply this approach to locate uncertainties to enhance systems’ robustness.
Chapter
In software testing, there are two major problems that affect the fundamentals of testing, the oracle problem and the reliable test set problem. The oracle is the mechanism that checks the output correctness after running the program on the selected input data. The oracle absence creates the oracle problem. The oracle does not exist for non-testable programs, although it might be theoretically possible in some cases but too difficult in practice to determine the correct output. Metamorphic Testing is a method that has been proposed to reduce the intensity of the oracle problem. Metamorphic testing uses properties called metamorphic relations that are properties of the target function or program to test them automatically without human intervention.
Article
Metamorphic Testing is a software testing paradigm which aims at using necessary properties of a system under test, called metamorphic relations, to either check its expected outputs, or to generate new test cases. Metamorphic Testing has been successful to test programs for which a full oracle is not available or to test programs for which there are uncertainties on expected outputs such as learning systems. In this article, we propose Adaptive Metamorphic Testing as a generalization of a simple yet powerful reinforcement learning technique, namely contextual bandits, to select one of the multiple metamorphic relations available for a program. By using contextual bandits, Adaptive Metamorphic Testing learns which metamorphic relations are likely to transform a source test case, such that it has higher chance to discover faults. We present experimental results over two major case studies in machine learning, namely image classification and object detection, and identify weaknesses and robustness boundaries. Adaptive Metamorphic Testing efficiently identifies weaknesses of the tested systems in context of the source test case.
Conference Paper
Full-text available
We report on a novel use of metamorphic relations (MRs) in machine learning: instead of conducting metamorphic testing, we use MRs for the augmentation of the machine learning algorithms themselves. In particular, we report on how MRs can enable enhancements to an image classification problem of images containing hidden visual markers ("Artcodes"). Working on an original classifier, and using the characteristics of two different categories of images, two MRs, based on separation and occlusion, were used to improve the performance of the classifier. Our experimental results show that the MR-augmented classifier achieves better performance than the original classifier, algorithms, and extending the use of MRs beyond the context of software testing.
Conference Paper
Full-text available
Metamorphic testing is a well known approach to tackle the oracle problem in software testing. This technique requires the use of source test cases that serve as seeds for the generation of follow-up test cases. Systematic design of test cases is crucial for the test quality. Thus, source test case generation strategy can make a big impact on the fault detection effectiveness of metamorphic testing. Most of the previous studies on metamorphic testing have used either random test data or existing test cases as source test cases. There has been limited research done on systematic source test case generation for metamorphic testing. This paper provides a comprehensive evaluation on the impact of source test case generation techniques on the fault finding effectiveness of metamorphic testing. We evaluated the effectiveness of line coverage, branch coverage, weak mutation and random test generation strategies for source test case generation. The experiments are conducted with 77 methods from 4 open source code repositories. Our results show that by systematically creating source test cases, we can significantly increase the fault finding effectiveness of metamorphic testing. Further, in this paper we introduce a simple metamorphic testing tool called "METtester" that we use to conduct metamorphic testing on these methods.
Preprint
Full-text available
We report on a novel use of metamorphic relations (MRs) in machine learning: instead of conducting metamorphic testing, we use MRs for the augmentation of the machine learning algorithms themselves. In particular, we report on how MRs can enable enhancements to an image classification problem of images containing hidden visual markers ("Artcodes"). Working on an original classifier, and using the characteristics of two different categories of images, two MRs, based on separation and occlusion, were used to improve the performance of the classifier. Our experimental results show that the MR-augmented classifier achieves better performance than the original classifier, algorithms, and extending the use of MRs beyond the context of software testing. KEYWORDS: Metamorphic testing, metamorphic relations, supervised classification, Artcodes, random forests
Article
Full-text available
Software testing is difficult to automate, especially in programs which have no oracle, or method of determining which output is correct. Metamorphic testing is a solution to this problem. Metamorphic testing uses metamorphic relations to define test cases and expected outputs. A large amount of time is needed for a domain expert to determine which metamorphic relations can be used to test a given program. Metamorphic relation prediction removes the need for such an expert. We propose a method using semi-supervised machine learning to detect which metamorphic relations are applicable to a given code base. We compare this semi-supervised model with a supervised model, and show that the addition of unlabeled data improves the classification accuracy of the MR prediction model.
Article
Full-text available
Metamorphic testing is an approach to both test case generation and test result verification. A central element is a set of metamorphic relations, which are necessary properties of the target function or algorithm in relation to multiple inputs and their expected outputs. Since its first publication, we have witnessed a rapidly increasing body of work examining metamorphic testing from various perspectives, including metamorphic relation identification, test case generation, integration with other software engineering techniques, and the validation and evaluation of software systems. In this article, we review the current research of metamorphic testing and discuss the challenges yet to be addressed. We also present visions for further improvement of metamorphic testing and highlight opportunities for new research.
Article
Full-text available
There are two fundamental limitations in software testing, known as the reliable test set problem and the oracle problem. Fault-based testing is an attempt by Morell to alleviate the reliable test set problem. In this paper, we propose to enhance fault-based testing to alleviate the oracle problem as well. We present an integrated method that combines metamorphic testing with fault-based testing using real and symbolic inputs.
Conference Paper
Full-text available
Mutation analysis seeds artificial faults (mutants) into a program and evaluates testing techniques by measuring how well they detect those mutants. Mutation analysis is well-established in software engineering research but hardly used in practice due to inherent scalability problems and the lack of proper tool support. In response to those challenges, this paper presents Major, a framework for mutation analysis and fault seeding. Major provides a compiler-integrated mutator and a mutation analyzer for JUnit tests. Major implements a large set of optimizations to enable efficient and scalable mutation analysis of large software systems. It has already been applied to programs with more than 200,000 lines of code and 150,000 mutants. Moreover, Major features its own domain specific language and is designed to be highly configurable to support fundamental research in software engineering. Due to its efficiency and flexibility, the Major mutation framework is suitable for the application of mutation analysis in research and practice. It is publicly available at http://mutation-testing.org.
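The idea of seeding mutants and measuring how many a test suite detects can be sketched in a few lines. The example below is a hand-written illustration and does not use Major or its API (Major targets Java and JUnit); the program, the three mutants, and the test inputs are all assumptions made for this sketch.

```python
def max_of(values):
    """Original program: largest element of a non-empty list."""
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best

def mutant_lt(values):
    """Mutant: relational operator '>' replaced by '<'."""
    best = values[0]
    for v in values[1:]:
        if v < best:
            best = v
    return best

def mutant_ge(values):
    """Mutant: '>' replaced by '>='; behaves like the original (equivalent mutant)."""
    best = values[0]
    for v in values[1:]:
        if v >= best:
            best = v
    return best

def mutant_skip(values):
    """Mutant: slice start changed from 1 to 2, so the second element is skipped."""
    best = values[0]
    for v in values[2:]:
        if v > best:
            best = v
    return best

# Test suite: (input, expected output) pairs, assumed for illustration.
tests = [([3, 1, 2], 3), ([1, 5, 4], 5), ([2, 2], 2)]

def is_killed(mutant, suite):
    """A mutant is killed if at least one test observes a wrong output."""
    return any(mutant(list(inp)) != expected for inp, expected in suite)

mutants = [mutant_lt, mutant_ge, mutant_skip]
killed = [m.__name__ for m in mutants if is_killed(m, tests)]
mutation_score = len(killed) / len(mutants)

print("killed:", killed)
print("mutation score:", round(mutation_score, 2))
```

When the technique under evaluation is metamorphic testing, the expected-output comparison in `is_killed` would be replaced by a check that a metamorphic relation is violated on the mutant.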
Conference Paper
Full-text available
In software testing, an oracle refers to a mechanism against which testers can decide whether or not outcomes of test case executions are correct. The oracle problem refers to situations when either an oracle is not available, or it is too expensive to apply. Metamorphic testing has emerged as an effective and efficient approach to alleviating the oracle problem. This article introduces the basic concepts and procedures of metamorphic testing, and gives examples to show its applications, and integration with other methods.
Article
Full-text available
The naive Bayes classifier greatly simplifies learning by assuming that features are independent given class. Although independence is generally a poor assumption, in practice naive Bayes often competes well with more sophisticated classifiers. Our broad goal is to understand the data characteristics which affect the performance of naive Bayes. Our approach uses Monte Carlo simulations that allow a systematic study of classification accuracy for several classes of randomly generated problems. We analyze the impact of the distribution entropy on the classification error, showing that low-entropy feature distributions yield good performance of naive Bayes. We also demonstrate that naive Bayes works well for certain nearly-functional feature dependencies, thus reaching its best performance in two opposite cases: completely independent features (as expected) and functionally dependent features (which is surprising). Another surprising result is that the accuracy of naive Bayes is not directly correlated with the degree of feature dependencies measured as the class-conditional mutual information between the features. Instead, a better predictor of naive Bayes accuracy is the amount of information about the class that is lost because of the independence assumption.
Conference Paper
Full-text available
It is challenging to test machine learning (ML) applications, which are intended to learn properties of data sets where the correct answers are not already known. In the absence of a test oracle, one approach to testing these applications is to use metamorphic testing, in which properties of the application are exploited to define transformation functions on the input, such that the new output will be unchanged or can easily be predicted based on the original output; if the output is not as expected, then a defect must exist in the application. Here, we seek to enumerate and classify the metamorphic properties of some machine learning algorithms, and demonstrate how these can be applied to reveal defects in the applications of interest. In addition to the results of our testing, we present a set of properties that can be used to define these metamorphic relationships so that metamorphic testing can be used as a general approach to testing machine learning applications.
Article
Full-text available
A frequently invoked assumption in program testing is that there is an oracle (i.e. the tester or an external mechanism can accurately decide whether or not the output produced by a program is correct). A program is non-testable if either an oracle does not exist or the tester must expend some extraordinary amount of time to determine whether or not the output is correct. The reasonableness of the oracle assumption is examined and the conclusion is reached that in many cases this is not a realistic assumption. The consequences of assuming the availability of an oracle are examined and alternatives investigated.
Conference Paper
Full-text available
The empirical assessment of test techniques plays an important role in software testing research. One common practice is to instrument faults, either manually or by using mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. This paper investigates this important question based on a number of programs with comprehensive pools of test cases and known faults. It is concluded that, based on the data available thus far, the use of mutation operators is yielding trustworthy results (generated mutants are similar to real faults). Mutants appear however to be different from hand-seeded faults that seem to be harder to detect than real faults.
Conference Paper
We have recently witnessed tremendous success of Machine Learning (ML) in practical applications. Computer vision, speech recognition and language translation have all seen a near human level performance. We expect, in the near future, most business applications will have some form of ML. However, testing such applications is extremely challenging and would be very expensive if we follow today's methodologies. In this work, we present an articulation of the challenges in testing ML based applications. We then present our solution approach, based on the concept of Metamorphic Testing, which aims to identify implementation bugs in ML based image classifiers. We have developed metamorphic relations for an application based on Support Vector Machine and a Deep Learning based application. Empirical validation showed that our approach was able to catch 71% of the implementation bugs in the ML applications.
Conference Paper
Software testing is difficult to automate, especially in programs which have no oracle, or method of determining which output is correct. Metamorphic testing is a solution to this problem. Metamorphic testing uses metamorphic relations to define test cases and expected outputs. A large amount of time is needed for a domain expert to determine which metamorphic relations can be used to test a given program. Metamorphic relation prediction removes the need for such an expert. We propose a method using semi-supervised machine learning to detect which metamorphic relations are applicable to a given code base. We compare this semi-supervised model with a supervised model, and show that the addition of unlabeled data improves the classification accuracy of the MR prediction model.
Conference Paper
Matrices often represent important information in scientific applications and are involved in performing complex calculations. But systematically testing these applications is hard due to the oracle problem. Metamorphic testing is an effective approach to test such applications because it uses metamorphic relations to determine whether test cases have passed or failed. Identifying metamorphic relations typically requires the help of a domain expert and is a labor-intensive task. In this work we use a graph kernel based machine learning approach to predict metamorphic relations for matrix calculation programs. Previously, this graph kernel based machine learning approach was used to successfully predict metamorphic relations for programs that perform numerical calculations. Results of this study show that this approach can be used to predict metamorphic relations for matrix calculation programs as well.
Conference Paper
Geographic Information Systems (GIS) are a foundational application for different information systems, such as navigation systems and global positioning systems. However, due to the complexity of the system and its algorithms, traditional testing methodologies are confronted with the test oracle problem. Metamorphic testing (MT), which has been applied in many different domains, can help resolve the problem by checking metamorphic relations (MRs) among multiple inputs and outputs. In this paper, we apply MT to GIS testing and propose a semi-automated MT approach. To illustrate the effectiveness of the approach, we conduct a case study with a typical component of GIS: a superficial area calculation program. In the empirical study, we construct six kinds of MRs based on different properties and characteristics of the program or its algorithm. Our method could detect the target faults effectively without generating test oracles manually.
Conference Paper
When figuring out the expected output for each test case is difficult, metamorphic testing can be applied to alleviate the difficulty. A key challenge is to derive metamorphic relations for the program under test. This paper proposes a data mutation directed metamorphic relation acquisition methodology called μMT. Experimental results on three case studies show that μMT is feasible in deriving metamorphic relations for numeric applications and the derived metamorphic relations show reasonable fault detection effectiveness.
Article
Comprehensive, automated software testing requires an oracle to check whether the output produced by a test case matches the expected behaviour of the programme. But the challenges in creating suitable oracles limit the ability to perform automated testing in some programmes, and especially in scientific software. Metamorphic testing is a method for automating the testing process for programmes without test oracles. This technique operates by checking whether the programme behaves according to properties called metamorphic relations. A metamorphic relation describes the change in output when the input is changed in a prescribed way. Unfortunately, finding the metamorphic relations satisfied by a programme or function remains a labour-intensive task, which is generally performed by a domain expert or a programmer. In this work, we propose a machine learning approach for predicting metamorphic relations that uses a graph-based representation of a programme to represent control flow and data dependency information. In earlier work, we found that simple features derived from such graphs provide good performance. An analysis of the features used in this earlier work led us to explore the effectiveness of several representations of those graphs using the machine learning framework of graph kernels, which provide various ways of measuring similarity between graphs. Our results show that a graph kernel that evaluates the contribution of all paths in the graph has the best accuracy and that control flow information is more useful than data dependency information. The data used in this study are available for download at http://www.cs.colostate.edu/saxs/MRpred/functions.tar.gz to help researchers in further development of metamorphic relation prediction methods.
Article
The test oracle problem is regarded as one of the most challenging problems in software testing. Metamorphic testing has been developed to alleviate this problem, which is done using the relations involving relevant inputs and their outputs. This keynote speech will provide a summary of the state-of-the-art of metamorphic testing.
Article
Several module and class testing techniques have been applied to object-oriented (OO) programs, but researchers have only recently begun developing test criteria that evaluate the use of key OO features such as inheritance, polymorphism, and encapsulation. Mutation testing is a powerful testing technique for generating software tests and evaluating the quality of software. However, the cost of mutation testing has traditionally been so high that it cannot be applied without full automated tool support. This paper presents a method to reduce the execution cost of mutation testing for OO programs by using two key technologies, mutant schemata generation (MSG) and bytecode translation. This method adapts the existing MSG method for mutants that change the program behaviour and uses bytecode translation for mutants that change the program structure. A key advantage is in performance: only two compilations are required and both the compilation and execution time for each is greatly reduced. A mutation tool based on the MSG/bytecode translation method has been built and used to measure the speedup over the separate compilation approach. Experimental results show that the MSG/bytecode translation method is about five times faster than separate compilation. Copyright © 2004 John Wiley & Sons, Ltd.
Conference Paper
Much software lacks test oracles, which limits automated testing. Metamorphic testing is one proposed method for automating the testing process for programs without test oracles. Unfortunately, finding the appropriate metamorphic relations required for use in metamorphic testing remains a labor intensive task, which is generally performed by a domain expert or a programmer. In this work we present a novel approach for automatically predicting metamorphic relations using machine learning techniques. Our approach uses a set of features developed using the control flow graph of a function for predicting likely metamorphic relations. We show the effectiveness of our method using a set of real world functions often used in scientific applications.
Conference Paper
Note: As originally published, the last three sentences on page 153, paragraph one of this article contain editorial mark-ups including underlines, strikethroughs and red ink that were not intended for final publication. This same paragraph should read: "Moreover, the MRs were identified by the testers, so such an identification process is subjective. In our study, we asked independent individuals to identify the MRs without a prior knowledge of the research question of the paper. This avoided us subconsciously identifying the MRs that favor our rationale."
One fundamental challenge for software testing is the oracle problem, which means that either there does not exist a mechanism (called an oracle) to verify the test output given any possible program input, or it is very expensive, if not impossible, to apply the oracle. Metamorphic testing is an innovative approach to the oracle problem. In metamorphic testing, metamorphic relations are derived from the innate characteristics of the software under test. These relations can help to generate test data and verify the correctness of the test result without the need for an oracle. The effectiveness of metamorphic relations can play a significant role in the testing process. It has been argued that the metamorphic relations that cause different software execution behaviors should have high fault detection ability. In this paper, we conduct a case study to analyze the relationship between the execution behavior and the fault-detection effectiveness of metamorphic relations. Some code coverage criteria are used to reflect the execution behavior. It is shown that there is a certain degree of correlation between the code coverage achieved by a metamorphic relation and its fault-detection effectiveness.
Conference Paper
The oracle problem is very common in the testing of service-oriented systems. Metamorphic testing has been proposed to alleviate the oracle problem in software testing. This talk aims at presenting the state of the art in metamorphic testing. It summarizes which testing techniques have been successfully integrated with metamorphic testing and in which application areas metamorphic testing has been applied to successfully alleviate the oracle problem. It will also introduce potential research projects using the metamorphic approach.
Article
Several module and class testing techniques have been applied to object-oriented programs, but researchers have only recently begun developing test criteria that evaluate the use of key OO features such as inheritance, polymorphism, and encapsulation. Mutation testing is a powerful testing technique for generating software tests and evaluating the quality of software. However, the cost of mutation testing has traditionally been so high it cannot be applied without full automated tool support. This paper presents a method to reduce the execution cost of mutation testing for OO programs by using two key technologies, Mutant Schemata Generation (MSG) and bytecode translation. This method adapts the existing MSG method for mutants that change the program behavior and uses bytecode translation for mutants that change the program structure. A key advantage is in performance: only two compilations are required and both the compilation and execution time for each is greatly reduced. A mutation tool based on the MSG/bytecode translation method has been built and used to measure the speedup over the separate compilation approach. Experimental results show that the MSG/bytecode translation method is about five times faster than separate compilation.
Article
Machine Learning algorithms have provided core functionality to many application domains, such as bioinformatics and computational linguistics. However, it is difficult to detect faults in such applications because often there is no "test oracle" to verify the correctness of the computed outputs. To help address software quality, in this paper we present a technique for testing the implementations of machine learning classification algorithms which support such applications. Our approach is based on the technique "metamorphic testing", which has been shown to be effective in alleviating the oracle problem. Also presented are a case study on a real-world machine learning application framework, and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method has high effectiveness in killing mutants, and that observing the expected cross-validation result alone is not sufficiently effective to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program.
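To make the combination of metamorphic relations and seeded faults concrete for a classifier, here is a minimal sketch on a toy 1-nearest-neighbour implementation. It is not the method or the relations from the paper above: the two relations (reordering the training samples, shifting every feature by a constant), the seeded off-by-one fault, and all function names are assumptions chosen for illustration.

```python
def nearest_neighbor(train, query):
    """1-NN: label of the closest training point (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        dist = sum((a - b) ** 2 for a, b in zip(features, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def faulty_nearest_neighbor(train, query):
    """Seeded fault (hypothetical): an off-by-one skips the last training point."""
    return nearest_neighbor(train[:-1], query)

def mr_permutation_holds(classifier, train, queries):
    """MR: reordering the training samples must not change any prediction."""
    permuted = list(reversed(train))
    return all(classifier(train, q) == classifier(permuted, q) for q in queries)

def mr_shift_holds(classifier, train, queries, c=5.0):
    """MR: adding the same constant to every feature must not change predictions."""
    shifted = [(tuple(v + c for v in f), y) for f, y in train]
    return all(
        classifier(train, q) == classifier(shifted, tuple(v + c for v in q))
        for q in queries
    )

train = [((0.0,), "a"), ((4.0,), "a"), ((10.0,), "b")]
queries = [(1.0,), (9.0,)]

for name, clf in [("correct", nearest_neighbor), ("faulty", faulty_nearest_neighbor)]:
    print(name,
          "| permutation MR holds:", mr_permutation_holds(clf, train, queries),
          "| shift MR holds:", mr_shift_holds(clf, train, queries))
```

On this toy data the permutation relation exposes the seeded fault while the shift relation does not, which shows in miniature why different metamorphic relations can differ in fault detection effectiveness.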
Conference Paper
Machine learning techniques have long been used for various purposes in software engineering. This paper provides a brief overview of the state of the art and reports on a number of novel applications I was involved with in the area of software testing. Reflecting on this personal experience, I draw lessons learned and argue that more research should be performed in that direction as machine learning has the potential to significantly help in addressing some of the long-standing software testing problems.
Article
The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution on the sample points and their classifications, and hence the probability of error $R$ of such a rule must be at least as great as the Bayes probability of error $R^{*}$, the minimum probability of error over all decision rules taking underlying probability structure into account. However, in a large sample analysis, we will show in the $M$-category case that $R^{*} \leq R \leq R^{*}\left(2 - M R^{*}/(M-1)\right)$, where these bounds are the tightest possible, for all suitably smooth underlying distributions. Thus for any number of categories, the probability of error of the nearest neighbor rule is bounded above by twice the Bayes probability of error. In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
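A short worked computation of the large-sample bound quoted above may help. The function name `nn_error_bounds` and its input validation are assumptions added for this sketch; only the formula itself comes from the abstract.

```python
def nn_error_bounds(bayes_error, num_classes):
    """Return (lower, upper) bounds on the asymptotic nearest-neighbour error R.

    Implements R* <= R <= R* (2 - M R* / (M - 1)) for M classes.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    # The Bayes error of an M-class problem cannot exceed (M - 1) / M.
    max_bayes = (num_classes - 1) / num_classes
    if not 0.0 <= bayes_error <= max_bayes:
        raise ValueError("Bayes error must lie in [0, (M - 1) / M]")
    upper = bayes_error * (2.0 - num_classes * bayes_error / (num_classes - 1))
    return bayes_error, upper

# Two classes with a Bayes error of 10%: the upper bound is
# 0.1 * (2 - 2 * 0.1 / 1) = 0.18, which is below 2 * R* = 0.2.
lower, upper = nn_error_bounds(0.1, 2)
print(f"R* = {lower:.3f}, upper bound on R = {upper:.3f}")
```

When the Bayes error reaches its maximum, (M − 1)/M, the factor in parentheses equals 1 and the upper and lower bounds coincide.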