Table 2 - uploaded by Richard A. Demillo

Source publication
Article
Full-text available
A new, empirically observed effect is introduced. Called "the coupling effect," it may become a very important principle in practical testing activities. The idea is that programs appear to have the property - the "coupling effect" - that tests designed to detect simple kinds of errors are also effective in detecting much more complicated errors. T...

Context in source publication

Context 1
... will choose for our initial set of test data three vectors (Table 2). How sensitive is this data? ...

Similar publications

Article
Full-text available
In this paper we consider the application of a new class of cumulative distribution functions, proposed by Ramos, Dey, Louzada and Lachos in [9], to debugging theory. We study the Hausdorff approximation of the shifted Heaviside step function by this family. Numerical examples illustrating our results are presented using a programming environment...
Article
Full-text available
Random testing (RT) is a fundamental testing technique to assess software reliability, by simply selecting test cases in a random manner from the whole input domain. As an enhancement of RT, adaptive random testing (ART) has better failure-detection capability and has been widely applied in different scenarios, such as numerical programs, some obje...
Article
Full-text available
To effect change, the Software Sustainability Institute works with researchers, developers, funders, and infrastructure providers to identify and address key issues with research software.
Conference Paper
Full-text available
A method for choosing software reliability models, based on an analysis of their assumptions and the compatibility of both input and output parameters, is offered. This method is illustrated on software reliability growth models (SRGM). A classification of SRGMs is carried out and its correspondence to known classifications is ascertained. The approa...
Article
Full-text available
In the late 80’s Blum, Luby, Rubinfeld, Kannan et al. pioneered the theory of self-testing as an alternative way of dealing with the problem of software reliability. Over the last decade this theory played a crucial role in the construction of probabilistically checkable proofs and the derivation of hardness of approximation results. Applications i...

Citations

... Measuring the quality of a test suite is often used to decide whether the test suite should be improved, and how much effort should be put into this endeavor. A popular measurement technique is mutation analysis, which follows this idea: if we inject artificial faults into the SUT, an existing test suite that can find those faults is probably good enough at discovering real faults [8]. The artificial faults are defined in the form of mutation operators, which perform small syntactic changes and are systematically applied to the SUT to produce a set of mutants (i.e., faulty programs). ...
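The idea of a mutation operator as a small, systematic syntactic change can be sketched concretely. The following is a minimal toy illustration, not any tool's actual implementation: a hypothetical operator that rewrites `+` into `-` (arithmetic operator replacement) over a Python AST.

```python
import ast

# Hypothetical minimal mutation operator: replace every '+' with '-'
# (a classic arithmetic-operator-replacement mutation).
class AddToSub(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()  # one small syntactic change -> one mutant
        return node

def mutate(source: str) -> str:
    # Parse the SUT, apply the operator, and emit the mutant's source.
    tree = ast.parse(source)
    return ast.unparse(AddToSub().visit(tree))

original = "def total(a, b):\n    return a + b\n"
mutant_src = mutate(original)
print(mutant_src)  # the mutant computes a - b instead of a + b
```

A real mutation tool applies many such operators at every applicable location, yielding one mutant per change.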
... It has three Gate Types: the first two are defined for exchanging modelExecution-Commands (line 5) and OCL queries (line 6) provided by the common Package (imported in line 2), and the third is added in this paper to communicate events (line 7) provided by the xPSSMEvents Package (imported in line 3). There is also a Component Type comprising one gate instance for each Gate Type (lines 8-12). Finally, a Test Configuration is defined containing two Component Instances, one of the Tester kind (line 17) and one of the SUT kind (lines 18-21). The SUT requires information about the model under test, including the path to the model (the MUTPath annotation in line 7) that should be set by the domain expert, and the name of the DSL that the model conforms to (the DSLName annotation in line 19) which is automatically set by the TDL Library Generator. ...
Article
Full-text available
Executable Domain-Specific Languages (xDSLs) allow the definition and the execution of behavioral models. Some behavioral models are reactive, meaning that during their execution, they accept external events and react by exposing events to the external environment. Since complex interaction may occur between the reactive model and the external environment, they should be tested as early as possible to ensure the correctness of their behavior. In this paper, we propose a set of generic testing facilities for reactive xDSLs using the standardized Test Description Language (TDL). Given a reactive xDSL, we generate a TDL library enabling the domain experts to write and run event-driven TDL test cases for conforming reactive models. To further support the domain expert, the approach integrates interactive debugging to help in localizing defects, and mutation analysis to measure the quality of test cases. We evaluate the level of genericity of the approach by successfully writing, executing, and analyzing 247 event-driven TDL test cases for 70 models conforming to two different reactive xDSLs.
... Section 5.3) to make them consistent and usable by Uppaal. Indeed, we assume the competent programmer hypothesis (DeMillo et al. 1978) (i.e., developers produce initial models close to being correct). Thus, we assume that developers make only small mistakes (e.g., we will only mutate the clock constants by one time unit, see below). ...
... Thus, we assume that developers make only small mistakes (e.g., we will only mutate the clock constants by one time unit, see below). These simple mistakes (simulated by small mutations) can be put in cascade or coupled to form other emergent faults using higher-order mutants, according to the coupling effect (DeMillo et al. 1978). ...
Article
Full-text available
Model-based mutation testing has the potential to effectively drive test generation to reveal faults in software systems. However, it faces a typical efficiency issue, since it can produce many mutants that are equivalent to the original system model, making it impossible to generate test cases from them. We consider this problem when model-based mutation testing is applied to real-time system product lines, represented as timed automata. We define novel, time-specific mutation operators and formulate the equivalent mutant problem in the frame of timed refinement relations. Further, we study in which cases a mutation yields an equivalent mutant. Our theoretical results provide guidance to system engineers, allowing them to eliminate mutations from which no test case can be produced. Our empirical evaluation, based on a proof-of-concept implementation and a set of benchmarks from the literature, confirms the validity of our theory and demonstrates that, in general, our approach can avoid generating a significant number of equivalent mutants.
... We also evaluated how well JAttack can be used for automated compiler testing by extracting templates from existing Java projects. This evaluation is inspired by mutation testing [6,23,30,55,57,59,60], where we essentially "mutate" existing code to construct different tests for compilers. Note that, unlike in traditional mutation testing, holes in our case are filled by randomly generating values and expressions. ...
... Note that our approach for extracting templates and then using them to generate concrete programs is similar in nature to concepts in mutation testing [30], where existing programs are mutated into other similar programs through syntactic mutation operators [23,59,60]. Conceptually, converting a program into a template program and then generating additional programs through JAttack is like mutating the original program. ...
Preprint
We present JAttack, a framework that enables template-based testing for compilers. Using JAttack, a developer writes a template program that describes a set of programs to be generated and given as test inputs to a compiler. Such a framework enables developers to incorporate their domain knowledge on testing compilers, giving a basic program structure that allows for exploring complex programs that can trigger sophisticated compiler optimizations. A developer writes a template program in the host language (Java) that contains holes to be filled by JAttack. Each hole, written using a domain-specific language, constructs a node within an extended abstract syntax tree (eAST). An eAST node defines the search space for the hole, i.e., a set of expressions and values. JAttack generates programs by executing templates and filling each hole by randomly choosing expressions and values (available within the search space defined by the hole). Additionally, we introduce several optimizations to reduce JAttack's generation cost. While JAttack could be used to test various compiler features, we demonstrate its capabilities in helping test just-in-time (JIT) Java compilers, whose optimizations occur at runtime after a sufficient number of executions. Using JAttack, we have found six critical bugs that were confirmed by Oracle developers. Four of them were previously unknown, including two unknown CVEs (Common Vulnerabilities and Exposures). JAttack shows the power of combining developers' domain knowledge (via templates) with random testing to detect bugs in JIT compilers.
... Mutation Testing (MT) [5] is a proven technique in Software Engineering (SE); it is the de facto standard for comparing different testing criteria [6,7] and for evaluating the quality of a test set [7]. MT's basic assumption is that if a program P and its mutated version M, obtained by introducing a small artificial change to P, differ on an input x (i.e., P(x) ≠ M(x)), then the mutant M is killed, that is, a defect was detected. ...
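The killing condition P(x) ≠ M(x) is easy to make concrete. Below is a hypothetical toy example (the programs, names, and test sets are invented for illustration): a boundary mutation survives one test set and is killed by another.

```python
# Toy illustration of mutant "killing": original program P and mutant M
# (the same predicate with '<' flipped to '<=') disagree on some input x.
def P(x):
    return x < 10        # original predicate

def M(x):
    return x <= 10       # mutant: relational operator mutated

def killed(p, m, test_inputs):
    # A mutant is killed if at least one test input makes P and M differ,
    # i.e. P(x) != M(x) for some x in the test set.
    return any(p(x) != m(x) for x in test_inputs)

print(killed(P, M, [0, 5, 9]))    # this test set misses the boundary
print(killed(P, M, [0, 10, 20]))  # x = 10 exposes the mutant
```

A test set that kills the mutant is exactly one that exercises the boundary the mutation moved, which is why mutation score is used as an adequacy measure.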
Preprint
Context: Mutation Testing (MT) is an important tool in traditional Software Engineering (SE) white-box testing. It aims to artificially inject faults into a system to evaluate a test suite's capability to detect them, assuming that the test suite's defect-finding capability will then translate to real faults. While MT has long been used in SE, it is only recently that it started gaining the attention of the Deep Learning (DL) community, with researchers adapting it to improve the testability of DL models and the trustworthiness of DL systems. Objective: While several techniques have been proposed for MT, most of them neglect the stochasticity inherent to DL resulting from the training phase. Even the latest MT approaches in DL, which propose to tackle MT through a statistical approach, may give inconsistent results. Indeed, as their statistic is based on a fixed set of sampled training instances, it can lead to different results across instance sets, when results should be consistent for any instance. Methods: In this work, we propose a Probabilistic Mutation Testing (PMT) approach that alleviates the inconsistency problem and allows for a more consistent decision on whether a mutant is killed or not. Results: We show that PMT effectively allows a more consistent and informed decision on mutations through evaluation using three models and eight mutation operators used in previously proposed MT methods. We also analyze the trade-off between the approximation error and the cost of our method, showing that a relatively small error can be achieved for a manageable cost. Conclusion: Our results show the limitations of current MT practices in DNNs and the need to rethink them. We believe PMT is a first step in that direction, one which effectively removes the lack of consistency across test executions of previous methods caused by the stochasticity of DNN training.
... These are created by systematic injection of faults using some predefined mutation operators [7]. Mutation testing is thus a form of white-box testing, initially suggested in Ref. [8] and later explored by different researchers [9-11]. Executing a test case (test inputs) against mutants yields the adequacy score of that test case; this result is also called the Mutation Score (MS). ...
... (table recovered from the citation snippet)
#   Test Data                           Target Path   Fitness
1   (12,4), (8,27), (45,8), (9,44)      3, 1, 3, 1    0.5
2   (14,9), (23,8), (33,45), (14,5)     2, 2, 2, 3    0.5
3   (49,9), (7,33), (28,5), (39,8)      2, 1, 2, 2    0.5
4   (7,12), (6,18), (16,4), (9,42)      1, 4, 3, 1    0.75
5   (32,6), (44,16), (20,7), (17,12)    2, 2, 2, 2    0.25
...
Article
Full-text available
Information Technology has rapidly developed in recent years and software systems can play a critical role in the symmetry of the technology. Regarding the field of software testing, white-box unit-level testing constitutes the backbone of all other testing techniques, as testing can be entirely implemented by considering the source code of each System Under Test (SUT). In unit-level white-box testing, mutants can be used; these mutants are artificially generated faults seeded in each SUT that behave similarly to realistic ones. Executing test cases against mutants yields the adequacy (mutation) score of each test case. Efficient Genetic Algorithm (GA)-based methods have been proposed to address different problems in white-box unit testing and, in particular, issues of mutation testing techniques. In this research paper, a new approach is proposed, which integrates the path coverage-based testing method with the novel idea of tracing a Fault Detection Matrix (FDM) to achieve maximum mutation coverage. The proposed real-coded GA for mutation testing is designed to achieve the highest Mutation Score, and it is thus named RGA-MS. The approach is implemented in two phases: path coverage-based test data are initially generated and stored in an optimized test suite. In the next phase, the test suite is executed to kill the mutants present in the SUT. The proposed method aims to achieve the minimum test dataset having, at the same time, the highest Mutation Score, by removing duplicate test data covering the same mutants. The proposed approach is implemented on the same SUTs as have been used for path testing. We show that the RGA-MS approach can cover the maximum number of mutants with a minimum number of test cases. Furthermore, the proposed method can generate a maximum path coverage-based test suite with minimum test data generation compared to other algorithms. In addition, all mutants in the SUT can be covered by fewer test data with no duplicates.
Ultimately, the generated optimal test suite is trained to achieve the highest Mutation Score. GA is used to find the maximum mutation coverage as well as to delete the redundant test cases.
... Since its introduction by DeMillo et al. [1] and Hamlet [2], mutation testing has been thoroughly studied in academia. Empirical studies show that mutation testing is more effective at finding faults than other white-box testing approaches [3]. ...
... These mutants are then run against the test cases. The quality of the test cases is determined by computing the mutation score, also called mutation adequacy [1]. ...
Article
Full-text available
Mutation testing is an effective, yet costly, testing approach, as it requires generating and running large numbers of faulty programs, called mutants. Mutation testing also suffers from a fundamental problem, which is having a large percentage of equivalent mutants. These are mutants that produce the same output as the original program, and therefore, cannot be detected. Higher-order mutation is a promising approach that can produce hard-to-detect faulty programs called subtle mutants, with a low percentage of equivalent mutants. Subtle higher-order mutants contribute a small set of the large space of mutants which grows even larger as the order of mutation becomes higher. In this paper, we developed a genetic algorithm for finding subtle higher-order mutants. The proposed approach uses a new mechanism in the crossover phase and uses five selection techniques to select mutants that go to the next generation in the genetic algorithm. We implemented a tool, called GaSubtle that automates the process of creating subtle mutants. We evaluated the proposed approach by using 10 subject programs. Our evaluation shows that the proposed crossover generates more subtle mutants than the technique used in a previous genetic algorithm with less execution time. Results vary on the selection strategies, suggesting a dependency relation with the tested code.
... Mutation analysis works by making simple syntactic changes to the program under test, generating many different versions of it. Each version contains artificial faults; these versions are called mutants [42]. The transformation rules that define how to introduce syntactic changes into the program are called mutant operators [20]. ...
... In higher-order mutation testing, HOMs were mainly used to evaluate the quality of test suites [47] and to reduce testing costs [30], [48], [49]. Some researchers use HOMs in test data generation [50] and coupling effect analysis [42], [51]. Also, HOMs can be adopted to alleviate the equivalent mutant problem [52] and to estimate the mutation coverage of FOMs, reducing the cost of first-order mutation testing [53]. ...
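The FOM/HOM distinction discussed above can be shown in miniature. The functions below are invented for illustration: two first-order mutants of a toy function, and the second-order mutant obtained by combining both changes. A test input can kill one FOM yet leave the HOM's behavior different again, which is why HOMs are used to simulate subtler, coupled faults.

```python
# Toy sketch of first-order vs. higher-order mutants (hypothetical example).
def original(a, b):
    return a + b if a > 0 else b

def fom1(a, b):            # first-order mutant: '+' -> '-'
    return a - b if a > 0 else b

def fom2(a, b):            # first-order mutant: '>' -> '>='
    return a + b if a >= 0 else b

def hom(a, b):             # second-order mutant: both changes combined
    return a - b if a >= 0 else b

# On input (0, 5): fom2 happens to agree with the original (both return 5),
# so that input does not kill fom2, yet the combined HOM is still exposed.
print(original(0, 5), fom2(0, 5), hom(0, 5))  # 5 5 -5
```

The interplay shown here (a mutation masked on one input, exposed when coupled with another) is the kind of emergent behavior higher-order mutation studies.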
Article
Abstract—First-order mutants (FOMs) have been widely used in mutation-based fault localization (MBFL) approaches and have achieved promising results in single-fault localization scenarios (SFL-scenario). Higher-order mutants (HOMs) are proposed to simulate complex faults and can theoretically be applied in MBFL for multiple-fault localization scenarios (MFL-scenario). However, whether HOMs can improve MBFL's performance has not been investigated and their effectiveness has not been thoroughly evaluated. In this empirical study, we investigate the impact of HOMs on the performance of MBFL in the SFL-scenario and the MFL-scenario. The experiments on two real-world benchmarks reveal that 1) 2-HOMs can help improve MBFL performance in SFL-scenarios; 2) in MFL-scenarios, both 2-HOMs and 3-HOMs can achieve better performance than FOMs; and 3) the huge computational cost cannot be ignored in the practice of HOMs. Therefore, effective methods to reduce the number of HOMs should be considered in future MBFL studies.
... In this paper, we propose an approach for validating both the presence and the absence of a set of real faults in the model of a hardware design, that is, the specification of the HDL program, using the well-known software testing methods Holistic testing [9,12] and Mutation testing [24,41,71]. The main motivation of the approach is to validate both the presence and absence of the faults. ...
... The proposed approach utilizes Holistic testing [9,12] and Mutation testing [24,41,71] to achieve the MBIT. Holistic, model-based testing introduces an integrated view encapsulating positive and negative testing. ...
... Mutation testing, introduced by DeMillo et al. [24] and Hamlet [41], is a fault-based testing technique, originally suggested for software. It can be used to assess the effectiveness of a given test set using a testing criterion, namely mutation score [47]. ...
Article
Full-text available
An ideal test is supposed to show not only the presence of bugs but also their absence. Based on the Fundamental Test Theory of Goodenough and Gerhart (IEEE Trans Softw Eng SE-1(2):156–173, 1975), this paper proposes an approach to model-based ideal testing of hardware description language (HDL) programs based on their behavioral model. Test sequences are generated from both original (fault-free) and mutant (faulty) models in the sense of positive and negative testing, forming a holistic test view. These test sequences are then executed on original (fault-free) and mutant (faulty) HDL programs, in the sense of mutation testing. Using the techniques known from automata theory, test selection criteria are developed and formally show that they fulfill the major requirements of Fundamental Test Theory, that is, reliability and validity. The current paper comprises a preparation step (consisting of the sub-steps model construction, model mutation, model conversion, and test generation) and a composition step (consisting of the sub-steps pre-selection and construction of Ideal test suites). All the steps are supported by a toolchain that is already implemented and is available online. To critically validate the proposed approach, three case studies (a sequence detector, a traffic light controller, and a RISC-V processor) are used and the strengths and weaknesses of the approach are discussed. The proposed approach achieves the highest mutation score in positive and negative testing for all case studies in comparison with two existing methods (regular expression-based test generation and context-based random test generation), using four different techniques.
... Mutation testing was first introduced by Lipton [1]. This field was later developed and popularized by DeMillo et al. [2]. Mutation testing is considered a fault-based testing technique providing a test adequacy criterion. This criterion is utilized to evaluate the effectiveness of a test set via its ability to uncover faults. ...
... The principle of mutation testing is to mimic faults committed by programmers when writing programs [2]. Each simple fault is injected into the original program to generate a defective program, called a mutant. For instance, a variable in a program is replaced by another variable of the same data type. ...
... The main objective of mutation testing is to assess the quality of a test set through a test adequacy criterion by defining a mutation score as follows [2]: ...
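The snippet breaks off before the formula itself. The standard definition of the mutation score (which the cited source most likely uses, possibly with different symbol names) is:

$$
MS(P, T) = \frac{K}{M - E}
$$

where, for program $P$ and test set $T$, $K$ is the number of mutants killed by $T$, $M$ is the total number of generated mutants, and $E$ is the number of equivalent mutants (those behaviorally identical to $P$, which no test can kill). A score of 1 means the test set kills every non-equivalent mutant.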
Article
Full-text available
Currently, there are many research studies that apply and improve mutation testing techniques including traditional mutation testing or first-order mutation testing, and higher-order mutation testing (HOMT) for evaluating the quality of the set of test data in particular, and the quality of test suites in general. The results of those studies have proven the effectiveness of mutation testing in the field of software testing. Mutation testing allows the quality of test cases to be automatically evaluated, thereby helping the testers to improve the quality in the design and execution of the software testing. Besides, these studies have also pointed out the main barriers in applying mutation testing techniques in practice. However, we are the first to introduce a method that can reduce the cost, but keep the quality of testing activity based on evaluating the quality of the mutation operator as well as the quality of the test cases. In this paper, we concentrate on two problems regarding higher-order mutation testing: Evaluating the quality of mutation operators as well as generated mutants and prioritizing test cases based upon its capability of killing mutants. This may help developers allocate suitably their resources during testing phase. The study of this paper is an extended version of our previous study titled “Evaluating Mutation Operator and Test Case Effectiveness by Means of Mutation Testing”, which is published in the proceedings of the 13th Asian Conference on Intelligent Information and Database Systems, ACIIDS 2021 (V. N. Do, Q. V. Nguyen and T. B. Nguyen, Evaluating Mutation Operator and Test Case Effectiveness by Means of Mutation Testing, in Intelligent Information and Database Systems. ACIIDS 2021, eds. N. T. Nguyen, S. Chittayasothorn, D. Niyato and B. Trawiński. Lecture Notes in Computer Science, Vol. 12672 (Springer, Cham), https://doi.org/10.1007/978-3-030-73280-6_66) to confirm the usefulness of our proposed method.
... There are now several test data generation tools for languages including C (Cadar et al., 2008; Lakhotia et al., 2013) and Java (Fraser & Arcuri, 2011). Popular test data generation techniques include symbolic execution of the code (Cadar & Sen, 2013), dynamic execution guided by a fitness function (Harman et al., 2015), and hybrids of these two techniques (Baars et al., 2011). In order to assess the effectiveness of the test suites generated, we use mutation testing, a topic also widely studied since the 1970s (DeMillo et al., 1978). A mutant is a version of the program into which a fault is deliberately inserted, thereby assessing the test suite's fault detection ability (Jia & Harman, 2011; Papadakis et al., 2019). ...
Conference Paper
With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation and one that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive to small changes; a single token can result in compilation failures or erroneous programs, unlike natural languages where small inaccuracies may not change the meaning of a sentence. To address this issue, we propose to leverage an automated unit-testing system to filter out invalid translations, thereby creating a fully tested parallel corpus. We found that fine-tuning an unsupervised model with this filtered data set significantly reduces the noise in the translations so-generated, comfortably outperforming the state-of-the-art for all language pairs studied. In particular, for Java→Python and Python→C++ we outperform the best previous methods by more than 16% and 24% respectively, reducing the error rate by more than 35%.