Conference Paper

Infrastructure Support for Controlled Experimentation with Software Testing and Regression Testing Techniques

Department of Computer Science & Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
DOI: 10.1109/ISESE.2004.1334894 · Conference: Proceedings of the 2004 International Symposium on Empirical Software Engineering (ISESE '04)
Source: IEEE Xplore


Where the creation, understanding, and assessment of software testing and regression testing techniques are concerned, controlled experimentation is an indispensable research methodology. Obtaining the infrastructure necessary to support such experimentation, however, is difficult and expensive. As a result, progress in experimentation with testing techniques has been slow, and empirical data on the costs and effectiveness of techniques remains relatively scarce. To help address this problem, we have been designing and constructing infrastructure to support controlled experimentation with testing and regression testing techniques. This paper reports on the challenges faced by researchers experimenting with testing techniques, including those that inform the design of our infrastructure. The paper then describes the infrastructure that we are creating in response to these challenges, and that we are now making available to other researchers, and discusses the impact that this infrastructure has and can be expected to have.

Cited by:
  • "However, it offers no concrete help on how the various components of the experiment can be designed and what could be measured. Do et al. [13], [14] define SIR, the Software Artifact Infrastructure Repository, to support controlled experiments with software testing techniques. The main contribution of their work is a set of benchmark programs that can be used to evaluate testing techniques."
    ABSTRACT: There is a real need in industry for guidelines on which testing techniques to use for different testing objectives, and on how usable (effective, efficient, satisfactory) these techniques are. To date, no such guidelines exist. They could be derived from secondary studies of a body of evidence consisting of case studies that evaluate and compare testing techniques and tools; however, such a body of evidence is also lacking. In this paper, we take a first step toward creating that body of evidence by defining a general methodological evaluation framework that simplifies the design of case studies for comparing software testing tools and makes their results more precise, reliable, and easy to compare. Using this framework, (1) software testing practitioners can more easily define case studies by instantiating the framework, (2) results can be better compared since all studies follow a similar design, (3) the gap in existing work on methodological evaluation frameworks is narrowed, and (4) a body of evidence is initiated. To validate the framework, we present successful applications of it to several case studies evaluating testing tools in an industrial environment with real objects and real subjects.
    Full-text · Conference Paper · Aug 2012
  • "To enable the experiment to scale — it required months of computational time (explained in Section IV-C) — we randomly sampled and used 500 of Space's test cases. All were downloaded from the Subject Infrastructure Repository [9]."
    ABSTRACT: To cluster executions that exhibit faulty behavior by the faults that cause them, researchers have proposed using internal execution events, such as statement profiles, to (1) measure execution similarities, (2) categorize executions based on those similarity results, and (3) suggest the resulting categories as sets of executions exhibiting uniform fault behavior. However, due to a paucity of evidence correlating profiles and output behavior, researchers employ multiple simplifying assumptions to justify such approaches. In this paper we present an empirical study of profile correlation with output behavior, and we reexamine the suitability of those simplifying assumptions. We examine over 4 billion test-case outputs and execution profiles from multiple programs with over 9000 versions. Our data provides evidence that, with current techniques, many executions must be omitted from the clustering analysis to produce clusters that each represent a single fault. In addition, our data reveals previously undocumented effects of multiple faults on failures, which has implications for these techniques' ability (and inability) to cluster properly. Our results suggest directions for future failure-clustering techniques that better account for software-fault behavior.
    Preview · Article · Apr 2012
  • "Another issue that we faced was the need to add mutants into our subjects. The nature of this study required a certain quantity of faults, and the SIR [8] did not provide a sufficient number for some of the subjects. Because by definition mutants are not real faults, we cannot generalize their results with absolute certainty."
    ABSTRACT: Multiple faults in a program can interact to produce behaviors that would not arise if the program contained each fault individually. This paper presents an in-depth study of the effects of fault interaction within a program. Many researchers attempt to ameliorate the effects of faulty programs; unfortunately, owing to the paucity of formalized studies of faults and their behavior, they are left to rely on intuition. To advance the understanding of faults and their behavior, we conducted a study of fault interaction across six subjects with more than 65,000 multiple-fault versions. The results show four significant types of interaction, with one type - faults obscuring the effects of other faults - the most prevalent. The prevalence of obscured fault effects adversely affects many automated software-engineering techniques, such as regression-testing, fault-localization, and fault-clustering techniques. Given that software commonly contains more than a single fault, these results have implications for developers and researchers alike, informing them of expected complications that in many instances run counter to intuition.
    Preview · Conference Paper · Sep 2011