Conference Paper

Empirical Studies of Test Case Prioritization in a JUnit Testing Environment

Dept. of Comput. Sci. & Eng., Nebraska Univ., Lincoln, NE, USA
DOI: 10.1109/ISSRE.2004.18 Conference: Software Reliability Engineering, 2004. ISSRE 2004. 15th International Symposium on
Source: DBLP


Test case prioritization provides a way to run the test cases with the highest priority earliest. Numerous empirical studies have shown that prioritization can improve a test suite's rate of fault detection, but the extent to which these results generalize is an open question, because the studies have all focused on a single procedural language, C, and on a few specific types of test suites. In particular, Java and the JUnit testing framework are being used extensively in practice, yet the effectiveness of prioritization techniques on Java systems tested under JUnit has not been investigated. We have therefore designed and performed a controlled experiment examining whether test case prioritization can be effective on Java programs tested under JUnit, and comparing the results to those achieved in earlier studies. Our analyses show that test case prioritization can significantly improve the rate of fault detection of JUnit test suites, but they also reveal differences with respect to previous studies that can be related to the language and testing paradigm.
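Studies in this line of work typically quantify "rate of fault detection" with the APFD (Average Percentage of Faults Detected) metric. The sketch below computes APFD from a test ordering and a fault-detection mapping; the function and parameter names are illustrative, not taken from the paper.

```python
def apfd(ordering, fault_matrix):
    """Average Percentage of Faults Detected for a prioritized suite.

    ordering:     list of test-case ids in execution order.
    fault_matrix: dict mapping fault id -> set of test ids that detect it.
    """
    n = len(ordering)
    m = len(fault_matrix)
    position = {t: i + 1 for i, t in enumerate(ordering)}  # 1-based positions
    # TF_i: position of the first test in the ordering that exposes fault i
    tf_sum = sum(min(position[t] for t in detecting if t in position)
                 for detecting in fault_matrix.values())
    return 1 - tf_sum / (n * m) + 1 / (2 * n)
```

An ordering that exposes faults earlier yields a higher APFD (closer to 1), which is why prioritization techniques are compared on this value.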

  • Source
    • "Table 2 shows the maximum, mean, and minimum sizes of these test suites. We followed existing work (Do et al. 2004; Elbaum et al. 2002; Mei et al. 2011) to exclude a faulty version from data analyses if more than 20 percent of the test cases detected the fault from the version. The numbers of faulty versions actually used are shown in the rightmost column of Table 1. "
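The exclusion rule quoted above (drop a faulty version if more than 20 percent of the test cases detect its fault) can be sketched as a simple filter; `detects` is a hypothetical helper standing in for actually running each test against each version.

```python
def usable_versions(versions, tests, detects, limit=0.2):
    """Keep faulty versions whose fault is detected by at most `limit`
    of the test cases, following the exclusion rule quoted above.

    detects(test, version) -> bool  (hypothetical test-execution oracle)
    """
    return [v for v in versions
            if sum(detects(t, v) for t in tests) / len(tests) <= limit]
```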
    ABSTRACT: Many existing studies measure the effectiveness of test case prioritization techniques using the average performance on a set of test suites. However, in each regression test session, a real-world developer may only afford to apply one prioritization technique to one test suite to test a service once, even if this application results in an adverse scenario such that the actual performance in this test session is far below the average result achievable by the same technique over the same test suite for the same application. This indicates that assessing the average performance of such a technique cannot provide adequate confidence for developers to apply the technique. We ask two questions: To what extent does the effectiveness of prioritization techniques in average scenarios correlate with that in adverse scenarios? Moreover, to what extent may a design factor of this class of techniques affect the effectiveness of prioritization in different types of scenarios? To the best of our knowledge, we report in this paper the first controlled experiment to study these two new research questions, through more than 300 million APFD and HMFD data points produced from 19 techniques, eight WS-BPEL benchmarks, and 1000 test cases prioritized by each technique 1000 times. A main result reveals a strong and linear correlation between the effectiveness in the average scenarios and that in the adverse scenarios. Another interesting result is that many pairs of levels of the same design factors significantly change their relative strengths in handling a wide spectrum of prioritized test suites produced by the same techniques over the same test suite in testing the same benchmarks, and the results obtained from the average scenarios are more similar to those of the more effective end than otherwise. This work provides the first piece of strong evidence for the research community to re-assess how they develop and validate their techniques in the average scenarios and beyond.
    Full-text · Article · Jul 2015 · International Journal of Web Services Research
  • Source
    • "Rothermel et al. [53] provided empirical evidence of the usefulness of the prioritization techniques with respect to the random ordering by measuring the ability to early detect faults. A similar analysis was performed by Do et al. [18] using the Java unit test framework (JUnit). Elbaum et al. [22] considered further testing criteria with different granularity, e.g., statement coverage or function coverage. "
    ABSTRACT: A way to reduce the cost of regression testing consists of selecting or prioritizing subsets of test cases from a test suite according to some criteria. Besides greedy algorithms, cost-cognizant additional greedy algorithms, multi-objective optimization algorithms, and Multi-Objective Genetic Algorithms (MOGAs) have also been proposed to tackle this problem. However, previous studies have shown that there is no clear winner between greedy algorithms and MOGAs, and that their combination does not necessarily produce better results. In this paper we show that the optimality of MOGAs can be significantly improved by diversifying the solutions (subsets of the test suite) generated during the search process. Specifically, we introduce a new MOGA, coined DIV-GA (DIversity-based Genetic Algorithm), based on the mechanisms of orthogonal design and orthogonal evolution, which increase diversity by injecting new orthogonal individuals during the search process. Results of an empirical study conducted on eleven programs show that DIV-GA outperforms both greedy algorithms and traditional MOGAs from the optimality point of view. Moreover, the solutions (subsets of the test suite) provided by DIV-GA are able to detect more faults than those of the other algorithms, while keeping the same test execution cost.
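The greedy baseline this abstract compares against is usually the "additional" coverage strategy: repeatedly pick the test that covers the most not-yet-covered statements. A minimal sketch, assuming tests are represented by their statement-coverage sets (names and representation are assumptions, not from the paper):

```python
def additional_greedy(coverage):
    """Prioritize tests by additional (not-yet-covered) statement coverage.

    coverage: dict mapping test id -> set of covered statements.
    Returns the test ids in prioritized order.
    """
    remaining = dict(coverage)
    covered = set()
    order = []
    while remaining:
        # Pick the test contributing the most new coverage.
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        if not (remaining[best] - covered) and covered:
            covered = set()  # classical reset once full coverage is reached
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order
```

The reset step is what distinguishes the "additional" strategy from plain total-coverage greedy: once no test adds new coverage, accumulated coverage is cleared and the remaining tests compete again.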
    Full-text · Article · Oct 2014 · IEEE Transactions on Software Engineering
  • Source
    • "Srivastava and Thiagarajan [28] built an Echelon system to prioritize test cases according to the potential change impacts of individual test cases between versions of a program to cover maximally the affected programs. Most of the existing experiments are conducted on procedural and object-oriented programs [3]. In addition, studies on prioritizing test cases using input domain information [7][39] and service discovery mechanisms [40] have been explored. "
    ABSTRACT: In real life, a tester can only afford to apply one test case prioritization technique to one test suite against a service-oriented workflow application once in the regression testing of the application, even if it results in an adverse scenario such that the actual performance in the test session is far below average. It is unclear whether the factors of test case prioritization techniques known to be significant in terms of average performance can be extrapolated to adverse scenarios. In this paper, we examine whether such a factor or technique may consistently affect the rate of fault detection in both the average and adverse scenarios. The factors studied include prioritization strategy, artifacts to provide coverage data, ordering direction of a strategy, and the use of executable and non-executable artifacts. The results show that only a minor portion of the 10 studied techniques, most of which are based on the iterative strategy, are consistently effective in both average and adverse scenarios. To the best of our knowledge, this paper presents the first piece of empirical evidence regarding the consistency in the effectiveness of test case prioritization techniques and factors of service-oriented workflow applications between average and adverse scenarios.
    Full-text · Conference Paper · Jul 2014