Conference Paper

Stability of Product-Line Sampling in Continuous Integration

... We used feature models from multiple sources that originate from different domains, namely finance, systems software, e-commerce, gaming, and communication [37,38,39,36,40], and that cover a wide range of sizes, with 9-1,408 features and 13-15,692 constraints (cf. Table 1). ...
... For a wide range of feature model sizes, we selected small- and medium-sized feature models from examples provided by the tool FeatureIDE [36]. We also used feature models from real-world Kconfig systems, provided by Pett et al. [38], for which we chose the earliest and latest versions for each system. In addition, we used more complex, real-world feature models [39,40,37]. ...
... SampLNS tries to find optimal t-wise samples in terms of sample size. However, the quality of sampling algorithms can also be assessed by other criteria [10], including sampling efficiency [63,26,34,25,44,58,11,45], effectiveness [56,16,64,34], and stability [38]. Regarding sampling efficiency, SampLNS allows specifying a time limit for optimizing a sample, similar to many evolutionary algorithms [46,51,65,52,66,53]. ...
Preprint
Full-text available
Modern software systems are typically configurable, a fundamental prerequisite for wide applicability and reusability. This flexibility poses an extraordinary challenge for quality assurance, as the enormous number of possible configurations makes it impractical to test each of them separately. This is where t-wise interaction sampling can be used to systematically cover the configuration space and detect unknown feature interactions. Over the last two decades, numerous algorithms for computing small interaction samples have been studied, providing improvements for a range of heuristic results; nevertheless, it has remained unclear how much these results can still be improved. We present a significant breakthrough: a fundamental framework, based on the mathematical principle of duality, for combining near-optimal solutions with provable lower bounds on the required sample size. This implies that we no longer need to work on heuristics with marginal or no improvement, but can certify the solution quality by establishing a limit on the remaining gap; in many cases, we can even prove optimality of achieved solutions. This theoretical contribution also provides extensive practical improvements: Our algorithm SampLNS was tested on 47 small and medium-sized configurable systems from the existing literature. SampLNS can reliably find samples of smaller size than previous methods in 85% of the cases; moreover, we can achieve and prove optimality of solutions for 63% of all instances. This makes it possible to avoid cumbersome efforts of minimizing samples by researchers as well as practitioners, and substantially save testing resources for most configurable systems.
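The abstract above revolves around t-wise interaction sampling and certified sample sizes. As a purely illustrative Python sketch (not SampLNS itself), the snippet below computes how many pairwise (t = 2) interactions a given sample covers; the feature names and the sample are invented, and a real tool would additionally exclude pairs that violate the feature-model constraints.

from itertools import combinations, product

features = ["A", "B", "C", "D"]

sample = [
    {"A": True,  "B": True,  "C": False, "D": False},
    {"A": True,  "B": False, "C": True,  "D": True},
    {"A": False, "B": True,  "C": True,  "D": False},
    {"A": False, "B": False, "C": False, "D": True},
]

def covered_pairs(configs):
    # All (feature, value) pairs jointly exercised by at least one configuration.
    covered = set()
    for cfg in configs:
        for f, g in combinations(sorted(cfg), 2):
            covered.add(((f, cfg[f]), (g, cfg[g])))
    return covered

def all_pairs(names):
    # Every 2-wise interaction over Boolean features; a real tool would exclude
    # pairs that are invalid with respect to the feature-model constraints.
    return {((f, vf), (g, vg))
            for f, g in combinations(sorted(names), 2)
            for vf, vg in product([True, False], repeat=2)}

covered = covered_pairs(sample)
required = all_pairs(features)
print(f"pairwise coverage: {len(covered & required)}/{len(required)}")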
... In this work, we provide insights on the scalability of modern off-the-shelf #SAT solvers for the analysis of feature models. Analyses based on feature-model counting can only be applied in practice if available #SAT solvers scale to industrial feature models considering time restrictions for typical use cases, such as interactive settings (Fritsch et al. 2020;Sprey et al. 2020;Krieter et al. 2017;Acher et al. 2013;Benavides et al. 2007) or continuous integration environments (Pett et al. 2021). We thus evaluate the runtimes of analyzing feature models with publicly available #SAT solvers. ...
... In addition, Knüppel et al. provide an automotive product line, Automotive02. Second, we evaluate the solvers on BusyBox, provided by Pett et al. (Pett et al. 2021). Third, we include a feature model from the FinancialServices domain (Nieke et al. 2018; Pett et al. 2019). ...
... Still, this is the largest available benchmark and has been used by other authors (Krieter et al. 2018; Plazar et al. 2019; Baranov et al. 2020). Pett et al. (2021) translated the BusyBox model to CNF using KClause. Then, the authors translated the CNF into a feature model that is equivalent to the CNF and, thus, should maintain the variability. ...
Article
Full-text available
Product lines are widely used to manage families of products that share a common base of features. Typically, not every combination (configuration) of features is valid. Feature models are a de facto standard to specify valid configurations and allow standardized analyses on the variability of the underlying system. A large variety of such analyses depends on computing the number of valid configurations. To analyze feature models, they are typically translated to propositional logic. This allows employing #SAT solvers that compute the number of satisfying assignments of the propositional formula translated from a feature model. However, the #SAT problem is generally assumed to be even harder than SAT, and its scalability when applied to feature models has only been explored sparsely. Our main contribution is an investigation of the performance of off-the-shelf #SAT solvers on computing the number of valid configurations for industrial feature models. We empirically evaluate 21 publicly available #SAT solvers on 130 feature models from 15 subject systems. Our results indicate that current solvers master a majority of the evaluated systems (13/15), with the fastest solvers requiring less than one second for each successfully evaluated feature model. However, there are two complex systems for which none of the evaluated solvers scales. For the given experiment design, the two solvers that consumed the least runtime required 2.5 seconds in sum for the 13 systems and 3.5 seconds, respectively.
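The following sketch only illustrates what a feature-model count means: it brute-forces the number of valid configurations of a tiny invented model. This is not how the evaluated #SAT solvers work; it merely grounds the quantity they compute.

from itertools import product

# Tiny invented feature model: root is mandatory, A and B are optional,
# and a cross-tree constraint requires A whenever B is selected.
FEATURES = ["root", "A", "B"]

def is_valid(cfg):
    if not cfg["root"]:
        return False                      # the root feature must always be selected
    if cfg["B"] and not cfg["A"]:
        return False                      # cross-tree constraint: B => A
    return True

def count_valid_configurations():
    # Brute-force #SAT; real feature models need dedicated #SAT solvers instead.
    return sum(is_valid(dict(zip(FEATURES, values)))
               for values in product([True, False], repeat=len(FEATURES)))

print(count_valid_configurations())      # 3: {root}, {root, A}, {root, A, B}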
... Theorem 3.4). Thus, the Tseitin transformation can be safely used for all feature-model analyses discussed in Section 3.2 (e.g., as done elsewhere [13,88,91,112]). ...
... For the transformation, we consider three tools: FeatureIDE, Z3, and KConfigReader. Each tool implements some variant of a CNF transformation discussed in Section 4 and is used in practice for feature-model analysis [13,56,78,88,91,116]. For each of the three tools, we translate each feature-model formula into a suitable input format (i.e., XML, SMT-LIB 2 [7], or .model) ...
... As feature models with up to tens of thousands of features have been reported [10,25], repetitive counting queries without reusing information may require days of computation time [25]. While such runtimes can be remedied to some degree (e.g., by using parallelization), feature-model analyses are often applied in interactive applications [7,12,[14][15][16] or continuous-integration environments [32,33], mandating substantially shorter runtimes and limited resource usage. In particular, these requirements arose from collaborations with the automotive industry, where we integrated our prototype to reduce resource consumption. ...
Preprint
Full-text available
Feature models are commonly used to specify the valid configurations of a product line. In industry, feature models are often complex due to a large number of features and constraints. Thus, a multitude of automated analyses have been proposed. Many of those rely on computing the number of valid configurations, which typically depends on solving a #SAT problem, a computationally expensive operation. Further, most counting-based analyses require numerous #SAT computations on the same feature model. In particular, many analyses depend on multiple computations for evaluating the number of valid configurations that include certain features or conform to partial configurations. Instead of using expensive repetitive computations on highly similar formulas, we aim to improve the performance by reusing knowledge between these computations. In this work, we are the first to propose reusing d-DNNFs for performing efficient repetitive queries on features and partial configurations. Our empirical evaluation shows that our approach is up to 8,300 times faster (99.99% CPU time saved) than the state of the art of repetitively invoking #SAT solvers. Applying our tool ddnnife reduces runtimes from days to minutes compared to using #SAT solvers.
... The work of Pett et al. (2021) introduces a metric to measure the stability of sampling algorithms in the context of SPL regression testing. Considering that SPL products should be sampled for the regression testing during each CI cycle, it is desired to have similar products from one cycle to another. ...
Article
Full-text available
Testing Highly-Configurable Software (HCS) is usually costly, as a significant number of variants need to be tested. This becomes more problematic when Continuous Integration (CI) practices are adopted. CI leads the software to be integrated and tested multiple times a day, subject to time constraints (budgets). To address CI challenges, a learning-based test case prioritization approach named COLEMAN has been successfully applied. COLEMAN deals with test case volatility, in which some test cases can be included/removed over the CI cycles. Nevertheless, such an approach does not consider HCS particularities such as, by analogy, the volatility of variants. Given such a context, this work introduces two strategies for applying COLEMAN in the CI of HCS: the Variant Test Set Strategy (VTS), which relies on the test set specific to each variant, and the Whole Test Set Strategy (WTS), which prioritizes the test set composed of the union of the test cases of all variants. Both strategies are applied to two real-world HCSs, considering three test budgets. Independently of the time budget, the proposed strategies using COLEMAN perform best in comparison with solutions generated randomly and by another learning approach from the literature. Moreover, COLEMAN produces, in more than 92% of the cases, reasonable solutions that are near to the optimal solutions obtained by a deterministic approach. Both strategies require less than one second to execute. WTS provides better results for the less restrictive budgets, and VTS for the more restrictive ones. WTS seems to better mitigate the problem of beginning without knowledge and is more suitable when a new variant to be tested is added.
... Product-line evolution is a well-acknowledged and active field of research [8][9][10][11]. As with all other software systems, evolution of product lines is ubiquitous due to changed or new requirements. In the process of product-line evolution, domain engineers may change the feature model, artifacts, and the feature-artifact mapping [12]. ...
Article
Full-text available
A product line is an approach for systematically managing configuration options of customizable systems, usually by means of features. Products are generated for configurations consisting of selected features. Product-line evolution can lead to unintended changes to product behavior. We illustrate that updating configurations after product-line evolution requires decisions by both domain engineers, responsible for product-line evolution, and application engineers, responsible for configurations. The challenge is that domain and application engineers might not be able to interact with each other. We propose a formal foundation and a methodology that enables domain engineers to guide application engineers through configuration evolution by sharing knowledge on product-line evolution and by defining automatic update operations for configurations. As an effect, we enable knowledge transfer between those engineers without the need for interactions. We evaluate our methodology on four large-scale industrial product lines. The results of the qualitative evaluation indicate that our method is flexible enough for real-world product-line evolution. The quantitative evaluation indicates that we detect product behavior changes for up to 55.3% of the configurations, which would not have been detected using existing methods.
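As a simplified illustration of one question behind the described methodology, the sketch below checks whether a configuration that was valid before feature-model evolution is still valid afterwards. The two model versions and the configuration are invented, and the published approach goes further by classifying behavior changes and deriving update operations.

# Feature-model versions are modeled as plain predicates over a configuration
# (feature name -> bool); both formulas and the configuration are invented.

def old_model(cfg):
    return not cfg["B"] or cfg["A"]                                     # old constraint: B => A

def new_model(cfg):
    return (not cfg["B"] or cfg["A"]) and not (cfg["B"] and cfg["C"])   # B and C now exclude each other

def still_valid(cfg):
    # An application engineer's configuration that was valid before evolution
    # may become invalid afterwards and then needs an update decision.
    assert old_model(cfg), "expected a configuration that was valid before evolution"
    return new_model(cfg)

config = {"A": True, "B": True, "C": True}
print(still_valid(config))    # False -> this configuration requires an update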
Article
Modern software systems are typically configurable, a fundamental prerequisite for wide applicability and reusability. This flexibility poses an extraordinary challenge for quality assurance, as the enormous number of possible configurations makes it impractical to test each of them separately. This is where t-wise interaction sampling can be used to systematically cover the configuration space and detect unknown feature interactions. Over the last two decades, numerous algorithms for computing small interaction samples have been studied, providing improvements for a range of heuristic results; nevertheless, it has remained unclear how much these results can still be improved. We present a significant breakthrough: a fundamental framework, based on the mathematical principle of duality, for combining near-optimal solutions with provable lower bounds on the required sample size. This implies that we no longer need to work on heuristics with marginal or no improvement, but can certify the solution quality by establishing a limit on the remaining gap; in many cases, we can even prove optimality of achieved solutions. This theoretical contribution also provides extensive practical improvements: Our algorithm SampLNS was tested on 47 small and medium-sized configurable systems from the existing literature. SampLNS can reliably find samples of smaller size than previous methods in 85% of the cases; moreover, we can achieve and prove optimality of solutions for 63% of all instances. This makes it possible to avoid cumbersome efforts of minimizing samples by researchers as well as practitioners, and substantially save testing resources for most configurable systems.
Article
Feature models are commonly used to specify valid configurations of a product line. In industry, feature models are often complex due to numerous features and constraints. Thus, a multitude of automated analyses have been proposed. Many of those rely on computing the number of valid configurations, which typically depends on solving a #SAT problem, a computationally expensive operation. Even worse, most counting-based analyses require evaluation for multiple features or partial configurations, resulting in numerous #SAT computations on the same feature model. Instead of repetitive computations on highly similar formulas, we aim to improve the performance by reusing knowledge between these computations. In this work, we are the first to propose reusing d-DNNFs for performing repetitive counting queries on features and partial configurations. In our experiments, reusing d-DNNFs saved up to ~99.98% compared to repetitive invocations of #SAT solvers, even when including compilation times. Overall, our tool ddnnife combined with the d-DNNF compiler d4 appears to be the most promising option when dealing with many repetitive feature-model counting queries.
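The sketch below illustrates only the reuse idea behind the approach, not d-DNNF compilation itself: a tiny invented model is "compiled" once by enumeration, and repeated counting queries on features and partial configurations are then answered from the cached result.

from itertools import product

# Toy stand-in for "compile once, query often": the valid configurations of a
# tiny invented model are enumerated a single time, and all subsequent counting
# queries are answered from that cached result. Real feature models are far too
# large for enumeration; there, a d-DNNF compiled once plays this role.

FEATURES = ["A", "B", "C"]

def valid(cfg):
    return (not cfg["B"] or cfg["A"]) and (cfg["A"] or cfg["C"])   # illustrative constraints

VALID_CONFIGS = [dict(zip(FEATURES, vs))                            # one-time "compilation"
                 for vs in product([True, False], repeat=len(FEATURES))
                 if valid(dict(zip(FEATURES, vs)))]

def count(partial):
    # Number of valid configurations consistent with a partial configuration.
    return sum(all(cfg[f] == v for f, v in partial.items()) for cfg in VALID_CONFIGS)

print(count({}))                      # total number of valid configurations
print(count({"A": True}))             # configurations that include feature A
print(count({"B": True, "C": False})) # configurations matching a partial configuration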
Article
Full-text available
Feature models are commonly used to specify the valid configurations of product lines. As industrial feature models are typically complex, researchers and practitioners employ various automated analyses to study the configuration spaces. Many of these automated analyses require that numerous complex computations are executed on the same feature model, for example by querying a SAT or #SAT solver. With knowledge compilation, feature models can be compiled in a one-time effort to a target language that enables polynomial-time queries for otherwise more complex problems. In this work, we elaborate on the potential of employing knowledge compilation on feature models. First, we gather various feature-model analyses and study their computational complexity with regard to the underlying computational problem and the number of solver queries required for the respective analysis. Second, we collect knowledge-compilation target languages and map feature-model analyses to the languages that make the analysis tractable. Third, we empirically evaluate publicly available knowledge compilers to further inspect the potential benefits of knowledge-compilation target languages.
Conference Paper
Full-text available
Analyses of Software Product Lines (SPLs) rely on automated solvers to navigate complex dependencies among features and find legal configurations. Often, these analyses do not support numerical features with constraints because propositional formulas use only Boolean variables. Some automated solvers can represent numerical features natively, but are limited in their ability to count and Uniform Random Sample (URS) configurations, which are key operations to derive unbiased statistics on configuration spaces. Bit-blasting is a technique to encode numerical constraints as propositional formulas. We use bit-blasting to encode Boolean and numerical constraints so that we can exploit existing #SAT solvers to count and URS configurations. Compared to state-of-the-art Satisfiability Modulo Theory and Constraint Programming solvers, our approach has two advantages: 1) faster and more scalable configuration counting and 2) reliable URS of SPL configurations. We also show that our work can be used to extend prior SAT-based SPL analyses to support numerical features and constraints.
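To illustrate the basic idea of bit-blasting (not the encoding used by the paper's tooling), the sketch below represents one hypothetical numerical option with three Boolean bits and states a range constraint purely over those bits, so that a propositional (#SAT) view of the constraint becomes possible.

from itertools import product

# One hypothetical numerical option "cache_size" with domain 0..7 is bit-blasted
# into three Boolean variables (b2, b1, b0), most significant bit first.

def to_int(b2, b1, b0):
    return (int(b2) << 2) | (int(b1) << 1) | int(b0)

def blasted_constraint(b2, b1, b0):
    # Purely Boolean encoding of the numerical constraint 2 <= cache_size <= 5:
    #   value >= 2  iff at least one of the two high bits is set
    #   value <= 5  iff the value is not >= 6, i.e. not both high bits are set
    return (b2 or b1) and not (b2 and b1)

# Sanity check against the numerical formulation, and a model count by
# enumeration (a #SAT solver would do this on the Boolean encoding directly).
models = 0
for b2, b1, b0 in product([False, True], repeat=3):
    assert blasted_constraint(b2, b1, b0) == (2 <= to_int(b2, b1, b0) <= 5)
    models += blasted_constraint(b2, b1, b0)
print(models)   # 4 admissible values: 2, 3, 4, 5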
Article
Full-text available
Continuous integration, at its core, includes a set of practices that aim to prevent and reduce the cost of software integration issues by merging working software copies often. Regression testing is considered a good practice in software development with continuous integration, which ensures that code changes are not negatively affecting software functionality. As, nowadays, software development is carried out iteratively, with small code increments continuously developed and regression tested, it is of critical importance that continuous regression testing is time-efficient. However, in practice, regression testing is often long-lasting and faces scalability problems as software grows larger or as software changes are made more frequently. One contributing factor to these issues is test redundancy, which causes the same software functionality to be tested multiple times across a test suite. In large-scale software, especially highly configurable software, redundancy in continuous regression testing can significantly grow the size of test suites and negatively affect the cost effectiveness of continuous integration. This paper presents a practical learning algorithm for optimizing continuous integration testing by reducing ineffective test redundancy in regression suites. The novelty of the algorithm lies in learning and predicting the fault-detection effectiveness of continuous integration tests using historical test records and combining this information with coverage-based redundancy metrics. The goal is to identify ineffective redundancy, which is maximally reduced in the resulting regression test suite, thus reducing test time and improving the performance of continuous integration. We apply and evaluate the algorithm in two industrial projects of continuous integration. The results show that the proposed algorithm can improve the efficiency of continuous integration practice in terms of decreasing test execution time by 38% on average compared to the industry practice of our case study and by 40% on average compared to the retest-all approach. The results further demonstrate no significant reduction in fault-detection effectiveness of continuous regression testing. This suggests that the proposed algorithm contributes to the state of the practice in the continuous integration development and testing of highly configurable systems.
Conference Paper
Full-text available
The analysis of software product lines is challenging due to the potentially large number of products, which grows exponentially in terms of the number of features. Product sampling is a technique used to avoid exhaustive testing, which is often infeasible. In this paper, we propose a classification for product sampling techniques and classify the existing literature accordingly. We distinguish the important characteristics of such approaches based on the information used for sampling, the kind of algorithm, and the achieved coverage criteria. Furthermore, we give an overview of existing tools and evaluations of product sampling techniques. We share our insights on the state of the art of product sampling and discuss potential future work.
Article
Full-text available
Variability-sensitive verification pursues effective analysis of the exponentially many variants of a program family. Several variability-aware techniques have been proposed, but researchers still lack examples of concrete bugs induced by variability, occurring in real large-scale systems. A collection of real-world bugs is needed to evaluate tool implementations of variability-sensitive analyses by testing them on real bugs. We present a qualitative study of 98 diverse variability bugs (i.e., bugs that occur in some variants and not in others) collected from bug-fixing commits in the Linux, Apache, BusyBox, and Marlin repositories. We analyze each of the bugs, and record the results in a database. For each bug, we create a self-contained simplified version and a simplified patch, in order to help researchers who are not experts on these subject systems to understand them, so that they can use these bugs for evaluation of their tools. In addition, we provide single-function versions of the bugs, which are useful for evaluating intra-procedural analyses. A web-based user interface for the database allows users to conveniently browse and visualize the collection of bugs. Our study provides insights into the nature and occurrence of variability bugs in four highly-configurable systems implemented in C/C++, and shows in what ways variability hinders comprehension and the uncovering of software bugs.
Article
Full-text available
A software product line comprises a family of software products that share a common set of features. Testing an entire product line product-by-product is infeasible due to the potentially exponential number of products in the number of features. Accordingly, several sampling approaches have been proposed to select a presumably minimal, yet sufficient number of products to be tested. Since the time budget for testing is limited or even a priori unknown, the order in which products are tested is crucial for effective product-line testing. Prioritizing products is required to increase the probability of detecting faults faster. In this article, we propose similarity-based prioritization, which can be efficiently applied on product samples. In our approach, we incrementally select the most diverse product in terms of features to be tested next in order to increase feature interaction coverage as fast as possible during product-by-product testing. We evaluate the gain in the effectiveness of similarity-based prioritization on three product lines with real faults. Furthermore, we compare similarity-based prioritization to random orders, an interaction-based approach, and the default orders produced by existing sampling algorithms considering feature models of various sizes. The results show that our approach potentially increases effectiveness in terms of fault detection ratio concerning faults within real-world product-line implementations as well as synthetically seeded faults. Moreover, we show that the default orders of recent sampling algorithms already show promising results, which, however, can still be improved in many cases using similarity-based prioritization.
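A minimal sketch of the greedy idea behind similarity-based prioritization follows; it is not the authors' implementation. It orders an invented sample so that each next product maximizes its minimum Hamming distance to the products already ordered.

# Invented sample of three products over three Boolean features; the distance
# measure (Hamming) and the greedy scheme are simplified stand-ins.

def hamming(p, q):
    return sum(p[f] != q[f] for f in p)

def prioritize(sample):
    # Start with the first product, then repeatedly append the product whose
    # minimum distance to all already-ordered products is largest.
    ordered, remaining = [sample[0]], list(sample[1:])
    while remaining:
        best = max(remaining, key=lambda p: min(hamming(p, q) for q in ordered))
        ordered.append(best)
        remaining.remove(best)
    return ordered

sample = [
    {"A": True,  "B": True,  "C": False},
    {"A": True,  "B": True,  "C": True},
    {"A": False, "B": False, "C": True},
]
for cfg in prioritize(sample):
    print(cfg)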
Article
Full-text available
Features define common and variable parts of the members of a (software) product line. Feature models are used to specify the set of all valid feature combinations. Feature models not only enjoy an intuitive tree-like graphical syntax, but also a precise formal semantics, which can be denoted as propositional formulae over Boolean feature variables. A product line usually constitutes a long-term investment and, therefore, has to undergo continuous evolution to meet ever-changing requirements. First of all, product-line evolution leads to changes of the feature model due to its central role in the product-line paradigm. As a result, product-line engineers are often faced with the problems that (1) feature models are changed in an ad-hoc manner without proper documentation, and (2) the semantic impact of feature diagram changes is unclear. In this article, we propose a comprehensive approach to tackle both challenges. For (1), our approach compares the old and new version of the diagram representation of a feature model and specifies the changes using complex edit operations on feature diagrams. In this way, feature model changes are automatically detected and formally documented. For (2), we propose an approach for reasoning about the semantic impact of diagram changes. We present a set of edit operations on feature diagrams, where complex operations are primarily derived from evolution scenarios observed in a real-world case study, i.e., a product line from the automation engineering domain. We evaluated our approach to demonstrate its applicability with respect to the case study, as well as its scalability concerning experimental data sets.
Article
Full-text available
Software-product-line engineering has gained considerable momentum in recent years, both in industry and in academia. A software product line is a family of software products that share a common set of features. Software product lines challenge traditional analysis techniques, such as type checking, model checking, and theorem proving, in their quest of ensuring correctness and reliability of software. Simply creating and analyzing all products of a product line is usually not feasible, due to the potentially exponential number of valid feature combinations. Recently, researchers began to develop analysis techniques that take the distinguishing properties of software product lines into account, for example, by checking feature-related code in isolation or by exploiting variability information during analysis. The emerging field of product-line analyses is both broad and diverse, so it is difficult for researchers and practitioners to understand their similarities and differences. We propose a classification of product-line analyses to enable systematic research and application. Based on our insights with classifying and comparing a corpus of 123 research articles, we develop a research agenda to guide future research on product-line analyses.
Conference Paper
Full-text available
The advent of variability management and generator technology enables users to derive individual variants from a variable code base based on a selection of desired configuration options. This approach gives rise to the generation of possibly billions of variants that, however, cannot be efficiently analyzed for errors with classic analysis techniques. To address this issue, researchers and practitioners usually apply sampling heuristics. While sampling reduces the analysis effort significantly, the information obtained is necessarily incomplete and it is unknown whether sampling heuristics scale to billions of variants. Recently, researchers have begun to develop variability-aware analyses that analyze the variable code base directly exploiting the similarities among individual variants to reduce analysis effort. However, while being promising, so far, variability-aware analyses have been applied mostly only to small academic systems. To learn about the mutual strengths and weaknesses of variability-aware and sampling-based analyses of software systems, we compared the two strategies by means of two concrete analysis implementations (type checking and liveness analysis), applied them to three subject systems: Busybox, the x86 Linux kernel, and OpenSSL. Our key finding is that variability-aware analysis outperforms most sampling heuristics with respect to analysis time while preserving completeness.
Conference Paper
Full-text available
Software Product Line (SPL) testing is challenging due to the potentially huge number of derivable products. To alleviate this problem, numerous contributions have been proposed to reduce the number of products to be tested while still having a good coverage. However, not much attention has been paid to the order in which the products are tested. Test case prioritization techniques reorder test cases to meet a certain performance goal. For instance, testers may wish to order their test cases in order to detect faults as soon as possible, which would translate into faster feedback and earlier fault correction. In this paper, we explore the applicability of test case prioritization techniques to SPL testing. We propose five different prioritization criteria based on common metrics of feature models and we compare their effectiveness in increasing the rate of early fault detection, i.e., a measure of how quickly faults are detected. The results show that different orderings of the same SPL suite may lead to significant differences in the rate of early fault detection. They also show that our approach may contribute to accelerate the detection of faults of SPL test suites based on combinatorial testing.
Article
Full-text available
Software product line (SPL) testing consists of two separate but closely related test engineering activities: domain testing and application testing. Various software product line testing approaches have been developed over the last decade, and surveys have been conducted on them. However, thus far none of them has deeply addressed the questions of what research has been conducted in order to overcome the challenges posed by the two separate testing activities and their relationships. Thus, this paper surveys the current software product line testing approaches by defining reference SPL testing processes and identifying, based on them, key research perspectives that are important in SPL testing. Through this survey, we identify the research that addressed the challenges and also derive open research opportunities from each perspective.
Conference Paper
Full-text available
Product-line analysis has received considerable attention in the last decade. As it is often infeasible to analyze each product of a product line individually, researchers have developed analyses, called variability-aware analyses, that consider and exploit variability manifested in a code base. Variability-aware analyses are often significantly more efficient than traditional analyses, but each of them has certain weaknesses regarding applicability or scalability. We present the Product-Line-Analysis model, a formal model for the classification and comparison of existing analyses, including traditional and variability-aware analyses, and lay a foundation for formulating and exploring further, combined analyses. As a proof of concept, we discuss different examples of analyses in the light of our model, and demonstrate its benefits for systematic comparison and exploration of product-line analyses.
Article
Full-text available
Software Product Lines (SPLs) are families of products whose commonalities and variability can be captured by Feature Models (FMs). T-wise testing aims at finding errors triggered by all interactions amongst t features, thus reducing drastically the number of products to test. T-wise testing approaches for SPLs are limited to small values of t -- which miss faulty interactions -- or limited by the size of the FM. Furthermore, they neither prioritize the products to test nor provide means to finely control the generation process. This paper offers (a) a search-based approach capable of generating products for large SPLs, forming a scalable and flexible alternative to current techniques and (b) prioritization algorithms for any set of products. Experiments conducted on 124 FMs (including large FMs such as the Linux kernel) demonstrate the feasibility and the practicality of our approach.
Conference Paper
Full-text available
Combinatorial interaction testing (CIT) is a method to sample configurations of a software system systematically for testing. Many algorithms have been developed that create CIT samples; however, few have considered the practical concerns that arise when adding constraints between combinations of options. In this paper, we survey constraint handling techniques in existing algorithms and discuss the challenges that they present. We examine two highly-configurable software systems to quantify the nature of constraints in real systems. We then present a general constraint representation and solving technique that can be integrated with existing CIT algorithms and compare two constraint-enhanced algorithm implementations with existing CIT tools to demonstrate feasibility.
Conference Paper
Full-text available
Features express the variabilities and commonalities among programs in a software product line (SPL). A feature model defines the valid combinations of features, where each combination corresponds to a program in an SPL. SPLs and their feature models evolve over time. We classify the evolution of a feature model via modifications as refactorings, specializations, generalizations, or arbitrary edits. We present an algorithm to reason about feature model edits to help designers determine how the program membership of an SPL has changed. Our algorithm takes two feature models as input (before and after edit versions), where the set of features in both models are not necessarily the same, and it automatically computes the change classification. Our algorithm is able to give examples of added or deleted products and efficiently classifies edits to even large models that have thousands of features.
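The classification described above can be illustrated by comparing the product sets of two tiny invented feature-model versions, as sketched below; the actual algorithm reasons symbolically and also handles differing feature sets.

from itertools import product

# Both feature-model versions here range over the same invented feature set;
# the published algorithm also handles added or removed features and uses a
# reasoner instead of enumeration.

FEATURES = ["A", "B", "C"]

def configurations(model):
    return {vs for vs in product([False, True], repeat=len(FEATURES))
            if model(dict(zip(FEATURES, vs)))}

def classify(before, after):
    old, new = configurations(before), configurations(after)
    if old == new:
        return "refactoring"        # the set of products is unchanged
    if new < old:
        return "specialization"     # products were only removed
    if old < new:
        return "generalization"     # products were only added
    return "arbitrary edit"         # products were both added and removed

before = lambda c: not c["B"] or c["A"]                     # B => A
after = lambda c: (not c["B"] or c["A"]) and not c["C"]     # additionally forbids C
print(classify(before, after))                              # specialization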
Article
Software Product Lines (SPLs) are a common technique to capture families of software products in terms of commonalities and variabilities. On a conceptual level, functionality of an SPL is modeled in terms of features in Feature Models (FMs). Like other software systems, SPLs and their FMs are subject to evolution that may lead to the introduction of anomalies (e.g., non-selectable features). To fix such anomalies, developers need to understand their cause. However, for large evolution histories and large SPLs, explanations may become very long and, as a consequence, hard to understand. In this paper, we present a method for anomaly detection and explanation that, by encoding the entire evolution history, identifies the evolution step of anomaly introduction and explains which of the performed evolution operations lead to it. In our evaluation, we show that our method significantly reduces the complexity of generated explanations.
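One classic anomaly mentioned above is a non-selectable (dead) feature. The sketch below shows, on an invented example, what such an anomaly means; the paper's contribution is to locate and explain the evolution step that introduced it, which this sketch does not cover.

from itertools import product

# A feature is "dead" (non-selectable) if it occurs in no valid configuration.
# The constraints below are invented so that B becomes dead; enumeration stands
# in for the solver-based reasoning used for realistically sized models.

FEATURES = ["A", "B", "C"]

def valid(cfg):
    return (not cfg["B"] or cfg["A"]) and (not cfg["B"] or not cfg["A"])   # B => A and B => not A

def dead_features():
    configs = [dict(zip(FEATURES, vs))
               for vs in product([False, True], repeat=len(FEATURES))
               if valid(dict(zip(FEATURES, vs)))]
    return [f for f in FEATURES if not any(cfg[f] for cfg in configs)]

print(dead_features())   # ['B']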
Article
Testing is a crucial activity of product-line engineering. Due to shared commonality, testing each variant individually results in redundant testing processes. By adopting regression testing strategies, variants are tested incrementally by focusing on the variability between variants to reduce the overall testing effort. However, product lines evolve during their life-cycle to adapt, e.g., to changing requirements. Hence, quality assurance also has to be ensured after product-line evolution by efficiently testing respective versions of variants. In this paper, we propose retest test selection for product-line regression testing of variants and versions of variants. Based on delta-oriented test modeling, we capture the commonality and variability of an evolving product line by means of differences between variants and versions of variants. We exploit those differences to apply change impact analyses, where we reason about changed dependencies to be retested when stepping from a variant or a version of a variant to its subsequent one by selecting test cases for re-execution. We prototypically implemented our approach and evaluated its effectiveness and efficiency by means of two evolving product lines, showing positive results.
Article
A software product line comprises a family of software products that share a common set of features. It enables customers to compose software systems from a managed set of features. Testing every product of a product line individually is often infeasible due to the exponential number of possible products in the number of features. Several approaches have been proposed to restrict the number of products to be tested by sampling a subset of products achieving sufficient combinatorial interaction coverage. However, existing sampling algorithms do not scale well to large product lines, as they require a considerable amount of time to generate the samples. Moreover, samples are not available until a sampling algorithm completely terminates. As testing time is usually limited, we propose an incremental approach of product sampling for pairwise interaction testing (called IncLing), which enables developers to generate samples on demand in a step-wise manner. Furthermore, IncLing uses heuristics to efficiently achieve pairwise interaction coverage with a reasonable number of products. We evaluated IncLing by comparing it against existing sampling algorithms using feature models of different sizes. The results of our approach indicate efficiency improvements for product-line testing.
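A naive sketch of the incremental idea (not the IncLing algorithm) is shown below: products are emitted one at a time, each greedily chosen to cover as many not-yet-covered feature pairs of a tiny invented model as possible.

from itertools import combinations, product

FEATURES = ["A", "B", "C"]

def valid(cfg):
    return not cfg["B"] or cfg["A"]                 # invented constraint: B => A

ALL_CONFIGS = [dict(zip(FEATURES, vs))
               for vs in product([False, True], repeat=len(FEATURES))
               if valid(dict(zip(FEATURES, vs)))]

def pairs_of(cfg):
    return {((f, cfg[f]), (g, cfg[g])) for f, g in combinations(sorted(cfg), 2)}

def incremental_sample():
    # Emit one product at a time, greedily choosing the configuration that
    # covers the most not-yet-covered feature pairs, until all achievable
    # pairwise interactions are covered.
    covered = set()
    achievable = set().union(*(pairs_of(c) for c in ALL_CONFIGS))
    while covered != achievable:
        best = max(ALL_CONFIGS, key=lambda c: len(pairs_of(c) - covered))
        covered |= pairs_of(best)
        yield best

for i, cfg in enumerate(incremental_sample(), 1):
    print(f"product {i}: {cfg}")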
Conference Paper
Most software systems are designed to provide custom functionality using configuration options. Testing such systems is challenging as running tests of a single configuration is often not sufficient, because defects may appear in other configurations. Ideally, all configurations of a software system should be tested, which is usually not applicable in practice due to the combinatorial explosion with respect to the configuration options. Multiple sampling strategies aim to reduce the set of tested configurations to a feasible amount, such as T-wise sampling, random configurations, and user-defined configurations. However, these strategies are often not applied in practice as they require manual effort or a specialized testing framework. Within our tool FeatureIDE, we integrate all aforementioned strategies and reduce the manual effort by automating the process of generating and testing configurations. Furthermore, we provide support for unit testing to avoid redundant test executions and for variability-aware testing. With this extension of FeatureIDE, we aim to make recent testing techniques for configurable systems applicable in practice.
Conference Paper
Almost every software system provides configuration options to tailor the system to the target platform and application scenario. Often, this configurability renders the analysis of every individual system configuration infeasible. To address this problem, researchers have proposed a diverse set of sampling algorithms. We present a comparative study of 10 state-of-the-art sampling algorithms regarding their fault-detection capability and size of sample sets. The former is important to improve software quality and the latter to reduce the time of analysis. In a nutshell, we found that sampling algorithms with larger sample sets are able to detect higher numbers of faults, but simple algorithms with small sample sets, such as most-enabled-disabled, are the most efficient in most contexts. Furthermore, we observed that the limiting assumptions made in previous work influence the number of detected faults, the size of sample sets, and the ranking of algorithms. Finally, we have identified a number of technical challenges when trying to avoid the limiting assumptions, which questions the practicality of certain sampling algorithms.
Conference Paper
Software product lines have potential to allow for mass customization of products. Unfortunately, the resulting, vast amount of possible product variants with commonalities and differences leads to new challenges in software testing. Ideally, every product variant should be tested, especially in safety-critical systems. However, due to the exponentially increasing number of product variants, testing every product variant is not feasible. Thus, new concepts and techniques are required to provide efficient SPL testing strategies exploiting the commonalities of software artifacts between product variants to reduce redundancy in testing. In this paper, we present an efficient integration testing approach for SPLs based on delta modeling. We focus on test case prioritization. As a result, only the most important test cases for every product variant are tested, reducing the number of executed test cases significantly, as testing can stop at any given point because of resource constraints while ensuring that the most important test cases have been covered. We present the general concept and our evaluation results. The results show a measurable reduction of executed test cases compared to single-software testing approaches.
Article
Continuous integration has been around for a while now, but the habits it suggests are far from common practice. Automated builds, a thorough test suite, and committing to the mainline branch every day sound simple at first, but they require a responsible team to implement and constant care. What starts with improved tooling can be a catalyst for long-lasting change in your company's shipping culture. Continuous integration is more than a set of practices, it's a mindset that has one thing in mind: increasing customer value. The Web extra at http://youtu.be/tDl_cHfrJZo is an audio podcast of the Tools of the Trade column that discusses how continuous integration is more than a set of practices: it's a mindset focused on increasing customer value.
Article
Context: Testing plays an important role in the quality assurance process for software product line engineering. There are many opportunities for economies of scope and scale in the testing activities, but techniques that can take advantage of these opportunities are still needed. Objective: The objective of this study is to identify testing strategies that have the potential to achieve these economies, and to provide a synthesis of available research on SPL testing strategies, to be applied towards reaching higher defect detection rates and reduced quality assurance effort. Method: We performed a literature review of two hundred seventy-six studies published from the year 1998 up to the first semester of 2013. We used several filters to focus the review on the most relevant studies and we give detailed analyses of the core set of studies. Results: The analysis of the reported strategies comprised two fundamental aspects for software product line testing: the selection of products for testing, and the actual test of products. Our findings indicate that the literature offers a large number of techniques to cope with such aspects. However, there is a lack of reports on realistic industrial experiences, which limits the inferences that can be drawn. Conclusion: This study showed a number of leveraged strategies that can support both the selection of products and the actual testing of products. Future research should also benefit from the problems and advantages identified in this study.
Conference Paper
Product-line technology is increasingly used in mission-critical and safety-critical applications. Hence, researchers are developing verification approaches that follow different strategies to cope with the specific properties of product lines. While the research community is discussing the mutual strengths and weaknesses of the different strategies - mostly at a conceptual level - there is a lack of evidence in terms of case studies, tool implementations, and experiments. We have collected and prepared six product lines as subject systems for experimentation. Furthermore, we have developed a model-checking tool chain for C-based and Java-based product lines, called SPLverifier, which we use to compare sample-based and family-based strategies with regard to verification performance and the ability to find defects. Based on the experimental results and an analytical model, we revisit the discussion of the strengths and weaknesses of product-line-verification strategies.
Article
Researchers have explored the application of combinatorial interaction testing (CIT) methods to construct samples to drive systematic testing of software system configurations. Applying CIT to highly-configurable software systems is complicated by the fact that, in many such systems, there are constraints between specific configuration parameters that render certain combinations invalid. Many CIT algorithms lack a mechanism to avoid these invalid combinations. In recent work, automated constraint solving methods have been combined with search-based CIT construction methods to address the constraint problem with promising results. However, these techniques can incur a non-trivial overhead. In this paper, we build upon our previous work to develop a family of greedy CIT sample generation algorithms that exploit calculations made by modern Boolean satisfiability (SAT) solvers to prune the search space of the CIT problem. We perform a comparative evaluation of the cost-effectiveness of these algorithms on four real-world highly-configurable software systems and on a population of synthetic examples that share the characteristics of those systems. In combination, our techniques reduce the cost of CIT in the presence of constraints to 30 percent of the cost of widely-used unconstrained CIT methods without sacrificing the quality of the solutions.
Article
A scalable approach for software product line testing is required due to the size and complexity of industrial product lines. In this paper, we present a specialized algorithm (called ICPL) for generating covering arrays from feature models. ICPL makes it possible to apply combinatorial interaction testing to software product lines of the size and complexity found in industry. For example, ICPL allows pair-wise testing to be readily applied to projects of about 7,000 features and 200,000 constraints, the Linux Kernel, one of the largest product lines where the feature model is available. ICPL is compared to three of the leading algorithms for t-wise covering array generation. Based on a corpus of 19 feature models, data was collected for each algorithm and feature model when the algorithm could finish 100 runs within three days. These data are used for comparing the four algorithms. In addition to supporting large feature models, ICPL is quick, produces small covering arrays and, even though it is non-deterministic, produces a covering array of a similar size within approximately the same time each time it is run with the same feature model.
Conference Paper
Feature models and associated feature diagrams allow modeling and visualizing the constraints leading to the valid products of a product line. In terms of their expressiveness, feature diagrams are equivalent to propositional formulas, which makes them theoretically expensive to process and analyze. For example, satisfying propositional formulas, which translates into finding a valid product for a given feature model, is an NP-hard problem for which no generally fast algorithm is known. This theoretical complexity could prevent the use of powerful analysis techniques to assist in the development and testing of product lines. However, we have found that satisfying realistic feature models is quick. Thus, we show that combinatorial interaction testing of product lines is feasible in practice. Based on this, we investigate covering array generation time and results for realistic feature models and find where the algorithms can be improved.
Conference Paper
A continuous integration system is often considered one of the key elements involved in supporting an agile software development and testing environment. As a traditional software tester transitioning to an agile development environment, it became clear to me that I would need to put this essential infrastructure in place and promote improved development practices in order to make the transition to agile testing possible. This experience report discusses a continuous integration implementation I led last year. The initial motivations for implementing continuous integration are discussed, and a pre- and post-assessment using Martin Fowler's "practices of continuous integration" is provided along with the technical specifics of the implementation. The report concludes with a retrospective of my experiences implementing and promoting continuous integration within the context of agile testing.
Article
Combinatorial Testing (CT) can detect failures triggered by interactions of parameters in the Software Under Test (SUT) with a covering array test suite generated by some sampling mechanisms. It has been an active field of research in the last twenty years. This article aims to review previous work on CT, highlights the evolution of CT, and identifies important issues, methods, and applications of CT, with the goal of supporting and directing future practice and research in this area. First, we present the basic concepts and notations of CT. Second, we classify the research on CT into the following categories: modeling for CT, test suite generation, constraints, failure diagnosis, prioritization, metric, evaluation, testing procedure and the application of CT. For each of the categories, we survey the motivation, key issues, solutions, and the current state of research. Then, we review the contribution from different research groups, and present the growing trend of CT research. Finally, we recommend directions for future CT research, including: (1) modeling for CT, (2) improving the existing test suite generation algorithm, (3) improving analysis of testing result, (4) exploring the application of CT to different levels of testing and additional types of systems, (5) conducting more empirical studies to fully understand limitations and strengths of CT, and (6) combining CT with other testing techniques.
Martin Fowler and Matthew Foemmel. Continuous Integration.
Vasek Chvatal. 1979. A greedy heuristic for the set-covering problem.
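The cited heuristic repeatedly picks the set that covers the most still-uncovered elements; it underlies many sampling and covering-array heuristics referenced above. A generic Python sketch on an invented instance:

def greedy_set_cover(universe, subsets):
    # Chvatal-style greedy heuristic: repeatedly take the subset that covers
    # the largest number of still-uncovered elements.
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & set(s)))
        if not uncovered & set(best):
            raise ValueError("the given subsets cannot cover the universe")
        cover.append(best)
        uncovered -= set(best)
    return cover

universe = {1, 2, 3, 4, 5}
subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(universe, subsets))   # e.g. [{1, 2, 3}, {4, 5}]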