Conference Paper

t-wise coverage by uniform sampling: [challenge solution]

Authors: Jeho Oh, Paul Gazzillo, and Don Batory

Abstract

Efficiently testing large configuration spaces of Software Product Lines (SPLs) requires a sampling algorithm that is both scalable and provides good t-wise coverage. The 2019 SPLC Sampling Challenge provides large real-world feature models and asks for a t-wise sampling algorithm that can work on those models. We evaluated the t-wise coverage achieved by uniformly sampling (US) configurations of one of the provided feature models. US means that every (legal) configuration is equally likely to be selected. US yields statistically representative samples of a configuration space and can serve as a baseline against which to compare other sampling algorithms. We used an existing algorithm called Smarch to uniformly sample SPL configurations. While uniform sampling alone was not enough to produce 100% 1-wise and 2-wise coverage, we used standard probabilistic analysis to explain our experimental results and to conjecture how uniform sampling may enhance the scalability of existing t-wise sampling algorithms.
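The "standard probabilistic analysis" is not spelled out in the abstract; a textbook calculation of the kind it alludes to (our reconstruction in the style of Grinstead and Snell, assuming independent uniform draws) bounds the chance that a sample misses a given interaction:

```latex
% p = fraction of valid configurations containing a given t-wise
% interaction; n = number of independent uniform draws.
\Pr[\text{interaction missed}] = (1-p)^{n},
% so missing it with probability at most \alpha requires
n \;\ge\; \frac{\ln(1/\alpha)}{-\ln(1-p)} \;\approx\; \frac{\ln(1/\alpha)}{p}
\quad \text{(for small } p\text{)}.
```

Interactions realized by only a tiny fraction of valid configurations thus demand very large uniform samples, which is consistent with uniform sampling alone falling short of full 1-wise and 2-wise coverage.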


... Thus, minimizing the sample size can reduce the overall testing effort [11]. However, to ensure reliability, it is typically necessary to test each interaction among up to t features [12,13]. This yields the t-wise Interaction Sampling Problem (t-ISP): find a sample consisting of a minimum number of configurations such that every valid combination of t or fewer features is part of at least one configuration, where t is an arbitrary but fixed positive integer. ...
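In symbols (our notation, not the snippet's), t-ISP is a minimum-cardinality set cover over the valid interactions:

```latex
% C = set of valid configurations; I_t = set of valid interactions,
% i.e., assignments to at most t features that occur in at least
% one configuration c \in C.
\min_{S \subseteq C} |S|
\quad\text{subject to}\quad
\forall\, i \in I_t \;\; \exists\, c \in S :\ c \text{ covers } i .
```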
... As our problem specification requires 100% pairwise coverage, we have chosen only algorithms that are guaranteed to generate such samples. This excludes, for instance, algorithms that generate random samples [12] and most algorithms that employ local or population-based search [46,51,52,53]. ...
... SampLNS considers t-wise interaction coverage, but there exist many other coverage criteria for which samples can be generated [10], such as partial t-wise coverage [16,67,68,69], coverage of the solution space [57,70,48,71], coverage of test cases [72,73,74], uniform sampling [75,76,77,78,12], distance-based sampling [79,46], and coverage of feature model mutations [80,81,65]. As some of these coverage criteria are correlated with t-wise coverage, it is reasonable to assume that LNS may be employed to reduce the size of corresponding samples as well. ...
Preprint
Full-text available
Modern software systems are typically configurable, a fundamental prerequisite for wide applicability and reusability. This flexibility poses an extraordinary challenge for quality assurance, as the enormous number of possible configurations makes it impractical to test each of them separately. This is where t-wise interaction sampling can be used to systematically cover the configuration space and detect unknown feature interactions. Over the last two decades, numerous algorithms for computing small interaction samples have been studied, providing improvements for a range of heuristic results; nevertheless, it has remained unclear how much these results can still be improved. We present a significant breakthrough: a fundamental framework, based on the mathematical principle of duality, for combining near-optimal solutions with provable lower bounds on the required sample size. This implies that we no longer need to work on heuristics with marginal or no improvement, but can certify the solution quality by establishing a limit on the remaining gap; in many cases, we can even prove optimality of achieved solutions. This theoretical contribution also provides extensive practical improvements: our algorithm SampLNS was tested on 47 small and medium-sized configurable systems from the existing literature. SampLNS can reliably find samples of smaller size than previous methods in 85% of the cases; moreover, we can achieve and prove optimality of solutions for 63% of all instances. This spares researchers and practitioners the cumbersome effort of minimizing samples by hand and substantially saves testing resources for most configurable systems.
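The lower-bound side of such a duality can be summarized compactly (our paraphrase of the general idea, not the paper's exact formulation): any family of mutually conflicting interactions certifies a bound, because a single configuration can cover at most one member of the family:

```latex
% D \subseteq I_t such that no valid configuration covers two
% members of D. Any sample S with full t-wise coverage then needs
% a distinct configuration per member of D, hence
|S| \;\ge\; |D| .
% A heuristic sample matching the size of some such D is provably optimal.
```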
... Uniform random sampling is a technique for sampling random solutions from propositional formulas [3]- [5]. Uniform random sampling can be applied to a product line by translating the configuration space to a propositional formula in conjunctive normal form (CNF). ...
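As a toy illustration of that translation (a hypothetical three-feature model of ours, not from the cited work), parent-child relations and cross-tree constraints become CNF clauses over one Boolean variable per feature:

```python
from itertools import combinations

# Toy feature model: Root with optional children A and B, plus a
# cross-tree constraint "A requires B".
# Variables: 1 = Root, 2 = A, 3 = B; clauses in DIMACS convention.
cnf = [
    [1],       # the root feature is always selected
    [-2, 1],   # A implies its parent Root
    [-3, 1],   # B implies its parent Root
    [-2, 3],   # cross-tree constraint: A requires B
]

def is_valid(config):
    """Check a configuration (set of selected variable ids) against the CNF."""
    return all(
        any((lit > 0) == (abs(lit) in config) for lit in clause)
        for clause in cnf
    )

# Enumerating this tiny space yields exactly three valid configurations.
features = [1, 2, 3]
valid = [set(s) for r in range(len(features) + 1)
         for s in combinations(features, r) if is_valid(set(s))]
print(valid)  # [{1}, {1, 3}, {1, 2, 3}]
```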
... Uniform random sampling is an active field of research with various sampling approaches. Available samplers can be categorized into heuristic- [15], [16], hashing- [17], [18] and counting-based [5], [19]- [21] samplers [21]. ...
... UniGen2 [17] is a random hashing-based sampler that supports parallelization, and its runtime was improved in UniGen3 [18]. KUS [19], SPUR [20], SMARCH [5] and BDDSampler [21] are counting-based samplers. Knuth's algorithm [22] using Binary Decision Diagrams (BDDs) was applied to SPLs by Oh et al. in [4]. ...
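A minimal sketch of the counting-based idea behind such samplers, with a brute-force model counter standing in for the #SAT, d-DNNF, or BDD counting that SMARCH, KUS, and BDDSampler actually use:

```python
import random

def count_models(cnf, n_vars, assignment):
    """Brute-force #SAT under a partial assignment {var: bool}.
    Real counting-based samplers replace this with an efficient
    model counter."""
    free = [v for v in range(1, n_vars + 1) if v not in assignment]
    total = 0
    for bits in range(2 ** len(free)):
        full = dict(assignment)
        for i, v in enumerate(free):
            full[v] = bool((bits >> i) & 1)
        if all(any(full[abs(l)] == (l > 0) for l in c) for c in cnf):
            total += 1
    return total

def uniform_sample(cnf, n_vars):
    """Draw one configuration exactly uniformly (assumes cnf is
    satisfiable): fix one variable at a time, choosing True with
    probability #models(v=True) / #models, per the chain rule."""
    assignment = {}
    for v in range(1, n_vars + 1):
        c_all = count_models(cnf, n_vars, assignment)
        c_true = count_models(cnf, n_vars, {**assignment, v: True})
        assignment[v] = random.random() < c_true / c_all
    return assignment
```

Each draw is exactly uniform because the chain rule multiplies the conditional probability of every bit; the entire cost is concentrated in the counting calls.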
Preprint
Full-text available
A software product line models the variability of highly configurable systems. Complete exploration of all valid configurations (the configuration space) is infeasible as it grows exponentially with the number of features in the worst case. In practice, few representative configurations are sampled instead, which may be used for software testing or hardware verification. Pseudo-randomness of modern computers introduces statistical bias into these samples. Quantum computing enables truly random, uniform configuration sampling based on inherently random quantum physical effects. We propose a method to encode the entire configuration space in a superposition and then measure one random sample. We show the method's uniformity over multiple samples and investigate how it scales for different feature models. We discuss the possibilities and limitations of quantum computing for uniform random sampling regarding current and future quantum hardware.
... For example, with t = 2, the heuristic would guarantee that every possible pair of optimization options appears in at least one sample. These t-wise sampling approaches have also been researched in the field of combinatorial testing, for instance, by Oh et al. [33]. Moreover, tools from this field, such as ACTS [45], can also generate these covering arrays for configuration-space learning purposes. ...
... However, due to the unconstrained nature of this particular case and the relatively small configuration space (compared to the colossal spaces described by Acher et al. [1]), our focus lies on coverage-based approaches. In particular, we are interested in t-wise sampling approaches, like those provided by Oh et al. [33]. Within the scope of this work, we used the ACTS tool, as presented by Yu et al. [45], to compute 2- and 3-way coverage of the optimization space. ...
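For concreteness, 2-wise coverage of a sample can be measured as follows (a small illustration of ours, not the ACTS tool; configurations are sets of selected features and `features` a sorted list):

```python
from itertools import combinations

def pairs(config, features):
    """All 2-wise interactions, i.e., (feature, selected?) value pairs,
    covered by one configuration."""
    lits = [(f, f in config) for f in features]
    return set(combinations(lits, 2))

def pairwise_coverage(sample, valid_configs, features):
    """Fraction of achievable 2-wise interactions covered by `sample`,
    where `valid_configs` enumerates the valid space (feasible only for
    small models; real tools avoid this enumeration)."""
    achievable = set().union(*(pairs(c, features) for c in valid_configs))
    covered = set().union(*(pairs(c, features) for c in sample))
    return len(covered & achievable) / len(achievable)
```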
... Existing approaches use different sampling techniques, such as greedy techniques [4], [7]-[10], local search techniques [11]-[13], population-based techniques [14]-[16], manual selection techniques [17], or feature interaction and coverage based techniques [18]-[21]. However, most of these approaches are top-down approaches and are limited by their reliance on hand-crafted heuristics, which may not be able to capture the full complexity of the configuration space. ...
... Further, as both tools perform similarly to other included tools, we expect no substantial changes in the conclusions once the bug is fixed. We consider the inclusion of these solvers important for comparison, as both tools have been employed in the product-line domain [25,52,63,69,73]. Therefore, we include sharpSAT and dSharp, despite the bug, for the following research questions. ...
Preprint
Full-text available
Feature models are commonly used to specify the valid configurations of a product line. In industry, feature models are often complex due to a large number of features and constraints. Thus, a multitude of automated analyses have been proposed. Many of those rely on computing the number of valid configurations which typically depends on solving a #SAT problem, a computationally expensive operation. Further, most counting-based analyses require numerous #SAT computations on the same feature model. In particular, many analyses depend on multiple computations for evaluating the number of valid configurations that include certain features or conform to partial configurations. Instead of using expensive repetitive computations on highly similar formulas, we aim to improve the performance by reusing knowledge between these computations. In this work, we are the first to propose reusing d-DNNFs for performing efficient repetitive queries on features and partial configurations. Our empirical evaluation shows that our approach is up to 8,300 times faster (99.99% of CPU time saved) than the state of the art of repetitively invoking #SAT solvers. Applying our tool ddnnife reduces runtimes from days to minutes compared to using #SAT solvers.
... That is, a set of features defines a unique configuration (or product) of an SPL. Clearly, as the number of features increases, the number of all possible configurations grows exponentially (Oh et al. 2019). Due to constraints among features, not all configurations are valid. ...
Article
Full-text available
Sampling a small, valid and representative set of configurations from software product lines (SPLs) is important, yet challenging due to a huge number of possible configurations to be explored. Recently, the sampling strategy based on satisfiability (SAT) solving has enjoyed great popularity due to its high efficiency and good scalability. However, this sampling offers no guarantees on diversity, especially in terms of the number of selected features, an important property to characterize a configuration. In this paper, we propose a probability-aware diversification (PaD) strategy to cooperate with SAT solving in generating diverse configurations, with the effect that valid configurations are efficiently generated by SAT solving while also maintaining diversity brought by PaD. Experimental results on 51 public SPLs show that, when working cooperatively with PaD, the performance (regarding diversity) of off-the-shelf SAT solvers has substantial improvements, with large effect sizes observed on more than 71% of all the cases. Furthermore, we propose a general search-based framework where PaD and evolutionary algorithms can work together, and instantiate this framework in the context of search-based diverse sampling and search-based multi-objective SPL configuration (where there is a practical need of generating diverse configurations). It is demonstrated by the experimental results that PaD also brings abundant performance gains to these search-based approaches. Finally, we apply PaD to a practical problem, i.e., machine learning based performance predictions of SPLs, and show that using PaD tends to improve the accuracy of performance prediction models.
... To generate configurations, we used the random generator of FeatureIDE [35] (version 3.3), which does not generate uniformly distributed configurations. However, tools and methods to generate uniformly distributed configurations do not (yet) scale for variability models as large as those used in the evaluation [40]. Additionally, real-world configurations are not uniformly distributed, and it is not possible to make statements about their distribution without domain knowledge. ...
Article
Full-text available
A product line is an approach for systematically managing configuration options of customizable systems, usually by means of features. Products are generated for configurations consisting of selected features. Product-line evolution can lead to unintended changes to product behavior. We illustrate that updating configurations after product-line evolution requires decisions of both domain engineers responsible for product-line evolution and application engineers responsible for configurations. The challenge is that domain and application engineers might not be able to interact with each other. We propose a formal foundation and a methodology that enables domain engineers to guide application engineers through configuration evolution by sharing knowledge on product-line evolution and by defining automatic update operations for configurations. As an effect, we enable knowledge transfer between those engineers without the need for interactions. We evaluate our methodology on four large-scale industrial product lines. The results of the qualitative evaluation indicate that our method is flexible enough for real-world product-line evolution. The quantitative evaluation indicates that we detect product behavior changes for up to 55.3% of the configurations, which would not have been detected using existing methods.
Article
To meet the increasing demand for customized software, highly configurable systems become essential in practice. Such systems offer many options to configure, and ensuring the reliability of these systems is critical. A widely-used evaluation metric for testing these systems is t-wise coverage, where t represents testing strength, and its value typically ranges from 2 to 6. It is crucial to design effective and efficient methods for generating test suites that achieve high t-wise coverage. However, current state-of-the-art methods need to generate large test suites for achieving high t-wise coverage. In this work, we propose a novel method called LS-Sampling-Plus that can efficiently generate test suites with high t-wise coverage for 2 ≤ t ≤ 6 while being smaller in size compared to existing state-of-the-art methods. LS-Sampling-Plus incorporates many core algorithmic techniques, including two novel scoring functions, a dynamic mechanism for updating sampling probabilities, and a validity-guaranteed systematic search method. Our experiments on various practical benchmarks show that LS-Sampling-Plus can achieve higher t-wise coverage than current state-of-the-art methods, through building a test suite of the same size. Moreover, our evaluations indicate the effectiveness of all core algorithmic techniques of LS-Sampling-Plus. Further, LS-Sampling-Plus exhibits better scalability and fault detection capability than existing state-of-the-art methods.
Article
The Linux kernel is highly-configurable, with a build system that takes a configuration file as input and automatically tailors the source code accordingly. Configurability, however, complicates testing, because different configuration options lead to the inclusion of different code fragments. With thousands of patches received per month, Linux kernel maintainers employ extensive automated continuous integration testing. To attempt patch coverage, i.e., taking all changed lines into account, current approaches either use configuration files that maximize total statement coverage or use multiple randomly-generated configuration files, both of which incur high build times without guaranteeing patch coverage. To achieve patch coverage without exploding build times, we propose krepair, which automatically repairs configuration files that are fast-building but have poor patch coverage to achieve high patch coverage with little effect on build times. krepair works by discovering a small set of changes to a configuration file that will ensure patch coverage, preserving most of the original configuration file's settings. Our evaluation shows that, when applied to configuration files with poor patch coverage on a statistically-significant sample of recent Linux kernel patches, krepair achieves nearly complete patch coverage, 98.5% on average, while changing less than 1.53% of the original default configuration file in 99% of patches, which keeps build times 10.5x faster than maximal configuration files.
Article
Owing to the pervasiveness of software in our modern lives, software systems have evolved to be highly configurable. Combinatorial testing has emerged as a dominant paradigm for testing highly configurable systems. Often constraints are employed to define the environments where a given system is expected to work. Therefore, there has been a sustained interest in designing constraint-based test suite generation techniques. A significant goal of test suite generation techniques is to achieve t-wise coverage for higher values of t. Therefore, designing scalable techniques that can estimate t-wise coverage for a given set of tests and/or the estimation of maximum achievable t-wise coverage under a given set of constraints is of crucial importance. The existing estimation techniques face significant scalability hurdles. We designed scalable algorithms with mathematical guarantees to estimate (i) t-wise coverage for a given set of tests, and (ii) maximum t-wise coverage for a given set of constraints. In particular, ApproxCov takes in a test set U and returns an estimate of the t-wise coverage of U that is guaranteed to be within a (1 ± ε) factor of the ground truth with probability at least 1 − δ, for a given tolerance parameter ε and a confidence parameter δ. A scalable framework ApproxMaxCov for a given formula F outputs an approximation which is guaranteed to be within a (1 ± ε) factor of the maximum achievable t-wise coverage under F, with probability at least 1 − δ. Our comprehensive evaluation demonstrates that ApproxCov and ApproxMaxCov can handle benchmarks that are beyond the reach of current state-of-the-art approaches. In this paper we present proofs of correctness of ApproxCov, ApproxMaxCov, and their generalizations. We show how the algorithms can improve the scalability of a test suite generator while maintaining its effectiveness. In addition, we compare several test suite generators on different feature combination sizes t.
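The core Monte Carlo idea can be sketched in a simplified form (ours, with an additive-error Hoeffding bound rather than ApproxCov's multiplicative (1 ± ε) guarantee; `is_achievable` is a user-supplied oracle, e.g., one SAT call, deciding whether an interaction occurs in some valid configuration):

```python
import math
import random

def estimate_coverage(sample, features, t, eps, delta, is_achievable):
    """Estimate t-wise coverage of `sample` to additive error eps with
    probability at least 1 - delta, by checking uniformly random
    achievable interactions."""
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))  # Hoeffding bound
    covered = [frozenset((f, f in c) for f in features) for c in sample]
    hits = 0
    for _ in range(n):
        while True:  # rejection-sample a uniform achievable interaction
            chosen = random.sample(features, t)
            interaction = frozenset((f, random.random() < 0.5) for f in chosen)
            if is_achievable(interaction):
                break
        hits += any(interaction <= cfg for cfg in covered)
    return hits / n
```

The rejection loop is only practical when achievable interactions are not vanishingly rare; handling heavily constrained spaces efficiently is precisely what the paper's algorithms address.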
Article
Full-text available
t-wise coverage is one of the most important techniques used to test configurations of software for finding bugs. It ensures that interactions between features of a Software Product Line (SPL) are tested. The size of SPLs (of thousands of features) makes the problem of finding a good test suite computationally expensive, as the number of t-wise combinations grows exponentially. In this article, we leverage Constraint Programming's search strategies to generate test suites with a high coverage of configurations. We analyse the behaviour of the default random search strategy, and then we propose an improvement based on the commonalities (frequency) of the features. We experimentally compare our approach to uniform sampling and state-of-the-art sampling approaches. We show that our new search strategy outperforms all the other approaches and has the fastest running time.
Article
Full-text available
Feature models are commonly used to specify the valid configurations of product lines. As industrial feature models are typically complex, researchers and practitioners employ various automated analyses to study the configuration spaces. Many of these automated analyses require that numerous complex computations are executed on the same feature model, for example by querying a SAT or #SAT solver. With knowledge compilation, feature models can be compiled in a one-time effort to a target language that enables polynomial-time queries for otherwise more complex problems. In this work, we elaborate on the potential of employing knowledge compilation on feature models. First, we gather various feature-model analyses and study their computational complexity with regard to the underlying computational problem and the number of solver queries required for the respective analysis. Second, we collect knowledge-compilation target languages and map feature-model analyses to the languages that make the analysis tractable. Third, we empirically evaluate publicly available knowledge compilers to further inspect the potential benefits of knowledge-compilation target languages.
Article
Automatic generation of random test inputs is an approach that can alleviate the challenges of manual test case design. However, random test cases may be ineffective in fault detection and increase testing cost, especially in systems where test execution is resource- and time-consuming. To remedy this, the domain knowledge of test engineers can be exploited to select potentially effective test cases. To this end, test selection constraints suggested by domain experts can be utilized either for filtering randomly generated test inputs or for direct generation of inputs using constraint solvers. In this paper, we propose a domain specific language (DSL) for formalizing locality-based test selection constraints of autonomous agents and discuss the impact of test selection filters, specified in our DSL, on randomly generated test cases. We study and compare the performance of filtering and constraint solving approaches in generating selective test cases for different test scenario parameters and discuss the role of these parameters in test generation performance. Through our study, we provide criteria for suitability of the random data filtering approach versus the constraint solving one under the varying size and complexity of our testing problem. We formulate the corresponding research questions and answer them by designing and conducting experiments using QuickCheck for random test data generation with filtering and Z3 for constraint solving. Our observations and statistical analysis indicate that applying filters can significantly improve test efficiency of randomly generated test cases. Furthermore, we observe that test scenario parameters affect the performance of the filtering and constraint solving approaches differently. In particular, our results indicate that the two approaches have complementary strengths: random generation and filtering works best for large agent numbers and long paths, while its performance degrades in the larger grid sizes and more strict constraints. On the contrary, constraint solving has a robust performance for large grid sizes and strict constraints, while its performance degrades with more agents and long paths.
Conference Paper
Full-text available
Software systems are becoming increasingly configurable. A paradigmatic example is the Linux kernel, which can be adjusted for a tremendous variety of hardware devices, from mobile phones to supercomputers, thanks to the thousands of configurable features it supports. In principle, many relevant problems on configurable systems, such as completing a partial configuration to get the system instance that consumes the least energy or optimizes any other quality attribute, could be solved through exhaustive analysis of all configurations. However, configuration spaces are typically colossal and cannot be entirely computed in practice. Alternatively, configuration samples can be analyzed to approximate the answers. Generating those samples is not trivial since features usually have inter-dependencies that constrain the configuration space. Therefore, getting a single valid configuration by chance is extremely unlikely. As a result, advanced samplers are being proposed to generate random samples at a reasonable computational cost. However, to date, no sampler can deal with highly configurable complex systems, such as the Linux kernel. This paper proposes a new sampler that does scale for those systems, based on an original theoretical approach called extensible logic groups. The sampler is compared against five other approaches. Results show our tool to be the fastest and most scalable one.
Article
Full-text available
Many analyses on configurable software systems are intractable when confronted with colossal and highly-constrained configuration spaces. These analyses could instead use statistical inference, where a tractable sample accurately predicts results for the entire space. To do so, the laws of statistical inference require each member of the population to be equally likely to be included in the sample, i.e., the sampling process needs to be “uniform”. SAT-samplers have been developed to generate uniform random samples at a reasonable computational cost. However, there is a lack of experimental validation over colossal spaces to show whether the samplers indeed produce uniform samples or not. This paper (i) proposes a new sampler named BDDSampler, (ii) presents a new statistical test to verify sampler uniformity, and (iii) reports the evaluation of BDDSampler and five other state-of-the-art samplers: KUS, QuickSampler, Smarch, Spur, and Unigen2. Our experimental results show only BDDSampler satisfies both scalability and uniformity.
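A generic goodness-of-fit check in the same spirit (not the paper's test, which is designed for colossal spaces; this sketch assumes SciPy and a space small enough to enumerate) compares observed draw frequencies against the uniform distribution:

```python
from collections import Counter
from scipy.stats import chisquare  # assumes SciPy is installed

def uniformity_pvalue(draws, valid_configs):
    """Chi-square goodness-of-fit of observed draw frequencies against
    the uniform distribution over all valid configurations. A small
    p-value is evidence against uniformity of the sampler."""
    counts = Counter(frozenset(d) for d in draws)
    observed = [counts.get(frozenset(c), 0) for c in valid_configs]
    expected = [len(draws) / len(valid_configs)] * len(valid_configs)
    return chisquare(observed, expected).pvalue
```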
Article
Testing software product lines (SPLs) is difficult due to a huge number of possible products to be tested. Recently, there has been a growing interest in similarity-based testing of SPLs, where similarity is used as a surrogate metric for the t-wise coverage. In this context, one of the primary goals is to sample, by optimizing similarity metrics using search-based algorithms, a small subset of test cases (i.e., products) as dissimilar as possible, thus potentially making more t-wise combinations covered. Prior work has shown, by means of empirical studies, the great potential of current similarity-based testing approaches. However, the rationale of this testing technique deserves a more rigorous exploration. To this end, we perform correlation analyses to investigate how similarity metrics are correlated with the t-wise coverage. We find that similarity metrics generally have significantly positive correlations with the t-wise coverage. This well explains why similarity-based testing works, as the improvement on similarity metrics will potentially increase the t-wise coverage. Moreover, we explore, for the first time, the use of the novelty search (NS) algorithm for similarity-based SPL testing. The algorithm rewards “novel” individuals, i.e., those being different from individuals discovered previously, and this well matches the goal of similarity-based SPL testing. We find that the novelty score used in NS has (much) stronger positive correlations with the t-wise coverage than previous approaches relying on a genetic algorithm (GA) with a similarity-based fitness function. Experimental results on 31 software product lines validate the superiority of NS over GA, as well as other state-of-the-art approaches, concerning both t-wise coverage and fault detection capacity. Finally, we investigate whether it is useful to combine two satisfiability solvers when generating new individuals in NS, and how the performance of NS is affected by its key parameters. In summary, looking for novelty provides a promising way of sampling diverse test cases for SPLs.
Conference Paper
Full-text available
Several relevant analyses on configurable software systems remain intractable because they require examining vast and highly-constrained configuration spaces. Those analyses could be addressed through statistical inference, i.e., working with a much more tractable sample that later supports generalizing the results obtained to the entire configuration space. To make this possible, the laws of statistical inference impose an indispensable requirement: each member of the population must be equally likely to be included in the sample, i.e., the sampling process needs to be “uniform”. Various SAT-samplers have been developed for generating uniform random samples at a reasonable computational cost. Unfortunately, there is a lack of experimental validation over large configuration models to show whether the samplers indeed produce genuine uniform samples or not. This paper (i) presents a new statistical test to verify to what extent samplers accomplish uniformity and (ii) reports the evaluation of four state-of-the-art samplers: Spur, QuickSampler, Unigen2, and Smarch. According to our experimental results, only Spur satisfies both scalability and uniformity.
Conference Paper
Full-text available
Software Product Lines (SPLs) are highly configurable systems. This raises the challenge to find optimal performing configurations for an anticipated workload. As SPL configuration spaces are huge, it is infeasible to benchmark all configurations to find an optimal one. Prior work focused on building performance models to predict and optimize SPL configurations. Instead, we randomly sample and recursively search a configuration space directly to find near-optimal configurations without constructing a prediction model. Our algorithms are simpler and have higher accuracy and efficiency.
Article
Full-text available
A software product line comprises a family of software products that share a common set of features. Testing an entire product-line product-by-product is infeasible due to the potentially exponential number of products in the number of features. Accordingly, several sampling approaches have been proposed to select a presumably minimal, yet sufficient number of products to be tested. Since the time budget for testing is limited or even a priori unknown, the order in which products are tested is crucial for effective product-line testing. Prioritizing products is required to increase the probability of detecting faults faster. In this article, we propose similarity-based prioritization, which can be efficiently applied on product samples. In our approach, we incrementally select the most diverse product in terms of features to be tested next in order to increase feature interaction coverage as fast as possible during product-by-product testing. We evaluate the gain in the effectiveness of similarity-based prioritization on three product lines with real faults. Furthermore, we compare similarity-based prioritization to random orders, an interaction-based approach, and the default orders produced by existing sampling algorithms considering feature models of various sizes. The results show that our approach potentially increases effectiveness in terms of fault detection ratio concerning faults within real-world product-line implementations as well as synthetically seeded faults. Moreover, we show that the default orders of recent sampling algorithms already show promising results, which, however, can still be improved in many cases using similarity-based prioritization.
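The incremental "most diverse next" selection can be sketched as a generic farthest-first ordering (our illustration under the assumption that configurations are sets of selected features, not the paper's exact metric):

```python
def prioritize(sample):
    """Greedy similarity-based ordering: repeatedly pick the product
    most dissimilar (by Jaccard distance over features) to everything
    already scheduled for testing."""
    def jaccard_distance(a, b):
        union = a | b
        return 1 - len(a & b) / len(union) if union else 0.0

    remaining = [set(c) for c in sample]
    ordered = [remaining.pop(0)]  # seed with an arbitrary first product
    while remaining:
        # Maximize the minimum distance to all previously chosen products.
        nxt = max(remaining,
                  key=lambda c: min(jaccard_distance(c, o) for o in ordered))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered
```

Front-loading dissimilar products tends to raise interaction coverage early, which is exactly what matters when the testing budget may run out.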
Conference Paper
Quality assurance for product lines is often infeasible for each product separately. Instead, only a subset of all products (i.e., a sample) is considered during testing such that at least the coverage of certain feature interactions is guaranteed. While pair-wise interaction sampling only covers all interactions between two features, its generalization to t-wise interaction sampling ensures coverage for all interactions among t features. However, sampling large product lines poses a challenge, as today's algorithms tend to run out of memory, do not terminate, or produce samples that are too large to be tested. To initiate a community effort, we provide a set of large real-world feature models with up to 19 thousand features, which are supposed to be sampled. The performance of sampling approaches is evaluated based on the CPU time and memory consumed to retrieve a sample, the sample size for a given coverage (i.e. the value of t) and whether the sample achieves full t-wise coverage. A well-performing sampling algorithm achieves full t-wise coverage, while minimizing the other properties as best as possible.
Article
A software product line comprises a family of software products that share a common set of features. It enables customers to compose software systems from a managed set of features. Testing every product of a product line individually is often infeasible due to the exponential number of possible products in the number of features. Several approaches have been proposed to restrict the number of products to be tested by sampling a subset of products achieving sufficient combinatorial interaction coverage. However, existing sampling algorithms do not scale well to large product lines, as they require a considerable amount of time to generate the samples. Moreover, samples are not available until a sampling algorithm completely terminates. As testing time is usually limited, we propose an incremental approach of product sampling for pairwise interaction testing (called IncLing), which enables developers to generate samples on demand in a step-wise manner. Furthermore, IncLing uses heuristics to efficiently achieve pairwise interaction coverage with a reasonable number of products. We evaluated IncLing by comparing it against existing sampling algorithms using feature models of different sizes. The results of our approach indicate efficiency improvements for product-line testing.
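The on-demand, step-wise flavor of pairwise sampling can be illustrated with a greedy generator (our simplification: it enumerates valid configurations, whereas IncLing works on the feature model directly; `config_pairs` mirrors the helper sketched earlier):

```python
from itertools import combinations

def config_pairs(config, features):
    """2-wise interactions covered by one configuration."""
    lits = [(f, f in config) for f in features]
    return set(combinations(lits, 2))

def incremental_pairwise(valid_configs, features):
    """Yield products one at a time, each greedily covering as many
    still-uncovered pairs as possible, so testing can stop whenever
    the budget runs out."""
    uncovered = set().union(*(config_pairs(c, features) for c in valid_configs))
    while uncovered:
        best = max(valid_configs,
                   key=lambda c: len(config_pairs(c, features) & uncovered))
        gain = config_pairs(best, features) & uncovered
        if not gain:
            break  # defensive guard; cannot trigger for pairs built from valid_configs
        uncovered -= gain
        yield best
```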
Conference Paper
Combinatorial interaction testing is an approach for testing product lines. A set of products to test can be set up from the covering array generated from a feature model. The products occurring in a partial covering array, however, may not focus on the important feature interactions nor resemble any actual product in the market. Knowledge about which interactions are prevalent in the market can be modeled by assigning weights to sub-product lines. Such models enable a covering array generator to select important interactions to cover first for a partial covering array, enable it to construct products resembling those in the market and enable it to suggest simple changes to an existing set of products to test for incremental adaption to market changes. We report experiences from the application of weighted combinatorial interaction testing for test product selection on an industrial product line, TOMRA's Reverse Vending Machines.
Article
FeatureIDE is an open-source framework for feature-oriented software development (FOSD) based on Eclipse. FOSD is a paradigm for construction, customization, and synthesis of software systems. Code artifacts are mapped to features and a customized software system can be generated given a selection of features. The set of software systems that can be generated is called a software product line (SPL). FeatureIDE supports several FOSD implementation techniques such as feature-oriented programming, aspect-oriented programming, delta-oriented programming, and preprocessors. All phases of FOSD are supported in FeatureIDE, namely domain analysis, requirements analysis, domain implementation, and software generation.
Conference Paper
We introduce sharpSAT, a new #SAT solver that is based on the well known DPLL algorithm and techniques from SAT and #SAT solvers. Most importantly, we introduce an entirely new approach of coding components, which reduces the cache size by at least one order of magnitude, and a new cache management scheme. Furthermore, we apply a well known look ahead based on BCP in a manner that is well suited for #SAT solving. We show that these techniques are highly beneficial, especially on large structured instances, such that our solver performs significantly better than other #SAT solvers.
Conference Paper
Feature models and associated feature diagrams allow modeling and visualizing the constraints leading to the valid products of a product line. In terms of their expressiveness, feature diagrams are equivalent to propositional formulas which makes them theoretically expensive to process and analyze. For example, satisfying propositional formulas, which translates into finding a valid product for a given feature model, is an NP-hard problem, which has no fast, optimal solution. This theoretical complexity could prevent the use of powerful analysis techniques to assist in the development and testing of product lines. However, we have found that satisfying realistic feature models is quick. Thus, we show that combinatorial interaction testing of product lines is feasible in practice. Based on this, we investigate covering array generation time and results for realistic feature models and find where the algorithms can be improved.
Article
Combinatorial interaction testing (CIT) is a cost-effective sampling technique for discovering interaction faults in highly-configurable systems. Constrained CIT extends the technique to situations where some features cannot coexist in a configuration, and is therefore more applicable to real-world software. Recent work on greedy algorithms to build CIT samples now efficiently supports these feature constraints. But when testing a single system configuration is expensive, greedy techniques perform worse than meta-heuristic algorithms, because greedy algorithms generally need larger samples to exercise the same set of interactions. On the other hand, current meta-heuristic algorithms have long run times when feature constraints are present. Neither class of algorithm is suitable when both constraints and the cost of testing configurations are important factors. Therefore, we reformulate one meta-heuristic search algorithm for constructing CIT samples, simulated annealing, to more efficiently incorporate constraints. We identify a set of algorithmic changes and experiment with our modifications on 35 realistic constrained problems and on a set of unconstrained problems from the literature to isolate the factors that improve performance. Our evaluation determines that the optimizations reduce run time by a factor of 90 and accomplish the same coverage objectives with even fewer system configurations. Furthermore, the new version compares favorably with greedy algorithms on real-world problems, and, though our modifications were aimed at constrained problems, it shows similar advantages when feature constraints are absent.
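A generic simulated-annealing skeleton of the kind the paper reformulates (illustrative only; the paper's contribution lies in how feature constraints are handled inside the `neighbor` and `cost` functions, e.g., cost = number of uncovered t-wise interactions):

```python
import math
import random

def anneal(initial, neighbor, cost, t0=1.0, cooling=0.995, steps=20000):
    """Minimize `cost` by local moves from `neighbor`, occasionally
    accepting worsening moves to escape local optima."""
    current = best = initial
    current_cost = best_cost = cost(initial)
    temp = t0
    for _ in range(steps):
        cand = neighbor(current)
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worsenings with Boltzmann probability.
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling  # geometric cooling schedule
    return best
```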
Conference Paper
Feature models are used to specify members of a product-line. Despite years of progress, contemporary tools provide limited support for feature constraints and offer little or no support for debugging feature models. We integrate prior results to connect feature models, grammars, and propositional formulas. This connection allows arbitrary propositional constraints to be defined among features and enables off-the-shelf satisfiability solvers to debug feature models. We also show how our ideas can generalize recent results on the staged configuration of feature models.
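The feature-model-to-formula connection enables concrete debugging analyses with an off-the-shelf SAT solver; a minimal sketch of one such analysis (ours, assuming the python-sat package rather than any tool from the paper):

```python
from pysat.solvers import Glucose3  # assumes the python-sat package

def dead_features(cnf, n_vars):
    """A feature f is 'dead' if the model's formula conjoined with f
    is unsatisfiable, i.e., no valid product can ever select f."""
    dead = []
    with Glucose3(bootstrap_with=cnf) as solver:
        for f in range(1, n_vars + 1):
            if not solver.solve(assumptions=[f]):
                dead.append(f)
    return dead
```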
Intro to Probability - Dartmouth College
  • C M Grinstead
  • J L Snell
C.M. Grinstead and J.L. Snell. 2019. Intro to Probability - Dartmouth College. https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf
Uniform Sampling from Kconfig Feature Models
  • Jeho Oh
  • Paul Gazzillo
  • Don Batory
  • Marijn Heule
  • Maggie Myers
Jeho Oh, Paul Gazzillo, Don Batory, Marijn Heule, and Maggie Myers. 2019. Uniform Sampling from Kconfig Feature Models. Technical Report TR-19-02. University of Texas at Austin, Department of Computer Science.
Evaluating improvements to a meta-heuristic search for constrained interaction testing
  • Brady J Garvin
  • Myra B Cohen
  • Matthew B Dwyer
Handbook of Satisfiability
  • Armin Biere
  • Marijn Heule
  • Hans van Maaren