Conference Paper

A Quantitative Analysis of Variability Warnings in Linux

Abstract

In order to get insight into challenges with quality in highly-configurable software, we analyze one of the largest open source projects, the Linux kernel, and quantify basic properties of configuration-related warnings. We automatically analyze more than 20 thousand valid and distinct random configurations, in a computation that lasted more than a month. We count and classify a total of 400,000 warnings to get insight into the distribution of warning types and the location of the warnings. We run the analysis on both a stable and an unstable version of the Linux kernel. The results show that Linux contains a significant number of configuration-dependent warnings, including many that appear harmful. In fact, it appears that there are no configuration-independent warnings in the kernel at all, adding to our knowledge about the relevance of family-based analyses.
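The abstract leaves the counting machinery implicit; below is a minimal sketch of the bookkeeping it implies, assuming one compiler build log per sampled configuration in a hypothetical logs/ directory. The layout, file names, and the choice of warning key are illustrative, not the authors' actual tooling. Tallying by the trailing -W flag gives the distribution of warning types, tallying by file gives the location, and a warning counts as configuration-independent only if every sampled configuration reports it.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path

# Hypothetical layout: one GCC build log per sampled configuration,
# e.g. logs/config_00001.log, logs/config_00002.log, ...
LOG_DIR = Path("logs")

# Typical GCC warning line: "file.c:123:4: warning: message [-Wsome-flag]"
WARNING_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+)(?::\d+)?: warning: "
    r"(?P<msg>.*?)(?: \[(?P<flag>-W[\w-]+)\])?$"
)

by_type = Counter()                # warning flag -> number of occurrences
by_file = Counter()                # source file  -> number of occurrences
configs_with = defaultdict(set)    # warning key  -> configs reporting it

logs = sorted(LOG_DIR.glob("*.log"))
for log in logs:
    for line in log.read_text(errors="replace").splitlines():
        m = WARNING_RE.match(line.strip())
        if not m:
            continue
        flag = m.group("flag") or "unclassified"
        by_type[flag] += 1
        by_file[m.group("file")] += 1
        # Identify a warning by location and type, ignoring the message text.
        configs_with[(m.group("file"), m.group("line"), flag)].add(log.name)

# A warning is configuration-independent only if every sampled config reports it.
independent = [k for k, cfgs in configs_with.items() if len(cfgs) == len(logs)]
print("most common warning types:", by_type.most_common(10))
print("configuration-independent warnings:", len(independent))
```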

... In highly configurable systems, this testing task is complicated by the large number of interacting features. For example, the Linux kernel contains thousands of interacting features (such as compilation options or installed libraries) [2]. Configurations (i.e. ...
... sets of features) can be tested by instantiating them on the given product line (for example, compiling the Linux kernel with specific options and libraries). These tests can be expensive (in terms of running time [2], memory [3], or manpower [4]), so efficient test suites (a set of configurations) need to be generated. ...
... Using the example feature model in Figure 1a, suppose that the first configuration returned is configuration 5, which contains the Shooter feature. The Shooter feature has a commonality of 1/3 (because it only appears in configurations 5 and 6), but it has an observed frequency of 1, so its weight is 2/3. This weight is high, which increases the chance of returning configurations not containing Shooter. ...
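The excerpt suggests the weight of a feature is simply the gap between its frequency among the configurations returned so far and its commonality over all valid configurations (1 - 1/3 = 2/3 for Shooter); the cited article may compute it differently. A minimal sketch under that reading, with the counts taken from the example:

```python
def feature_weight(observed_frequency: float, commonality: float) -> float:
    """Assumed reading of the excerpt: the weight is the gap between how often
    the feature appeared in the configurations returned so far and how common
    it is among all valid configurations."""
    return abs(observed_frequency - commonality)

# Shooter appears in 2 of the 6 valid configurations -> commonality 1/3,
# but in every configuration returned so far -> observed frequency 1,
# giving the weight of 2/3 mentioned in the excerpt.
print(feature_weight(observed_frequency=1.0, commonality=2 / 6))  # ~0.667
```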
Article
Full-text available
t-wise coverage is one of the most important techniques used to test configurations of software for finding bugs. It ensures that interactions between features of a Software Product Line (SPL) are tested. The size of SPLs (of thousands of features) makes the problem of finding a good test suite computationally expensive, as the number of t-wise combinations grows exponentially. In this article, we leverage Constraint Programming’s search strategies to generate test suites with a high coverage of configurations. We analyse the behaviour of the default random search strategy, and then we propose an improvement based on the commonalities (frequency) of the features. We experimentally compare our approach to uniform sampling and state-of-the-art sampling approaches. We show that our new search strategy outperforms all the other approaches and has the fastest running time.
... A key challenge in the Linux kernel is that configuration options are spread out over different files of the code base, possibly across subsystems [1,2,99,16,95,112]. ...
... randconfig has the merit of generating valid configurations respecting the numerous constraints between options. It is also a mature tool that the Linux community maintains and uses [95]. Though randconfig does not ...
... Another threat to validity concerns the (lack of) uniformity of randconfig. Indeed randconfig does not provide a perfect uniform distribution over valid configurations [95]. The strategy of randconfig is to randomly enable or disable options according to the order in the Kconfig files. ...
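The excerpts only describe randconfig's strategy at a high level: assign options at random in Kconfig declaration order while honouring the constraints. The toy sketch below is not randconfig itself; it merely illustrates why ordered assignment yields samples that are valid by construction yet not uniformly distributed. The option names and the simplistic depends-on model are invented for the example.

```python
import random

# Toy model of ordered assignment: each option is declared in Kconfig order
# and may depend on an earlier option. Illustration only, not randconfig.
OPTIONS = [
    ("CONFIG_NET", None),            # (name, depends_on)
    ("CONFIG_INET", "CONFIG_NET"),
    ("CONFIG_IPV6", "CONFIG_INET"),
    ("CONFIG_DEBUG_FS", None),
]

def random_config(seed=None):
    rng = random.Random(seed)
    values = {}
    for name, dep in OPTIONS:                     # fixed declaration order
        if dep is not None and values[dep] == "n":
            values[name] = "n"                    # dependency unmet: forced off
        else:
            values[name] = rng.choice(["y", "n"])  # tristate 'm' omitted here
    return values

# Valid by construction, but options early in the file get an unconstrained
# coin flip while later options are conditioned on them, so the distribution
# over valid configurations is not uniform.
print(random_config(seed=42))
```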
Thesis
Variability is the blessing and the curse of today's software development. On one hand, it allows for fast and cheap development, while offering efficient customization to precisely meet the needs of a user. On the other hand, the increase in complexity of the systems due to the sheer amount of possible configurations makes it hard or even impossible for users to correctly utilize them, for developers to properly test them, or for experts to precisely grasp their functioning.

Machine Learning is a research domain that grew in accessibility and variety of usages over the last decades. It attracted interest from researchers in the Software Engineering domain for its ability to handle the complexity of Software Product Lines on problems they were tackling, such as performance prediction or optimization. However, all studies presenting learning-based solutions in the SPL domain failed to explore the scalability of their techniques on systems with colossal configuration spaces (>1000 options).

In this thesis, we focus on the Linux kernel. With more than 15,000 options, it is very representative of the complexity of systems with colossal configuration spaces. We first apply various learning techniques to predict the kernel binary size, and report that most of the techniques fail to produce accurate results. In particular, the performance-influence model, a learning technique tailored for SPL problems, does not even work on such a large dataset. Among the tested techniques, only Tree-based algorithms and Neural Networks are able to produce an accurate model in an acceptable time.

To mitigate the problems created by colossal configuration spaces for learning techniques, we propose a feature selection technique leveraging Random Forest, enhanced toward better stability. We show that by using the feature selection, the training time can be greatly reduced and the accuracy can be improved. This Tree-based feature selection technique is also completely automated and does not rely on prior knowledge of the system.

Performance specialization is a technique that constrains the configuration space of a software system to meet a given performance criterion. It is possible to automate the specialization process by leveraging Decision Trees. While only the Decision Tree Classifier has been used for this task, we explore the usage of the Decision Tree Regressor, as well as a novel hybrid approach. We test and compare the different approaches on a wide range of systems, as well as on Linux to ensure scalability to colossal configuration spaces. In most cases, including Linux, we report at least 90% accuracy, and each approach has its own particular strengths compared to the others. Finally, we also leverage the Tree-based feature selection, whose most notable effect is the reduction of the training time of Decision Trees on Linux, dropping from one minute to a second or less.

The last contribution explores the sustainability of a performance model across versions of a configurable system. We reused the model trained on the 4.13 version of Linux from our first contribution, and measured its accuracy on six later versions up to 5.8, spanning three years. We show that a model is quickly outdated and unusable as is. To preserve the accuracy of the model over versions, we use transfer learning with the help of Tree-based algorithms to maintain it at a reduced cost. We tackle the problem of the heterogeneity of the configuration space, which evolves with each version. We show that the transfer approach allows for an acceptable accuracy at low cost, and vastly outperforms a learning-from-scratch approach using the same budget.

Overall, this thesis focuses on the problems of systems with colossal configuration spaces such as Linux, and shows that Tree-based algorithms are a valid solution, versatile enough to answer a wide range of problems, and accurate enough to be considered.
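The thesis describes its tree-based feature selection only in prose: rank options with a Random Forest, keep the most important ones, then train the final (cheaper) model on the reduced option set. Below is a minimal sketch of that general recipe with scikit-learn on synthetic data; the dataset shape, the 50-option cut-off, and the hyperparameters are illustrative, not the thesis's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a configuration dataset: rows are configurations,
# columns are boolean options, y is a measured size/performance value.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 1000))
y = 30 * X[:, 0] + 12 * X[:, 1] * X[:, 2] + rng.normal(0, 1, 5000)

# Step 1: rank options by Random Forest importance.
forest = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:50]  # keep 50 options

# Step 2: train the final, much cheaper model on the selected options only.
tree = DecisionTreeRegressor(random_state=0).fit(X[:, top], y)
print("top selected options:", sorted(top[:5]))
```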
... randconfig has the merit of generating valid configurations that respect the numerous constraints between options. It is also a mature tool that the Linux community maintains and uses [40]. Though randconfig does not produce uniform, random samples (see Section 7), there is a diversity within the values of options (being 'y', 'n', or 'm'). ...
... Another threat to internal validity concerns the (lack of) uniformity of randconfig. Indeed randconfig does not provide a perfect uniform distribution over valid configurations [40]. The strategy of randconfig is to randomly enable or disable options according to the order in the Kconfig files. ...
... Several empirical studies [40], [66], [67], [68], [69], [70], [71], [72], [73], [74], [75] have considered different aspects of Linux (build system, variability implementation, constraints, bugs, compilation warnings). However, most of the works did not concretely build configurations in the large, a necessary and costly step for training a prediction model. ...
Article
With large scale and complex configurable systems, it is hard for users to choose the right combination of options (i.e., configurations) in order to obtain the wanted trade-off between functionality and performance goals such as speed or size. Machine learning can help in relating these goals to the configurable system options, and thus, predict the effect of options on the outcome, typically after a costly training step. However, many configurable systems evolve at such a rapid pace that it is impractical to retrain a new model from scratch for each new version. In this paper, we propose a new method to enable transfer learning of binary size predictions among versions of the same configurable system. Taking the extreme case of the Linux kernel with its 14,500 configuration options, we first investigate how binary size predictions of kernel size degrade over successive versions. We show that the direct reuse of an accurate prediction model from 2017 quickly becomes inaccurate when Linux evolves, up to a 32% mean error by August 2020. We thus propose a new approach for transfer evolution-aware model shifting (TEAMS). It leverages the structure of a configurable system to transfer an initial predictive model towards its future versions with a minimal amount of extra processing for each version. We show that TEAMS vastly outperforms state-of-the-art approaches over the 3-year history of Linux kernels, from 4.13 to 5.8.
... Although there are promising approaches for generating covering arrays, most of them do not scale well to large feature models [Medeiros et al., 2016;Liebig et al., 2013] and their execution takes a considerable amount of time. As a result, the Linux kernel developers use the built-in facility of the Linux kernel build system randconfig to generate random configurations, because none of the existing sampling algorithms scale to the feature model of the Linux kernel with over 15 thousand features [Melo et al., 2016]. Furthermore, testers cannot start testing until the entire sampling process has terminated, because no intermediate results are reported. ...
... As those samples are usually not available until a sampling algorithm is completely terminated, the sampling process may take a considerable part of the limited testing time. As a result, the Linux kernel developers use the built-in facility of the Linux kernel build system randconfig to generate random configurations, because none of the existing sampling algorithms scales to a more recent version of the feature model of the Linux kernel with over 15 thousand features [Melo et al., 2016]. An alternative to combinatorial interaction testing, several search-based approaches have been proposed to generate a set of products [Henard et al., 2014b;Henard et al., 2013a]. ...
... Due to the scalability problem of the existing sampling algorithms, it is common in practice to generate random products to be tested, especially for large product lines (e.g., randconfig in Linux) [Melo et al., 2016]. For this purpose, we also implemented a random generator to create a fixed number of random configurations based on the satisfiability solver Sat4J [Le Berre and Parrain, 2010]. ...
Thesis
A software product line comprises a family of software products that share a common set of features. Testing an entire product line product by product is infeasible, because the number of possible products can be exponential in the number of features. Combinatorial interaction testing is a sampling strategy that selects a presumably minimal, yet sufficient number of products to be tested. Several sampling approaches have been proposed; however, they do not scale well to large product lines, as they require a considerable amount of time to compute the samples. In addition, the number of generated products can still be large, especially if the product line has a large number of features. Since the time budget for testing is limited or even a priori unknown, the order in which products are tested is crucial for effective product-line testing to increase the probability of detecting faults faster. Hence, we propose 1) product prioritization to increase the probability of detecting faults faster and 2) incremental sampling to generate samples in a step-wise manner. Regarding product prioritization, we propose similarity-driven product prioritization that considers problem-space information (i.e., feature selection) and solution-space information (i.e., delta modeling) to select the most diverse product to be tested next. With respect to sampling, we propose an incremental algorithm for product sampling called IncLing, which enables developers to generate samples on demand in a step-wise manner. The results of similarity-driven product prioritization show a potential improvement in the effectiveness of product-line testing (i.e., an increased early rate of fault detection). Moreover, we show that applying the IncLing algorithm to sample products enhances the efficiency of product-line testing compared to existing sampling algorithms. Thus, we conclude that the order in which products are generated and tested may enhance product-line testing effectiveness.
... Ensuring quality for all configurations is a difficult task. For example, Melo et al. compiled 42,000+ random Linux kernels and found that only 226 did not yield any compilation warning (Melo et al. 2016). Though formal methods and program analysis can identify some classes of defects (Thüm et al. 2014; Classen et al. 2013) - leading to variability-aware testing approaches (e.g., Nguyen et al. 2014; Kim et al. 2011, 2013) - a common practice is still to execute and test a sample of (representative) configurations. ...
... In short, we report on the first ever endeavour to test all possible configurations of the industry-strength open-source configurable software system: JHipster. While there have been efforts in this direction for Linux kernels, their variability space forces a focus on subsets (the selection of 42,000+ kernels corresponds to one month of computation, Melo et al. 2016) or on investigating bugs qualitatively (Abal et al. 2014, 2018). Specifically, the main contributions and findings of this article are: ...
... The most difficult part of realising the infrastructure was to validate it, especially in a distributed setting. These costs are system-dependent: for example, the Linux project provides tools to compile distinct random kernels, which can be used for various analyses (e.g., Melo et al. 2016;Henard et al. 2013d), and ease the realisation of a testing infrastructure. ...
Article
Full-text available
Many approaches for testing configurable software systems start from the same assumption: it is impossible to test all configurations. This motivated the definition of variability-aware abstractions and sampling techniques to cope with large configuration spaces. Yet, there is no theoretical barrier that prevents the exhaustive testing of all configurations by simply enumerating them if the effort required to do so remains acceptable. Not only this: we believe there is a lot to be learned by systematically and exhaustively testing a configurable system. In this case study, we report on the first ever endeavour to test all possible configurations of the industry-strength, open source configurable software system JHipster, a popular code generator for web applications. We built a testing scaffold for the 26,000+ configurations of JHipster using a cluster of 80 machines during 4 nights for a total of 4,376 hours (182 days) CPU time. We find that 35.70% of configurations fail and we identify the feature interactions that cause the errors. We show that sampling strategies (like dissimilarity and 2-wise): (1) are more effective at finding faults than the 12 default configurations used in the JHipster continuous integration; (2) can be too costly and exceed the available testing budget. We cross this quantitative analysis with the qualitative assessment of JHipster’s lead developers.
... Accordingly, many relevant analyses on LK's configurability, which would be trivial if the configuration space could be enumerated, rely on working with configuration samples [29,30,37]. For example, the random sampling of LK has been used for (i) debugging compilation and building errors [27,28,36,52], (ii) accelerating configuration building times [39], (iii) predicting the performance of configurations [1,25,42], (iv) estimating the influence each option has on configurations' performance [43], (v) finding the configuration that optimizes certain performance metrics [33], among others. ...
... In other words, most of the analyses were possible thanks to randconfig. Section 2.1 reviews more randconfig usages described in the literature [1,17,24,25,27,52]. ...
... From a general point of view, the Linux OS model is a set of different operational artifacts (functions) and the interfaces between them [17][18][19]: Mos = (Ck, Mf, Ms, Mo, Mi) (5), where Ck is the set of individual fragments (artifacts) of the OS; Ms is the set of characteristics of the Linux kernel (more than 10,000 for version 4.11); Mo is the set of constraints on the characteristics of functions along the depth of the dependency tree (depth 8 for the 22 constraints [20]); ...
... To generate a particular OS variant from them, the necessary ready-made artifacts or functional elements, together with their interfaces, have to be extracted from the system kernel. The LEADT tool is used for mining variability [19]. It provides facilities for searching for and analyzing ready-made reusable components (ГОР) in the legacy OS code. ...
... Random sampling: This is probably the simplest way to sample configurations from SPLs. It creates a sample set by randomly assigning true or false to each feature for each configuration (Guo et al. 2013; Medeiros et al. 2016; Liebig et al. 2013; Melo et al. 2016). Because of the complicated constraints among features, this sampling strategy has a high chance of generating invalid configurations. ...
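A minimal sketch of the naive strategy described in this excerpt: flip an independent coin per feature, then keep only the assignments that satisfy the constraints. The feature names and the two cross-tree constraints are invented; with realistic constraint density most assignments are rejected, which is exactly the drawback the excerpt points out.

```python
import random

FEATURES = ["A", "B", "C", "D"]
# Invented cross-tree constraints: "B requires A", "C excludes D".
CONSTRAINTS = [
    lambda cfg: cfg["A"] or not cfg["B"],     # B -> A
    lambda cfg: not (cfg["C"] and cfg["D"]),  # C and D are mutually exclusive
]

def random_valid_sample(n, seed=0):
    """Rejection sampling: random true/false per feature, keep valid configs."""
    rng = random.Random(seed)
    sample, tried = [], 0
    while len(sample) < n:
        tried += 1
        cfg = {f: rng.random() < 0.5 for f in FEATURES}
        if all(check(cfg) for check in CONSTRAINTS):
            sample.append(cfg)
    return sample, tried

sample, tried = random_valid_sample(5)
print(f"kept {len(sample)} of {tried} random assignments")
```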
Article
Full-text available
Sampling a small, valid and representative set of configurations from software product lines (SPLs) is important, yet challenging due to a huge number of possible configurations to be explored. Recently, the sampling strategy based on satisfiability (SAT) solving has enjoyed great popularity due to its high efficiency and good scalability. However, this sampling offers no guarantees on diversity, especially in terms of the number of selected features, an important property to characterize a configuration. In this paper, we propose a probability-aware diversification (PaD) strategy to cooperate with SAT solving in generating diverse configurations, with the effect that valid configurations are efficiently generated by SAT solving while also maintaining diversity brought by PaD. Experimental results on 51 public SPLs show that, when working cooperatively with PaD, the performance (regarding diversity) of off-the-shelf SAT solvers has substantial improvements, with large effect sizes observed on more than 71% of all the cases. Furthermore, we propose a general search-based framework where PaD and evolutionary algorithms can work together, and instantiate this framework in the context of search-based diverse sampling and search-based multi-objective SPL configuration (where there is a practical need of generating diverse configurations). It is demonstrated by the experimental results that PaD also brings abundant performance gains to these search-based approaches. Finally, we apply PaD to a practical problem, i.e., machine learning based performance predictions of SPLs, and show that using PaD tends to improve the accuracy of performance prediction models.
... It makes reasoning about programs more difficult [5]. As a consequence configuration-dependent (variability) bugs appear [6], [7]. Previous studies [5], [8] have shown that debugging is hard and time consuming in the presence of variability. ...
... Although there are promising approaches for generating covering arrays, most of them do not scale well to large feature models [25,27] and their execution takes a considerable amount of time. As a result, the Linux kernel developers use the built-in facility of the Linux kernel build system randconfig to generate random configurations, because none of the existing sampling algorithms scales to the feature model of the Linux kernel with over 15 thousand features [28]. Furthermore, testers cannot start testing until the entire sampling process has terminated, because no intermediate results are reported. ...
Conference Paper
A software product line comprises a family of software products that share a common set of features. It enables customers to compose software systems from a managed set of features. Testing every product of a product line individually is often infeasible due to the exponential number of possible products in the number of features. Several approaches have been proposed to restrict the number of products to be tested by sampling a subset of products achieving sufficient combinatorial interaction coverage. However, existing sampling algorithms do not scale well to large product lines, as they require a considerable amount of time to generate the samples. Moreover, samples are not available until a sampling algorithm completely terminates. As testing time is usually limited, we propose an incremental approach of product sampling for pairwise interaction testing (called IncLing), which enables developers to generate samples on demand in a step-wise manner. Furthermore, IncLing uses heuristics to efficiently achieve pairwise interaction coverage with a reasonable number of products. We evaluated IncLing by comparing it against existing sampling algorithms using feature models of different sizes. The results of our approach indicate efficiency improvements for product-line testing.
Article
Full-text available
In system software environments, a vast amount of information circulates, making it crucial to utilize this information in order to enhance the operation of such systems. One such system is the Linux kernel, which not only boasts a completely open-source nature, but also provides a comprehensive history through its git repository. Here, every logical code change is accompanied by a message written by the developer in natural language. Within this expansive repository, our focus lies on error correction messages from fixing commits, as analyzing their text can help identify the most common types of errors. Building upon our previous work, this paper proposes the utilization of data analysis methods for this purpose. To achieve our objective, we explore various techniques for processing repository messages and employ automated methods to pinpoint the prevalent bugs within them. By calculating distances between vectorizations of bug-fixing messages and grouping them into clusters, we can effectively categorize and isolate the most frequently occurring errors. Our approach is applied to multiple prominent parts of the Linux kernel, allowing for comprehensive results and insights into the bugs in different subsystems. As a result, we present a summary of bug fixes in Linux kernel subsystems such as kernel, sched, mm, net, irq, x86 and arm64.
Conference Paper
Full-text available
Many critical software systems developed in C utilize compile-time configurability. The many possible configurations of this software make bug detection through static analysis difficult. While variability-aware static analyses have been developed, there remains a gap between those and state-of-the-art static bug detection tools. In order to collect data on how such tools may perform and to develop real-world benchmarks, we present a way to leverage configuration sampling, off-the-shelf “variability-oblivious” bug detectors, and automatic feature identification techniques to simulate a variability-aware analysis. We instantiate our approach using four popular static analysis tools on three highly configurable, real-world C projects, obtaining 36,061 warnings, 80% of which are variability warnings. We analyze the warnings we collect from these experiments, finding that most results are variability warnings of a variety of kinds such as NULL dereference. We then manually investigate these warnings to produce a benchmark of 77 confirmed true bugs (52 of which are variability bugs) useful for future development of variability-aware analyses.
Article
Testing a software product line such as Linux implies building the source with different configurations. Manual approaches to generate configurations that enable code of interest are doomed to fail due to the high amount of variation points distributed over the feature model, the build system and the source code. Research has proposed various approaches to generate covering configurations, but the algorithms show many drawbacks related to run-time, exhaustiveness and the amount of generated configurations. Hence, analyzing an entire Linux source can yield more than 30 thousand configurations and thereby exceeds the limited budget and resources for build testing. In this paper, we present an approach to fill the gap between a systematic generation of configurations and the necessity to fully build software in order to test it. By merging previously generated configurations, we reduce the number of necessary builds and enable global variability-aware testing. We reduce the problem of merging configurations to finding maximum cliques in a graph. We evaluate the approach on the Linux kernel, compare the results to common practices in industry, and show that our implementation scales even when facing graphs with millions of edges.
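The abstract states the reduction to maximum cliques but not the construction; below is a small sketch of how it could look, assuming partial configurations given as option-value maps and using networkx for the clique enumeration. The configurations and option names are invented, and a real implementation would additionally check the Kconfig constraints on each merged result.

```python
from itertools import combinations
import networkx as nx

# Invented partial configurations: option -> value. Two configurations
# conflict if they assign different values to the same option.
configs = {
    "c1": {"CONFIG_NET": "y", "CONFIG_IPV6": "y"},
    "c2": {"CONFIG_NET": "y", "CONFIG_DEBUG_FS": "n"},
    "c3": {"CONFIG_NET": "n"},
}

def compatible(a, b):
    return all(a[k] == b[k] for k in a.keys() & b.keys())

# Nodes are configurations; edges connect mergeable (non-conflicting) ones.
G = nx.Graph()
G.add_nodes_from(configs)
G.add_edges_from((u, v) for u, v in combinations(configs, 2)
                 if compatible(configs[u], configs[v]))

# Every maximal clique of mutually compatible configurations can be merged
# into a single build, reducing the number of kernels to compile.
for clique in nx.find_cliques(G):
    merged = {}
    for name in clique:
        merged.update(configs[name])
    print(sorted(clique), "->", merged)
```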
Article
A software product line comprises a family of software products that share a common set of features. It enables customers to compose software systems from a managed set of features. Testing every product of a product line individually is often infeasible due to the exponential number of possible products in the number of features. Several approaches have been proposed to restrict the number of products to be tested by sampling a subset of products achieving sufficient combinatorial interaction coverage. However, existing sampling algorithms do not scale well to large product lines, as they require a considerable amount of time to generate the samples. Moreover, samples are not available until a sampling algorithm completely terminates. As testing time is usually limited, we propose an incremental approach of product sampling for pairwise interaction testing (called IncLing), which enables developers to generate samples on demand in a step-wise manner. Furthermore, IncLing uses heuristics to efficiently achieve pairwise interaction coverage with a reasonable number of products. We evaluated IncLing by comparing it against existing sampling algorithms using feature models of different sizes. The results of our approach indicate efficiency improvements for product-line testing.
Article
Full-text available
Complex software systems always exist for a long time, sometimes changing, and this leads to a variety of versions of such a system. In addition, complex software systems usually have different (sometimes very many) configurations due to the different hardware and software environments in which they are intended to operate, or due to different user types with specific requirements. So, a complex software system can more correctly be regarded as a software system family or a software product line. Taking software families into consideration helps to increase the reuse of their components and other software development artifacts. In contrast to earlier work on software reuse, which mostly focused on code or design reuse, software system family development tries to extend reuse to all kinds of development artifacts and activities, including documentation, verification, operation support, deployment, etc. One of the software system family development activities is the modeling of family variability. This paper considers modern methods and approaches to such modeling, especially focusing on the modeling of variability in operating system families. The research whose results are presented in this paper is supported by RFBR.
Conference Paper
Testing a software product line such as Linux implies building the source with different configurations. Manual approaches to generate configurations that enable code of interest are doomed to fail due to the high amount of variation points distributed over the feature model, the build system and the source code. Research has proposed various approaches to generate covering configurations, but the algorithms show many drawbacks related to run-time, exhaustiveness and the amount of generated configurations. Hence, analyzing an entire Linux source can yield more than 30 thousand configurations and thereby exceeds the limited budget and resources for build testing. In this paper, we present an approach to fill the gap between a systematic generation of configurations and the necessity to fully build software in order to test it. By merging previously generated configurations, we reduce the number of necessary builds and enable global variability-aware testing. We reduce the problem of merging configurations to finding maximum cliques in a graph. We evaluate the approach on the Linux kernel, compare the results to common practices in industry, and show that our implementation scales even when facing graphs with millions of edges.
Article
Full-text available
This paper addresses problems of analysis and verification of complex modern operating systems, which should take into account the variability and configurability of those systems. The main problems of current interest are related to conditional compilation as the variability mechanism widely used in the system software domain. It makes fruitful analysis of the separate pieces of code combined into system variants impossible, because most of these pieces of code have no interface and no behavior of their own. On the other hand, analysis of all separate variants is also impossible due to their enormous number. The paper provides an overview of analysis methods that are able to cope with the stated problems, distinguishing two classes of such approaches: analysis of a sample of variants based on some variant coverage criteria, and variation-aware analysis that processes many variants simultaneously and uses similarities between them to minimize the resources required. For future development we choose the most scalable techniques: sampling analysis based on code coverage and on coverage of feature combinations, and variation-aware analysis using the counterexample-guided abstraction refinement approach.
Article
Context: Maintaining software families is not a trivial task. Developers commonly introduce bugs when they do not consider existing dependencies among features. When such implementations share program elements, such as variables and functions, inadvertently using these elements may result in bugs. In this context, previous work focuses only on the occurrence of intraprocedural dependencies, that is, when features share program elements within a function. But at the same time, we still lack studies investigating dependencies that transcend the boundaries of a function, since these cases might cause bugs as well. Objective: This work assesses to what extent feature dependencies exist in actual software families, answering research questions regarding the occurrence of intraprocedural, global, and interprocedural dependencies and their characteristics. Method: We perform an empirical study covering 40 software families of different domains and sizes. We use a variability-aware parser to analyze the families' source code while retaining all variability information. Results: Intraprocedural and interprocedural feature dependencies are common in the families we analyze: more than half of the functions with preprocessor directives have intraprocedural dependencies, while over a quarter of all functions have interprocedural dependencies. The median depth of interprocedural dependencies is 9. Conclusion: Given that these dependencies are rather common, there is a need for tools and techniques to raise developers' awareness in order to minimize or avoid problems when maintaining code in the presence of such dependencies. Problems regarding interprocedural dependencies with high depths might be harder to detect and fix.
Article
Full-text available
This introduction describes exploratory studies about the relevance of Karl Weick's HRO propositions in Latin American contexts. Eleven case studies constitute the core of the special issue. The research questions include the following issues. How do Weick's concepts apply to organizations that operate in this region? What are the lessons learned from these eleven mindful organizations? How can these insights inform the management practices of other business leaders in Latin America? The organizations described in these eleven cases, which have been chosen from a sample of 396 cases produced between 2001 and 2013 at INCAE, the leading producer of cases among the top graduate schools of management in Latin America, exhibit between two and five characteristics of highly mindful organizations: (a) preoccupation with failure, (b) reluctance to simplify, (c) sensitivity to operations, (d) commitment to resilience, and (e) deference to expertise. Despite not being quoted among scholars in Latin America, Weick's concepts of mindfulness and high reliability organizations (HROs) are relevant in a region bound by uncertainty.
Article
Full-text available
Feature-sensitive verification pursues effective analysis of the exponentially many variants of a program family. However, researchers lack examples of concrete bugs induced by variability, occurring in real large-scale systems. Such a collection of bugs is a requirement for goal-oriented research, serving to evaluate tool implementations of feature-sensitive analyses by testing them on real bugs. We present a qualitative study of 42 variability bugs collected from bug-fixing commits to the Linux kernel repository. We analyze each of the bugs, and record the results in a database. In addition, we provide self-contained simplified C99 versions of the bugs, facilitating understanding and tool evaluation. Our study provides insights into the nature and occurrence of variability bugs in a large C software system, and shows in what ways variability affects and increases the complexity of software bugs.
Conference Paper
Full-text available
The variability of configurable systems may lead to configuration-related issues (i.e., faults and warnings) that appear only when we select certain configuration options. Previous studies found that issues related to configurability are harder to detect than issues that appear in all configurations, because variability increases the complexity. However, little effort has been put into understanding configuration-related faults (e.g., undeclared functions and variables) and warnings (e.g., unused functions and variables). To better understand the peculiarities of configuration-related undeclared/unused variables and functions, in this paper we perform an empirical study of 15 systems to answer research questions related to how developers introduce these issues, the number of configuration options involved, and the time that these issues remain in source files. To make the analysis of several projects feasible, we propose a strategy that minimizes the initial setup problems of variability-aware tools. We detect and confirm 2 undeclared variables, 14 undeclared functions, 16 unused variables, and 7 unused functions related to configurability. We submit 30 patches to fix issues not fixed by developers. Our findings support the effectiveness of sampling (i.e., analysis of only a subset of valid configurations) because most issues involve two or less configuration options. Nevertheless, by analyzing the version history of the projects, we observe that a number of issues remain in the code for several years. Furthermore, the corpus of undeclared/unused variables and functions gathered is a valuable source to study these issues, compare sampling algorithms, and test and improve variability-aware tools.
Article
Full-text available
Software-product-line engineering has gained considerable momentum in recent years, both in industry and in academia. A software product line is a family of software products that share a common set of features. Software product lines challenge traditional analysis techniques, such as type checking, model checking, and theorem proving, in their quest of ensuring correctness and reliability of software. Simply creating and analyzing all products of a product line is usually not feasible, due to the potentially exponential number of valid feature combinations. Recently, researchers began to develop analysis techniques that take the distinguishing properties of software product lines into account, for example, by checking feature-related code in isolation or by exploiting variability information during analysis. The emerging field of product-line analyses is both broad and diverse, so it is difficult for researchers and practitioners to understand their similarities and differences. We propose a classification of product-line analyses to enable systematic research and application. Based on our insights with classifying and comparing a corpus of 123 research articles, we develop a research agenda to guide future research on product-line analyses.
Article
Full-text available
Variability models represent the common and variable features of products in a product line. Since the introduction of FODA in 1990, several variability modeling languages have been proposed in academia and industry, followed by hundreds of research papers on variability models and modeling. However, little is known about the practical use of such languages. We study the constructs, semantics, usage, and associated tools of two variability modeling languages, Kconfig and CDL, which are independently developed outside academia and used in large and significant software projects. We analyze 128 variability models found in 12 open-source projects using these languages. Our study 1) supports variability modeling research with empirical data on the real-world use of its flagship concepts. However, we 2) also provide requirements for concepts and mechanisms that are not commonly considered in academic techniques, and 3) challenge assumptions about size and complexity of variability models made in academic papers. These results are of interest to researchers working on variability modeling and analysis techniques and to designers of tools, such as feature dependency checkers and interactive product configurators.
Article
Full-text available
Preprocessors are often used to implement the variability of a Software Product Line (SPL). Despite their widespread use, they have several drawbacks, such as code pollution, no separation of concerns, and error-proneness. Virtual Separation of Concerns (VSoC) has been used to address some of these preprocessor problems by allowing developers to hide feature code not relevant to the current maintenance task. However, different features eventually share the same variables and methods, so VSoC does not modularize features, since developers do not know anything about hidden features. Thus, the maintenance of one feature might break another. Emergent Interfaces (EI) capture dependencies between a feature maintenance point and parts of other feature implementations, but they do not provide an overall feature interface considering all parts in an integrated way. Thus, we still have the feature modularization problem. To address that, we propose Emergent Feature Interfaces (EFI), which complement EI by treating a feature as a module in order to improve modular reasoning on preprocessor-based systems. EFI capture dependencies among entire features, with the potential of improving productivity. Our proposal, implemented in an open-source tool called Emergo, is evaluated with preprocessor-based systems. The results of our study suggest the feasibility and usefulness of the proposed approach.
Conference Paper
Full-text available
Over more than two decades, numerous variability modeling techniques have been introduced in academia and industry. However, little is known about the actual use of these techniques. While dozens of experience reports on software product line engineering exist, only very few focus on variability modeling. This lack of empirical data threatens the validity of existing techniques, and hinders their improvement. As part of our effort to improve empirical understanding of variability modeling, we present the results of a survey questionnaire distributed to industrial practitioners. These results provide insights into application scenarios and perceived benefits of variability modeling, the notations and tools used, the scale of industrial models, and experienced challenges and mitigation strategies.
Chapter
Full-text available
Software engineering today is heavily focused on the ideas of process maturity and continuous improvement. Processes are designed to deliver products. Process engineering should ideally rest on theoretical foundations of sound product engineering; however the field is currently lacking such foundations. Drawing inspiration from compiler design, we present a systematic framework for software product engineering that develops the product through successive levels of realization. The framework separates the concerns in software development by relating each level to a knowledge domain and localizing exactly on those qualities that become manifest in that knowledge domain. The basis of the framework is a mathematical model for reasoning about the correctness of realization schemes as well as the transformations between levels, so that each level preserves previously created qualities while adding new desired qualities. We also discuss some of the practical aspects of implementing this approach.
Conference Paper
Full-text available
Over 30 years ago, the preprocessor cpp was developed to extend the programming language C by lightweight metaprogramming capabilities. Despite its error-proneness and low abstraction level, the cpp is still widely being used in present-day software projects to implement variable software. However, not much is known about how the cpp is employed to implement variability. To address this issue, we have analyzed forty open-source software projects written in C. Specifically, we answer the following questions: How does program size influence variability? How complex are extensions made via cpp's variability mechanisms? At which level of granularity are extensions applied? What is the general type of extensions? These questions revive earlier discussions on understanding and refactoring of the preprocessor. To answer them, we introduce several metrics measuring the variability, complexity, granularity, and type of extensions. Based on the data obtained, we suggest alternative implementation techniques. The data we have collected can influence other research areas, such as language design and tool support.
Article
The variability of configurable systems may lead to configuration-related issues (i.e., faults and warnings) that appear only when we select certain configuration options. Previous studies found that issues related to configurability are harder to detect than issues that appear in all configurations, because variability increases the complexity. However, little effort has been put into understanding configuration-related faults (e.g., undeclared functions and variables) and warnings (e.g., unused functions and variables). To better understand the peculiarities of configuration-related undeclared/unused variables and functions, in this paper we perform an empirical study of 15 systems to answer research questions related to how developers introduce these issues, the number of configuration options involved, and the time that these issues remain in source files. To make the analysis of several projects feasible, we propose a strategy that minimizes the initial setup problems of variability-aware tools. We detect and confirm 2 undeclared variables, 14 undeclared functions, 16 unused variables, and 7 unused functions related to configurability. We submit 30 patches to fix issues not fixed by developers. Our findings support the effectiveness of sampling (i.e., analysis of only a subset of valid configurations) because most issues involve two or less configuration options. Nevertheless, by analyzing the version history of the projects, we observe that a number of issues remain in the code for several years. Furthermore, the corpus of undeclared/unused variables and functions gathered is a valuable source to study these issues, compare sampling algorithms, and test and improve variability-aware tools.
Article
We believe that a C programmer's impulse to use #ifdef in an attempt at portability is usually a mistake. Portability is generally the result of advance planning rather than trench warfare involving #ifdef. In the course of developing C News on different systems, we evolved various tactics for dealing with differences among systems without producing a welter of #ifdefs at points of difference. We discuss the alternatives to, and occasional proper use of, #ifdef.
Conference Paper
Large software product lines need to manage complex variability. A common approach is variability modeling—creating and maintaining models that abstract over the variabilities inherent in such systems. While many variability modeling techniques and notations have been proposed, little is known about industrial practices and how industry values or criticizes this class of modeling. We attempt to address this gap with an exploratory case study of three companies that apply variability modeling. Among others, our study shows that variability models are valued for their capability to organize knowledge and to achieve an overview understanding of codebases. We observe centralized model governance, pragmatic versioning, and surprisingly little constraint modeling, indicating that the effort of declaring and maintaining constraints does not always pay off.
Conference Paper
While there has been active development of the Linux kernel, little has been done to address kernel bugs with gradually increasing lifetimes. From our statistical analysis, the average lifetime of kernel bugs in each kernel development cycle increased 2.87 times between 2008 and 2012. This indicates the instability of Linux kernels. To reduce bug lifetime, we present a Kernel Instant bug testing Service (KIS). KIS includes an infrastructure to collect kernel code commits, analyze the kernel with existing analysis tools, and synthesize bug reports. KIS uses object caching, version merging, bisecting recall, and log filter optimizations to accelerate compilation and analysis. In the Linux kernel 3.7 development cycle, KIS used a small server farm to detect newly submitted kernel code commits from 61% of active kernel git trees, and emailed hundreds of precise bug reports directly to the responsible kernel developers within 1 hour on average, all without changing the kernel developers' normal workflow.
Conference Paper
The C preprocessor is commonly used to implement variability in program families. Despite the widespread usage, some studies indicate that the C preprocessor makes variability implementation difficult and error-prone. However, we still lack studies to investigate preprocessor-based syntax errors and quantify to what extent they occur in practice. In this paper, we define a technique based on a variability-aware parser to find syntax errors in releases and commits of program families. To investigate these errors, we perform an empirical study where we use our technique in 41 program family releases, and more than 51 thousand commits of 8 program families. We find 7 and 20 syntax errors in releases and commits of program families, respectively. They are related not only to incomplete annotations, but also to complete ones. We submit 8 patches to fix errors that developers have not fixed yet, and they accept 75% of them. Our results reveal that the time developers need to fix the errors varies from days to years in family repositories. We detect errors even in releases of well-known and widely used program families, such as Bash, CVS and Vim. We also classify the syntax errors into 6 different categories. This classification may guide developers to avoid them during development.
Conference Paper
The Linux kernel is one of the largest configurable open source software systems implementing static variability. In Linux, variability is scattered over three different artifacts: source code files, Kconfig files, and Makefiles. Previous work detected inconsistencies between these artifacts that led to anomalies in the intended variability of Linux. We call these variability anomalies. However, there has been no work done to analyze how these variability anomalies are introduced in the first place, and how they get fixed. In this work, we provide an analysis of the causes and fixes of variability anomalies in Linux. We first perform an exploratory case study that uses an existing set of patches which solve variability anomalies to identify patterns for their causes. The observations we make from this dataset allow us to develop four research questions which we then answer in a confirmatory case study on the scope of the whole Linux kernel. We show that variability anomalies exist for several releases in the kernel before they get fixed, and that contrary to our initial suspicion, typos in feature names do not commonly cause these anomalies. Our results show that variability anomalies are often introduced through incomplete patches that change Kconfig definitions without properly propagating these changes to the rest of the system. Anomalies are then commonly fixed through changes to the code rather than to Kconfig files.
Conference Paper
Maintaining variation in software is a difficult problem that poses serious challenges for the understanding and editing of software artifacts. Although the C preprocessor (CPP) is often the default tool used to introduce variability to software, because of its simplicity and flexibility, it is infamous for its obtrusive syntax and has been blamed for reducing the comprehensibility and maintainability of software. In this paper, we address this problem by developing a prototype for managing software variation at the source code level. We evaluate the difference between our prototype and CPP with a user study, which indicates that the prototype helps users reason about variational code faster and more accurately than CPP. Our results also support the research of others, providing evidence for the effectiveness of related tools, such as CIDE and FeatureCommander.
Book
This textbook addresses students, professionals, lecturers and researchers interested in software product line engineering. With more than 100 examples and about 150 illustrations, the authors describe in detail the essential foundations, principles and techniques of software product line engineering. The authors are professionals and researchers who significantly influenced the software product line engineering paradigm and successfully applied software product line engineering principles in industry. They have structured this textbook around a comprehensive product line framework. Software product line engineering has proven to be the paradigm for developing a diversity of software products and software-intensive systems in shorter time, at lower cost, and with higher quality. It facilitates platform-based development and mass customisation. The authors elaborate on the two key principles behind software product line engineering: (1) the separation of software development in two distinct processes, domain and application engineering; (2) the explicit definition and management of the variability of the product line across all development artefacts. As a student, you will find a detailed description of the key processes, their activities and underlying techniques for defining and managing software product line artefacts. As a researcher or lecturer, you will find a comprehensive discussion of the state of the art organised around the comprehensive framework. As a professional, you will find guidelines for introducing this paradigm in your company and an overview of industrial experiences with software product line engineering.
Conference Paper
Building software product lines (SPLs) with features is a challenging task. Many SPL implementations support features with coarse granularity - e.g., the ability to add and wrap entire methods. However, fine-grained extensions, like adding a statement in the middle of a method, either require intricate workarounds or obfuscate the base code with annotations. Though many SPLs can and have been implemented with the coarse granularity of existing approaches, fine-grained extensions are essential when extracting features from legacy applications. Furthermore, also some existing SPLs could benefit from fine-grained extensions to reduce code replication or improve readability. In this paper, we analyze the effects of feature granularity in SPLs and present a tool, called Colored IDE (CIDE), that allows features to implement coarse-grained and fine-grained extensions in a concise way. In two case studies, we show how CIDE simplifies SPL development compared to traditional approaches.
Conference Paper
We apply mathematical concept analysis to the problem of inferring configuration structures from existing source code. Concept analysis has been developed by German mathematicians over the last years; it can be seen as a discrete analogon to Fourier analysis. Based on this theory, our tool will accept source code, where configuration-specific statements are controlled by the preprocessor. The algorithm will compute a so-called concept lattice, which - when visually displayed - allows remarkable insight into the structure and properties of possible configurations. The lattice not only displays fine-grained dependencies between configuration threads, but also visualizes the overall quality of configuration structures according to software engineering principles. The paper presents a short introduction to concept analysis, as well as experimental results on various programs
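The concept lattice mentioned here can be computed from a table recording which #ifdef-controlled code blocks are guarded by which macros. Below is a minimal, library-free formal-concept-analysis sketch over an invented toy table; the tool described in the paper of course works on real source code rather than such a hand-written table.

```python
# Toy context: code block -> set of preprocessor macros guarding it.
context = {
    "block1": {"WIN32"},
    "block2": {"WIN32", "DEBUG"},
    "block3": {"DEBUG"},
    "block4": {"WIN32", "DEBUG", "NET"},
}
all_macros = frozenset().union(*context.values())

def extent(intent):
    """Code blocks whose guards include every macro of the intent."""
    return {b for b, macros in context.items() if intent <= macros}

# The intents of the concept lattice are all intersections of the blocks'
# guard sets, plus the full macro set (the intent of the empty extent).
intents = {all_macros}
frontier = {frozenset(m) for m in context.values()}
while frontier:
    intents |= frontier
    frontier = {a & b for a in intents for b in intents} - intents

# Each (extent, intent) pair is one node of the concept lattice.
for intent in sorted(intents, key=len):
    print(sorted(intent), "<->", sorted(extent(intent)))
```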
Article
This is the first empirical study of the use of the C macro preprocessor, Cpp. To determine how the preprocessor is used in practice, this paper analyzes 26 packages comprising 1.4 million lines of publicly available C code. We determine the incidence of C preprocessor usage-whether in macro definitions, macro uses, or dependences upon macros-that is complex, potentially problematic, or inexpressible in terms of other C or C++ language features. We taxonomize these various aspects of preprocessor use and particularly note data that are material to the development of tools for C or C++, including translating from C to C++ to reduce preprocessor usage. Our results show that, while most Cpp usage follows fairly simple patterns, an effective program analysis tool must address the preprocessor. The intimate connection between the C programming language and Cpp, and Cpp's unstructured transformations of token streams often hinder both programmer understanding of C programs and tools built to engineer C programs, such as compilers, debuggers, call graph extractors, and translators. Most tools make no attempt to analyze macro usage, but simply preprocess their input, which results in a number of negative consequences; an analysis that takes Cpp into account is preferable, but building such tools requires an understanding of actual usage. Differences between the semantics of Cpp and those of C can lead to subtle bugs stemming from the use of the preprocessor, but there are no previous reports of the prevalence of such errors. Use of C++ can reduce some preprocessor usage, but such usage has not been previously measured. Our data and analyses shed light on these issues and others related to practical understanding or manipulation of real C programs. The results are of interest to language designers, tool writers, programmers, and software engineers.
Article
We apply mathematical concept analysis to the problem of inferring configuration structures from existing source code. Concept analysis has been developed by German mathematicians over the last years; it can be seen as a discrete analogon to Fourier analysis. Based on this theory, our tool will accept source code, where configuration-specific statements are controlled by the preprocessor. The algorithm will compute a so-called concept lattice, which -- when visually displayed -- allows remarkable insight into the structure and properties of possible configurations. The lattice not only displays fine-grained dependencies between configuration threads, but also visualizes the overall quality of configuration structures according to software engineering principles. The paper presents a short introduction to concept analysis, as well as experimental results on various programs.