Article

Code Churn: A Measure for Estimating the Impact of Code Change

Authors:
J. C. Munson and S. G. Elbaum

Abstract

This study presents a methodology that will produce a viable fault surrogate. The focus of the effort is on the precise measurement of software development process and product outcomes. Tools and processes for the static measurement of the source code have been installed and made operational in a large embedded software system. Source code measurements have been gathered unobtrusively for each build in the software evolution process. The measurements are synthesized to obtain the fault surrogate. The complexity of sequential builds is compared and a new measure, code churn, is calculated. This paper will demonstrate the effectiveness of code complexity churn by validating it against the testing problem reports.

1. Introduction. As software systems change over time, it is very difficult to understand, measure and predict the effect of changes. We would like to be able to describe numerically how each system differs from its successor or predecessor. Over a number of years worth of st...


... To answer RQ4, we present and discuss (i) the frequency of documented resolutions of Android performance issues within the whole dataset of Android apps, (ii) the distribution of documented resolutions of performance issues across the seven types of Android performance issues, and (iii) the distribution of the code churn associated with each of the 1,314 PRI-resolving commits across each type of Android performance issue. Code churn refers to the total number of changed lines of code in a commit (either added, removed or updated) (Munson and Elbaum 1998). In this study, we rely on code churn because (i) it is one of the most used metrics for representing the change volume between two versions of the same system (Munson and Elbaum 1998), (ii) it can be considered a relatively good estimator of the development effort devoted to a GitHub commit, (iii) it can be extracted automatically with low computational effort, and (iv) the git log command can compute it out of the box. ...
... Also, to better explain the results, we provide an example solution for each type of Android performance issue. ...
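Since the excerpt above notes that the git log command can compute code churn out of the box, here is a minimal Python sketch of that computation: it sums added and deleted lines per commit from `git log --numstat`. It assumes a local clone; the repository path, format marker, and function name are illustrative, not taken from the cited study.

```python
# Minimal sketch: per-commit code churn (lines added + deleted) computed
# from `git log --numstat`. Assumes a local clone; names are illustrative.
import subprocess
from collections import defaultdict

def churn_per_commit(repo_path="."):
    """Return {commit_sha: total changed lines} for a local git repository."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--format=@%H"],
        capture_output=True, text=True, check=True,
    ).stdout

    churn = defaultdict(int)
    current = None
    for line in out.splitlines():
        if line.startswith("@"):
            current = line[1:]                 # commit header emitted by --format=@%H
        elif line.strip() and current is not None:
            added, deleted, _path = line.split("\t", 2)
            if added != "-":                   # "-" marks binary files
                # git counts a modified line as one addition plus one deletion
                churn[current] += int(added) + int(deleted)
    return dict(churn)

if __name__ == "__main__":
    for sha, lines in list(churn_per_commit().items())[:5]:
        print(sha[:8], lines)
```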
Article
Full-text available
Mobile apps are playing a major role in our everyday life, and they tend to become more and more complex and resource demanding. Because of that, performance issues may occur, disrupting the user experience or, even worse, preventing effective use of the app. Ultimately, such problems can cause bad reviews and influence the app's success. Developers deal with performance issues through dynamic analysis, i.e., performance testing and profiler tools, although static analysis tools can be a valid, relatively inexpensive complement for the early detection of some such issues. This paper empirically investigates how potential performance issues identified by a popular static analysis tool — Android Lint — are actually resolved in 316 open source Android apps among the 724 apps we analyzed. More specifically, the study traces the issues detected by Android Lint from their introduction until they are resolved, with the aim of studying (i) the overall evolution of performance issues in apps, (ii) the proportion of issues being resolved, (iii) the distribution of their survival time, and (iv) the extent to which issue resolutions are documented by developers in commit messages. Results indicate that some issues, especially those related to the lack of resource recycling, tend to be more frequent than others. Also, while some issues, primarily of an algorithmic nature, tend to be resolved quickly through well-known patterns, others tend to stay in the app longer, or are not resolved at all. Finally, we found that only 10% of the issue resolutions are documented in commit messages.
... Prioritization of the execution order of test cases in a test suite can improve test effectiveness [1] [11] [12] [13]. ...
... There can be many possible goals behind applying TCP, such as: to increase the rate of fault detection; to increase statement, branch or function test coverage; and/or to increase confidence in system reliability [12] [13]. To date, TCP has been primarily applied to improve regression testing efforts [11] [12] [13] of white box, code-level test cases. Our research goal is to develop and validate a system-level test case prioritization model based upon static metrics and system failure data to improve system test effectiveness. ...
Article
System testing is the last phase before the product is delivered for customer use and thus represents the last opportunity for verifying that the system functions correctly and as desired by customers. System test is time consuming in that it involves configuring and testing multiple complete, integrated systems (including hardware, operating system, and cooperating and co-existing applications) that are representative of a subset of customer environments. As a result, prioritizing the execution order of system test cases to maximize system test effectiveness would be beneficial. We are developing a statistical test case prioritization model that uses static metrics and system failure data with the goal of improving system test effectiveness.
... Indeed, the higher the number of relationships with other classes, the lower the ability of developers to consistently manage the complexity of a technical product [46]. Code Change Process: the way a source code component changes over time might impact its size and complexity [49], thus possibly decreasing the overall maintainability of a software project as well as increasing the effort required at an organizational level. To account for this aspect, metrics such as (i) the number of lines of code added or modified in a class over different releases of a code component (a.k.a. code churn) and (ii) the number of commits performed on that component over time should be carefully taken into account. ...
Chapter
DevOps predicates the continuity between Development and Operations teams at an unprecedented scale. Also, the continuity does not stop at tools or processes, but extends into organizational practices, collaboration, and co-located and coordinated effort. We conjecture that this unprecedented scale of continuity requires predictive analytics which are omniscient, that is (i) transversal to the technical, organizational, and social stratification in software processes and (ii) correlating all strata to provide a live and holistic snapshot of software development, its operations, and organization. Elaborating this conjecture, we illustrate a set of metrics to be used in the DevOps scenario and overview challenges and future research directions.
... In order to estimate the impact involved in maintaining the clients of a web service API, we start by using the code churn metric [18], which we define for each file as ...
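The per-file definition is cut off in the excerpt above, so the sketch below should not be read as the authors' formula; it assumes the common convention of lines added plus lines deleted per file between two versions, computed with `git diff --numstat`. The function name and revision labels are hypothetical.

```python
# Assumed convention (the excerpt's own formula is truncated):
# churn(file) = lines added + lines deleted between two versions.
import subprocess
from collections import defaultdict

def file_churn(repo_path, old_rev, new_rev):
    """Per-file churn between two revisions of a local git repository."""
    out = subprocess.run(
        ["git", "-C", repo_path, "diff", "--numstat", f"{old_rev}..{new_rev}"],
        capture_output=True, text=True, check=True,
    ).stdout
    churn = defaultdict(int)
    for line in out.splitlines():
        added, deleted, path = line.split("\t", 2)
        if added != "-":                       # "-" marks binary files
            churn[path] += int(added) + int(deleted)
    return dict(churn)

# Example (hypothetical tags): churn of each client file between two releases.
# file_churn("client-repo", "v1.0", "v1.1")
```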
Article
Web APIs provide a systematic and extensible approach for application-to-application interaction. Developers using web APIs are forced to accompany the API providers in their software evolution tasks. In order to understand the distress caused by this imposition on web API client developers we perform a semi-structured interview with six such developers. We also investigate how major web API providers organize their API evolution, and we explore how this affects source code changes of their clients. Our exploratory qualitative study of the Twitter, Google Maps, Facebook and Netflix web APIs analyzes the state of web API evolution practices and provides insight into the impact of service evolution on client software. In order to complement the picture and also understand how web API providers deal with evolution, we investigate the server-side and client-side evolution of two open-source web APIs, namely VirtualBox and XBMC. Our study is complemented with a set of observations regarding best practices for web API evolution.
... IDPD research uses them as a measure because they are subjective and difficult to measure. In addition, code churn, which includes debug code churn and refactoring code churn, is more associated with defect density than with productivity [4] [5]. ...
Conference Paper
Full-text available
This research paper expands on a previously introduced phenomenon called Incremental Development Productivity Decline (IDPD) that is presumed to be present in all incremental software projects to some extent. Incremental models are now being used by many organizations in order to reduce development risks. Incremental development has become the most common method of software development. Therefore its characteristics inevitably influence the productivity of projects. Based on their observed IDPD, incrementally developed projects are split into several major IDPD categories. Different ways of measuring productivity are presented and evaluated in order to come to a definition or set of definitions that is suitable to these categories of projects. Data has been collected and analyzed, indicating the degree of IDPD associated with each category. Several hypotheses have undergone preliminary evaluations regarding the existence, stability and category-dependence of IDPD with encouraging results. Further data collection and hypothesis testing is underway.
... To understand the pervasiveness of logging, we study the density of log messages in source code and the churn rate [19] from the revision history. ...
Conference Paper
Software logging is a conventional programming practice. While it is often important for users and developers to understand what has happened in a production run, software logging is often done in an arbitrary manner. So far, there has been little study of logging practices in real-world software. This paper makes the first attempt (to the best of our knowledge) to provide a quantitative characteristic study of the current log messages within four pieces of large open-source software. First, we quantitatively show that software logging is pervasive. By examining developers' own modifications to the logging code in the revision history, we find that they often do not get the log messages right in their first attempts, and thus need to spend a significant amount of effort modifying the log messages as afterthoughts. Our study further provides several interesting findings on where developers spend most of their effort in modifying log messages, which can give insights for programmers, tool developers, and language and compiler designers to improve current logging practice. To demonstrate the benefit of our study, we built a simple checker based on one of our findings and effectively detected 138 pieces of new problematic logging code in the studied software (24 of them already confirmed and fixed by developers).
... Prior work used product metrics such as McCabe's cyclomatic complexity metric [33], the Chidamber and Kemerer (CK) metrics suite [9], and code size (measured in lines of code) [1], [12], [15], [21], [29]. Other work uses process metrics to predict defect-prone locations [15], [19], [43], [44], [46]. Graves et al. [15] use process metrics based on the change history (e.g., number of past defects and number of developers) to build defect prediction models. ...
Article
Full-text available
Defect prediction models are a well-known technique for identifying defect-prone files or packages such that practitioners can allocate their quality assurance efforts (e.g., testing and code reviews). However, once the critical files or packages have been identified, developers still need to spend considerable time drilling down to the functions or even code snippets that should be reviewed or tested. This makes the approach too time consuming and impractical for large software systems. Instead, we consider defect prediction models that focus on identifying defect-prone (“risky”) software changes instead of files or packages. We refer to this type of quality assurance activity as “Just-In-Time Quality Assurance,” because developers can review and test these risky changes while they are still fresh in their minds (i.e., at check-in time). To build a change risk model, we use a wide range of factors based on the characteristics of a software change, such as the number of added lines, and developer experience. A large-scale study of six open source and five commercial projects from multiple domains shows that our models can predict whether or not a change will lead to a defect with an average accuracy of 68 percent and an average recall of 64 percent. Furthermore, when considering the effort needed to review changes, we find that using only 20 percent of the effort it would take to inspect all changes, we can identify 35 percent of all defect-inducing changes. Our findings indicate that “Just-In-Time Quality Assurance” may provide an effort-reducing way to focus on the most risky changes and thus reduce the costs of developing high-quality software.
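As an illustration of the change-level ("Just-In-Time") setup described above, and not a reproduction of the paper's model or data, the following sketch fits a simple classifier on per-change features such as lines added and developer experience; the feature names and values are invented.

```python
# Illustrative sketch of change-level ("Just-In-Time") defect prediction.
# Features and labels below are invented; they only mirror the kinds of
# per-change characteristics mentioned in the abstract.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

changes = pd.DataFrame({
    "lines_added":     [120, 4, 300, 15, 60],
    "lines_deleted":   [30, 1, 250, 5, 10],
    "files_touched":   [6, 1, 12, 2, 3],
    "dev_experience":  [2, 40, 1, 25, 10],      # prior commits by the author
    "defect_inducing": [1, 0, 1, 0, 0],          # label: did the change induce a defect?
})

X = changes.drop(columns="defect_inducing")
y = changes["defect_inducing"]

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Risk score per change; in practice the model would be evaluated with
# cross-validation, as the study reports average accuracy and recall.
print(model.predict_proba(X)[:, 1].round(2))
```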
... Moreover, impacts are not defined in a formal way. For the purpose of estimating the impact of code change, Munson and Elbaum introduced a code churn measure [21]. Kazman et al. used scenarios to study the impact of changes on a system's architecture, but did not define an impact model [12]. ...
Article
The assessment of the changeability of software systems is of major concern for buyers of the large systems found in fast-moving domains such as telecommunications. One way of approaching this problem is to investigate the dependency between the changeability of the software and its design, with the goal of finding design properties that can be used as changeability indicators. In our research, we defined a model of software changes and change impacts and implemented it for the C++ language. Furthermore, we identified a set of nine object-oriented (OO) design metrics, four of which are specifically geared towards changeability detection. The model and the metrics were applied to three test systems of industrial size. The experiment showed a high correlation, across systems and across changes, between changeability and the access to a class by other classes through method invocation or variable access. On the other hand, no result could support the hypothesis that the depth of the inheritance tree has some influence on changeability. Furthermore, our results confirm the observation of others that the use of inheritance is rather limited in industrial systems.
... Previous research [32] [42] has indicated that the output from static analysis tools can predict fault- and failure-prone components. Additionally, SLOC-normalized code churn has been used to point to problem areas in software [14] [33]. SLOC, although having disputed effects on fault density [10] [16], is a metric that is often available with a software system. ...
Article
Extensive research has shown that reliability models based upon software metrics can be used to predict which components are fault- and/or failure-prone early in the development process. In this research, we seek to parallel failure-prone component prediction with security models to predict which components are attack-prone. Security experts can use these models to make informed risk management decisions and to prioritize redesign, inspection, and testing efforts. We collected and analyzed data from a large commercial telecommunications software system containing over one million lines of code that had been deployed to the field for two years. Using recursive partitioning and logistic regression, we built attack-prone prediction models with the following metrics: static analysis tool output, code churn, source lines of code, failure reports from feature/system testing, and customer-reported failures. The models were validated using k-fold cross-validation and ROC curves. One model identified 100% of the attack-prone components with an 8% false positive rate.
... We use number of changes because each change represents an "exposure" of the developer to the code and because the previous measure of experience used by Mockus and Weiss also used the number of changes. However, prior literature [14] has shown high ...
Conference Paper
Full-text available
Ownership is a key aspect of large-scale software development. We examine the relationship between different ownership measures and software failures in two large software projects: Windows Vista and Windows 7. We find that in all cases, measures of ownership such as the number of low-expertise developers, and the proportion of ownership for the top owner have a relationship with both pre-release faults and post-release failures. We also empirically identify reasons that low-expertise developers make changes to components and show that the removal of low-expertise contributions dramatically decreases the performance of contribution based defect prediction. Finally we provide recommendations for source code change policies and utilization of resources such as code inspections based on our results.
... To assess the rework effort, we make use of the code churn metric, which is the amount of lines added, modified or deleted from one version to another within a module (Munson and Elbaum 1998). We chose to collect the data on a very granular level, i.e., per method. ...
Conference Paper
Full-text available
The Toyota Production System promotes "pull" scheduling to reduce the production of parts that do not comply with what the customer needs. The use of "pull" within software represents a radical change in the way activities are planned. This article gives two examples of the possible application of "pull" within software engineering and describes a measurement tool to assess the current costs and amount of rework within a software development project. The described approach aims to help practitioners understand whether to use "pull" or "push" in their organizations.
... We divided the analysis of the effects of God and Brain Classes into (1) the change frequency (CF and CF/LOC), (2) the change size (CS and CS/LOC, on the basis of code churn [22] ...
Conference Paper
Full-text available
Code smells are particular patterns in object-oriented systems that are perceived to lead to difficulties in the maintenance of such systems. It is held that to improve maintainability, code smells should be eliminated by refactoring. It is claimed that classes that are involved in certain code smells are liable to be changed more frequently and have more defects than other classes in the code. We investigated the extent to which this claim is true for God Classes and Brain Classes, with and without normalizing the effects with respect to the class size. We analyzed historical data from 7 to 10 years of the development of three open-source software systems. The results show that God and Brain Classes were changed more frequently and contained more defects than other kinds of class. However, when we normalized the measured effects with respect to size, then God and Brain Classes were less subject to change and had fewer defects than other classes. Hence, under the assumption that God and Brain Classes contain on average as much functionality per line of code as other classes, the presence of God and Brain Classes is not necessarily harmful; in fact, such classes may be an efficient way of organizing code.
... At the same time the study of prediction models provides cues for understanding the causes of errors, such as complex code change processes [16]. Past work in defect prediction makes extensive use of product and process measures that can be obtained from the source code of a software, such as code complexity metrics [18], change metrics [21] and inter-dependencies of elements in the code [23]. However, source-code is the end product of a variety of collaborative activities carried out by the developers of a software. ...
Conference Paper
Full-text available
Correcting software defects accounts for a significant amount of resources such as time, money and personnel. To be able to focus testing efforts where needed the most, researchers have studied statistical models to predict in which parts of a software future defects are likely to occur. By studying the mathematical relations between predictor variables used in these models, researchers can form an increased understanding of the important connections between development activities and software quality. Predictor variables used in past top-performing models are largely based on file-oriented measures, such as source code and churn metrics. However, source code is the end product of numerous interlaced and collaborative activities carried out by developers. Traces of such activities can be found in the repositories used to manage development efforts. In this paper, we investigate statistical models, to study the impact of social structures between developers and end-users on software quality. These models use predictor variables based on social information mined from the issue tracking and version control repositories of a large open-source software project. The results of our case study are promising and indicate that statistical models based on social information have a similar degree of explanatory power as traditional models. Furthermore, our findings suggest that social information does not substitute, but rather augments traditional product and process-based metrics used in defect prediction models.
Article
Industry can get any research it wants, just by publishing a baseline result along with the data and scripts needed to reproduce that work. For instance, the paper “Data Mining Static Code Attributes to Learn Defect Predictors” presented such a baseline, using static code attributes from NASA projects. Those results were enthusiastically embraced by a software engineering research community hungry for data. At its peak (2016) this paper was SE’s most cited paper (per month). By 2018, twenty percent of leading TSE papers (according to Google Scholar Metrics) incorporated artifacts introduced and disseminated by this research. This brief note reflects on what we should remember, and what we should forget, from that paper.
Article
Standard automatic methods for recognizing problematic development commits can be greatly improved via the incremental application of human+artificial expertise. In this approach, called EMBLEM, an AI tool first explores the software development process to label the commits that are most problematic. Humans then apply their expertise to check those labels (perhaps resulting in the AI updating the support vectors within its SVM learner). We recommend this human+AI partnership for several reasons. When a new domain is encountered, EMBLEM can learn better ways to label which comments refer to real problems. Also, in studies with 9 open source software projects, labelling via EMBLEM's incremental application of human+AI is at least an order of magnitude cheaper than existing methods (approximately eight times). Further, EMBLEM is very effective. For the data sets explored here, EMBLEM's better labelling methods significantly improved Popt20 and G-score performance in nearly all the projects studied.
Conference Paper
A wide range of metrics have been used as features to build bug (or fault) predictors. However, most of the existing predictors focus mostly on object-oriented (OO) systems, either because they rely on OO metrics or were evaluated mainly with OO systems. Procedural software systems (PSS), less addressed in bug prediction research, often suffer from maintainability problems because they typically consist of low-level applications, using for example preprocessors to cope with variability. Previous work evaluated sets of features (composed of static code metrics) proposed in existing approaches in the PSS context. However, explored metrics are limited to those that are part of traditional metric suites, being often associated with structural code properties. A type of information explored to a smaller extent in this context is the output of code quality tools that statically analyse source code, providing hints of code problems. In this paper, we investigate the use of information collected from quality tools to build bug predictors dedicated to PSS. We specify four features derived from code quality tools or associated with poor programming practices and evaluate the effectiveness of these features. Our evaluation shows that our proposed features improve bug predictors in our investigated context.
Article
Full-text available
High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions to develop complex parallel software with reduced time-to-solution. Pattern-based parallel programming is based on a set of composable and customizable parallel patterns used as basic building blocks in parallel applications. In recent years, a considerable effort has been made in empowering this programming model with features able to overcome shortcomings of early approaches concerning flexibility and performance. In this paper we demonstrate that the approach is flexible and efficient enough by applying it on 12 out of 13 PARSEC applications. Our analysis, conducted on three different multi-core architectures, demonstrates that pattern-based parallel programming has reached a good level of maturity, providing comparable results in terms of performance with respect to both other parallel programming methodologies based on pragma-based annotations (i.e. OpenMP and OmpSs) and native implementations (i.e. Pthreads). Regarding the programming effort, we also demonstrate a considerable reduction in Lines-Of-Code (LOC) and Code Churn compared with Pthreads and comparable results with respect to other existing implementations.
Conference Paper
Predicting defect proneness of software products has been an active research area in the software engineering domain in recent years. Researchers have been using static code metrics, code churn metrics, developer networks, and module networks as inputs to their proposed models until now. However, domain-specific characteristics of software have not been taken into account. In this research, we propose to include a new set of metrics to improve defect prediction performance for web applications by utilizing their characteristics. To validate our hypotheses we used datasets from 3 open source web applications to conduct our experiments. Defect prediction is then performed using different machine learning algorithms. The results of the experiments revealed that the overall performance of defect predictors is improved compared to only using existing static code metrics. Therefore we recommend that practitioners utilise domain-specific characteristics in defect prediction, as they can be informative.
Conference Paper
Full-text available
The research community in Software Engineering, and Software Testing in particular, builds many of its contributions on a set of mutually shared expectations. Despite the fact that they form the basis of many publications as well as open-source and commercial testing applications, these common expectations and beliefs are rarely ever questioned. For example, Frederick Brooks's statement that testing takes half of the development time seems to have manifested itself within the community since he first made it in the "Mythical Man-Month" in 1975. With this paper, we report on the surprising results of a large-scale field study with 416 software engineers whose development activity we closely monitored over the course of five months, resulting in over 13 years of recorded work time in their integrated development environments (IDEs). Our findings question several commonly shared assumptions and beliefs about testing and might be contributing factors to the observed bug proneness of software in practice: the majority of developers in our study do not test; developers rarely run their tests in the IDE; Test-Driven Development (TDD) is not widely practiced; and, last but not least, software developers spend only a quarter of their work time engineering tests, whereas they think they test half of their time.
Article
Full-text available
Due to its effectiveness in fringe pattern analysis, a fast implementation of windowed Fourier transform (WFT)-based algorithms is desired. In this work, we speed up the implementation in two aspects. First, we speed up the computation of Fourier transform, which is a core component of the WFT-based algorithms. Second, we parallelize the main body of these algorithms due to its parallel characteristic. With these two approaches, we obtain a faster implementation than the original MATLAB version.
Article
Full-text available
Automated testing is a basic principle of agile development. Its benefits include early defect detection, defect cause localization and removal of fear to apply changes to the code. Therefore, maintaining high quality test code is essential. This study introduces a model that assesses test code quality by combining source code metrics that reflect three main aspects of test code quality: completeness, effectiveness and maintainability. The model is inspired by the Software Quality Model of the Software Improvement Group which aggregates source code metrics into quality ratings based on benchmarking. To validate the model we assess the relation between test code quality, as measured by the model, and issue handling performance. An experiment is conducted in which the test code quality model is applied to 18 open source systems. The test quality ratings are tested for correlation with issue handling indicators, which are obtained by mining issue repositories. In particular, we study the (1) defect resolution speed, (2) throughput and (3) productivity issue handling metrics. The results reveal a significant positive correlation between test code quality and two out of the three issue handling metrics (throughput and productivity), indicating that good test code quality positively influences issue handling performance.
Article
Full-text available
Test case prioritization techniques schedule test cases to reduce the cost of regression testing and to maximize some objective function. Test cases are prioritized so that those which are more important under some criterion are executed earlier in the regression testing process. Various objective functions are applicable, such as the rate of fault detection, a metric of how rapidly faults are discovered during the testing process. Therefore, prioritization techniques are effective when implemented for specific instances. In this paper, a novel classification for test case prioritization is made which may cover every concept or measure and contribute to the improvement of the regression testing process. General Terms: Regression testing. Test suites are saved so that they can be reused after the evolution of the software; this reuse of a test suite is called regression testing.
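One widely used formalization of the "rate of fault detection" objective mentioned above is APFD (Average Percentage of Faults Detected). The sketch below computes it for a made-up fault matrix and two test orderings; it is offered as general background on prioritization objectives, not as a metric defined in this particular paper.

```python
# Background sketch: APFD, a standard measure of how quickly a test ordering
# reveals faults. detects[t][f] is True if test t detects fault f; the data
# here is made up for illustration.
def apfd(order, detects, num_faults):
    n = len(order)
    first_positions = []
    for f in range(num_faults):
        # 1-indexed position of the first test in `order` that reveals fault f
        pos = next(i + 1 for i, t in enumerate(order) if detects[t][f])
        first_positions.append(pos)
    return 1 - sum(first_positions) / (n * num_faults) + 1 / (2 * n)

detects = {                       # 4 tests, 3 faults (toy data)
    "t1": [True,  False, False],
    "t2": [False, True,  False],
    "t3": [True,  True,  True],
    "t4": [False, False, True],
}
print(apfd(["t3", "t1", "t2", "t4"], detects, 3))  # prioritized order: 0.875
print(apfd(["t1", "t2", "t4", "t3"], detects, 3))  # unprioritized order: 0.625
```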
Article
Context Code ownership metrics were recently defined in order to distinguish major and minor contributors of a software module, and to assess whether the ownership of such a module is strong or shared between developers. Objective The relationship between these metrics and software quality was initially validated on proprietary software projects. Our objective in this paper is to evaluate such relationship in open-source software projects, and to compare these metrics to other code and process metrics. Method On a newly crafted dataset of seven open-source software projects, we perform, using inferential statistics, an analysis of code ownership metrics and their relationship with software quality. Results We confirm the existence of a relationship between code ownership and software quality, but the relative importance of ownership metrics in multiple linear regression models is low compared to metrics such as the number of lines of code, the number of modifications performed over the last release, or the number of developers of a module. Conclusion Although we do find a relationship between code ownership and software quality, the added value of ownership metrics compared to other metrics is still to be proven.
Conference Paper
Full-text available
Web APIs provide a systematic and extensible approach for application-to-application interaction. Developers using web APIs are forced to accompany the API providers in their software evolution tasks. In order to understand the distress caused by this imposition on web API client developers we perform a semi-structured interview with six such developers. We also investigate how major web API providers organize their API evolution, and we explore how this affects source code changes of their clients. Our exploratory study of the Twitter, Google Maps, Facebook and Netflix web APIs analyzes the state of web API evolution practices and provides insight into the impact of service evolution on client software. Our study is complemented with a set of observations regarding best practices for web API evolution.
Conference Paper
Quantitative metrics are a key input for improving software development processes. When developing applications using IEC 61131-3 languages, one of the major drawbacks is the unavailability of product metrics. This paper discusses a methodology to define metrics for domain-specific languages. Using this methodology, we have defined a set of product metrics that can be used for managing software project development using IEC 61131-3 languages. We have defined size metrics for these languages using language-specific parameters, which will help developers with better estimation and with tracking productivity.
Article
Full-text available
Context: Ownership metrics measure how the workload of software modules is shared among their developers. They have been shown to be accurate indicators of software quality. Objective: Since ownership metrics studies were done only on industrial software projects, we replicated such a study on Java free/libre and open source software (FLOSS) projects. Our goal was to generalize an "ownership law" that stated that minor developers should be avoided. Method: We explored the relationship between ownership metrics and fault-proneness on seven FLOSS projects, using publicly available corpora to retrieve the fault-related information. Results: In our corpus, the relationship between ownership metrics and module faults is weak. At best, less than half of projects exhibit a significant correlation, and at worst, no projects at all. Moreover, fault-proneness seems to be much more influenced by module size than by ownership. Conclusion: The results of ownership studies done on closed-source projects do not generalize to FLOSS projects. To understand the reasons for that, we performed an in-depth analysis and found that the lack of correlation between ownership metrics and module faults is due to the distributions of contributions among developers and the presence of "heroes" in FLOSS projects.
Conference Paper
Code review is the manual assessment of source code by humans, mainly intended to identify defects and quality problems. Modern Code Review (MCR), a lightweight variant of the code inspections investigated since the 1970s, prevails today both in industry and open-source software (OSS) systems. The objective of this paper is to increase our understanding of the practical benefits that the MCR process produces on reviewed source code. To that end, we empirically explore the problems fixed through MCR in OSS systems. We manually classified over 1,400 changes taking place in reviewed code from two OSS projects into a validated categorization scheme. Surprisingly, results show that the types of changes due to the MCR process in OSS are strikingly similar to those in the industry and academic systems from literature, featuring the similar 75:25 ratio of maintainability-related to functional problems. We also reveal that 7–35% of review comments are discarded and that 10–22% of the changes are not triggered by an explicit review comment. Patterns emerged in the review data; we investigated them revealing the technical factors that influence the number of changes due to the MCR process. We found that bug-fixing tasks lead to fewer changes and tasks with more altered files and a higher code churn have more changes. Contrary to intuition, the person of the reviewer had no impact on the number of changes.
Article
Deploying vulnerable software can be costly both in terms of patches and security breaches. Since software development primarily involves people, researchers are increasingly analyzing version control data to observe developer collaboration and contribution. We conducted a case study of the Linux kernel to evaluate a suite of developer activity metrics for the purpose of predicting security vulnerabilities. Our suite includes centrality and cluster metrics from network analysis of version control data. Our results support the hypothesis that source code files which have been developed by multiple clusters of developers are likely to be vulnerable. Furthermore, source code files are likely to be vulnerable when changed by many developers who themselves have made many changes to other files. Our results indicate that developer metrics predict vulnerabilities, but may be more likely to perform better in the presence of other code or process metrics.
Conference Paper
Full-text available
The phenomenon called Incremental Development Productivity Decline (IDPD) is presumed to be present in all incremental software projects to some extent. COCOMO II is a popular parametric cost estimation model that has not yet been adapted to account for the challenges that IDPD poses to cost estimation. Instead, its cost driver and scale factors stay constant throughout the increments of a project. While a simple response could be to make these parameters variable per increment, questions are raised as to whether the existing parameters are enough to predict the behavior of an incrementally developed project even in that case. Individual COCOMO II parameters are evaluated with regard to their development over the course of increments and how they influence IDPD. The reverse is also done. In light of data collected in recent experimental projects, additional new variable parameters that either extend COCOMO II or could stand on their own are proposed.
Article
In 2001, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired a number of development and research efforts on improving the reliability of driver code. Today Linux is used in a much wider range of environments, provides a much wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality? Are drivers still a major problem? To answer these questions, we have transported the experiments of Chou et al. to Linux versions 2.6.0 to 2.6.33, released between late 2003 and early 2010. We find that Linux has more than doubled in size during this period, but that the number of faults per line of code has been decreasing. And, even though drivers still accounts for a large part of the kernel code and contains the most faults, its fault rate is now below that of other directories, such as arch (HAL) and fs (file systems). These results can guide further development and research efforts. To enable others to continually update these results as Linux evolves, we define our experimental protocol and make our checkers and results available in a public archive.
Article
A central part of software quality is finding bugs. One method of finding bugs is by measuring important aspects of the software product and the development process. In recent history, researchers have discovered evidence of a "code churn" effect whereby the degree to which a given source code file has changed over time is correlated with faults and vulnerabilities. Computing the code churn metric comes from counting source code differences in version control repositories. However, code churn does not take into account a critical factor of any software development team: the human factor, specifically who is making the changes. In this paper, we introduce a new class of human-centered metrics, "interactive churn metrics", as variants of code churn. Using the git blame tool, we identify the most recent developer who changed a given line of code in a file prior to a given revision. Then, for each line changed in a given revision, we determined whether the revision author was changing his or her own code ("self churn") or changing code last modified by somebody else ("interactive churn"). We derive and present several metrics from this concept. Finally, we conducted an empirical analysis of these metrics on the PHP programming language and its post-release vulnerabilities. We found that our interactive churn metrics are statistically correlated with post-release vulnerabilities and only weakly correlated with code churn metrics and source lines of code. The results indicate that interactive churn metrics are associated with software quality and are different from code churn and source lines of code.
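A rough sketch of the self-churn versus interactive-churn idea described in this abstract follows: for each pre-existing line a commit touches, `git blame` on the parent revision tells who last modified it, and that author is compared with the commit's author. This is a simplified reading of the approach, not the authors' exact tooling; helper names are illustrative.

```python
# Rough sketch of "self churn" vs "interactive churn": blame the parent
# revision for each pre-existing line a commit touches and compare the
# previous author with the commit's author. Simplified, illustrative only.
import re
import subprocess

def _git(repo, *args):
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

def classify_churn(repo, sha):
    """Return (self_churn, interactive_churn) line counts for commit `sha`."""
    author = _git(repo, "show", "-s", "--format=%ae", sha).strip()
    diff = _git(repo, "show", "--unified=0", "--format=", sha)
    self_churn = interactive_churn = 0
    current_file = None
    for line in diff.splitlines():
        if line.startswith("--- a/"):
            current_file = line[6:]
        hunk = re.match(r"@@ -(\d+)(?:,(\d+))? \+\d+(?:,\d+)? @@", line)
        if not (hunk and current_file):
            continue
        start, count = int(hunk.group(1)), int(hunk.group(2) or "1")
        if count == 0:                # pure insertion: no pre-existing lines touched
            continue
        blame = _git(repo, "blame", "--line-porcelain",
                     f"-L{start},{start + count - 1}", f"{sha}^", "--", current_file)
        for bline in blame.splitlines():
            if bline.startswith("author-mail "):
                previous_author = bline.split(" ", 1)[1].strip("<>")
                if previous_author == author:
                    self_churn += 1          # author touched their own code
                else:
                    interactive_churn += 1   # author touched someone else's code
    return self_churn, interactive_churn
```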
Conference Paper
Software architectures are heterogeneous, multiple-dimensional entities that aim to reduce the cost and ease the complexity associated with the development of large, complex software systems. To fully realize the advantages of software architectures, ...
Conference Paper
Security is a harsh reality for software teams today. Developers must engineer secure software by preventing vulnerabilities, which are design and coding mistakes that have security consequences. Even in open source projects, vulnerable source code can remain unnoticed for years. In this paper, we traced 68 vulnerabilities in the Apache HTTP server back to the version control commits that contributed the vulnerable code originally. We manually found 124 Vulnerability-Contributing Commits (VCCs), spanning 17 years. In this exploratory study, we analyzed these VCCs quantitatively and qualitatively with the over-arching question: "What could developers have looked for to identify security concerns in this commit?" Specifically, we examined the size of the commit via code churn metrics, the amount by which developers overwrite each other's code via interactive churn metrics, the exposure time between VCC and fix, and the dissemination of the VCC to the development community via release notes and voting mechanisms. Our results show that VCCs are large: more than twice as much code churn on average as non-VCCs, even when normalized against lines of code. Furthermore, a commit was twice as likely to be a VCC when the author was a new developer to the source code. The insight from this study can help developers understand how vulnerabilities originate in a system so that security-related mistakes can be prevented or caught in the future.
Conference Paper
The software development process patterns in open source software projects are not well known. Consequently, the longevity of new open source software projects is left up to subjective experiences of the development team. In this study, we are investigating a data mining approach for identifying relevant patterns in software development process. We demonstrate the capabilities of wavelet analysis on 27 open source software projects for identifying similar evolutionary patterns or events in different projects. The analysis identified close to 1000 evolutionary patterns common to multiple projects. The analysis of some of the patterns shows that the end of source code evolution of a project is determined early in the project. In addition, strong fluctuations of activity in sequential periods are identified as good indicators of problems in projects. In conclusion, the analysis reveals that wavelet analysis can be a powerful and objective tool for identifying evolutionary events that can be used as estimation basis or management guide in software projects.
Conference Paper
Predictive models that use machine learning techniques have been useful tools to guide software project managers in making decisions under uncertainty. However, in practice, collecting metrics or defect data is often a troublesome job, and researchers often have to deal with incomplete datasets in their studies. As a result, both researchers and practitioners shy away from implementing such models. Missing data is a common problem in other domains that build recommender systems. We believe that the techniques used to overcome the missing data problem in other domains can also be employed in software engineering. In this paper we propose a matrix factorization algorithm to tackle the missing data problem when building predictive models in the software development domain.
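To illustrate the kind of matrix factorization the abstract proposes for incomplete datasets (without claiming this is the paper's exact algorithm or settings), the following sketch factorizes a module-by-metric matrix with missing entries via stochastic gradient descent and uses the low-rank reconstruction to estimate the missing values; the data, rank, and learning rate are made up.

```python
# Illustrative sketch (not the paper's algorithm or settings): factorize a
# module-by-metric matrix with missing entries via stochastic gradient
# descent and use the low-rank reconstruction to estimate the gaps.
import numpy as np

def factorize(M, rank=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Factor M ~ U @ V.T using only the observed (non-NaN) entries."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M.shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))
    V = rng.normal(scale=0.1, size=(n_cols, rank))
    observed = [(i, j) for i in range(n_rows) for j in range(n_cols)
                if not np.isnan(M[i, j])]
    for _ in range(steps):
        for i, j in observed:
            err = M[i, j] - U[i] @ V[j]
            u_i = U[i].copy()
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_i - reg * V[j])
    return U, V

# Toy module-by-metric matrix (e.g. normalized churn, LOC, complexity);
# NaN marks missing measurements.
M = np.array([[1.2,    0.8,    np.nan],
              [np.nan, 0.9,    0.4],
              [3.0,    np.nan, 3.5],
              [0.1,    0.6,    0.2]])

U, V = factorize(M)
print(np.round(U @ V.T, 2))   # missing cells now carry estimated values
```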
Article
Full-text available
Correcting software defects accounts for a significant amount of resources in a software project. To make best use of testing efforts, researchers have studied statistical models to predict in which parts of a software system future defects are likely to occur. By studying the mathematical relations between predictor variables used in these models, researchers can form an increased understanding of the important connections between development activities and software quality. Predictor variables used in past top-performing models are largely based on source code-oriented metrics, such as lines of code or number of changes. However, source code is the end product of numerous interlaced and collaborative activities carried out by developers. Traces of such activities can be found in the various repositories used to manage development efforts. In this paper, we develop statistical models to study the impact of social interactions in a software project on software quality. These models use predictor variables based on social information mined from the issue tracking and version control repositories of two large open-source software projects. The results of our case studies demonstrate the impact of metrics from four different dimensions of social interaction on post-release defects. Our findings show that statistical models based on social information have a similar degree of explanatory power as traditional models. Furthermore, our results demonstrate that social information does not substitute, but rather augments traditional source code-based metrics used in defect prediction models.
Article
Context: Software metrics may be used in fault prediction models to improve software quality by predicting fault location. Objective: This paper aims to identify software metrics and to assess their applicability in software fault prediction. We investigated the influence of context on metrics’ selection and performance. Method: This systematic literature review includes 106 papers published between 1991 and 2011. The selected papers are classified according to metrics and context properties. Results: Object-oriented metrics (49%) were used nearly twice as often as traditional source code metrics (27%) or process metrics (24%). Chidamber and Kemerer’s (CK) object-oriented metrics were most frequently used. According to the selected studies, there are significant differences in fault prediction performance between the metrics used. Object-oriented and process metrics have been reported to be more successful in finding faults compared to traditional size and complexity metrics. Process metrics seem to be better at predicting post-release faults than any static code metrics. Conclusion: More studies should be performed on large industrial software systems to find metrics more relevant for industry and to answer the question as to which metrics should be used in a given context.
Thesis
Full-text available
The current state of mining software repository tools and technologies has provided opportunities for quantitative studies in software engineering. In this dissertation, these mined data are used to reconstruct the micro processes performed daily by developers (referred to as Micro Process Analysis (MPA)). We investigated how MPA complements the current software process improvement (SPI) initiatives. Unlike typical macro level SPI models, we demonstrated the application of MPA at the maintenance phase. Specifically, we targeted micro processes associated with bug & patch resolution and peer review. For bug and patch processes, we quantitatively re-established Lehman’s law between maintenance effort and code complexity. With three open source software (OSS) projects and a closed experiment, our proposed metrics proved this relationship to be statistically significant. For peer review processes, we developed two models to assist OSS members identifying their social standing and career trajectory. SPI is achieved by more efficient and higher quality reviews, through the identification of expertise. Providing a career trajectory model encourages member participation, thus it promotes the sustainability of peer reviews within a project. Our techniques and approaches validated the application of MPA for software maintenance. We concluded that micro processes could serve as supplements to macro processes, therefore providing an ‘added dimension’ to SPI.
Article
Full-text available
Nowadays, shared-memory parallel architectures have evolved and new programming frameworks have appeared that exploit these architectures: OpenMP, TBB, Cilk Plus, ArBB and OpenCL. This article focuses on the most widespread of these frameworks in commercial and scientific areas. This paper shows a comparative study of these frameworks and an evaluation. The study covers several capacities, such as task deployment, scheduling techniques, or programming language abstractions. The evaluation measures three dimensions: code development complexity, performance, and efficiency, measured as speedup per watt. For this evaluation, several parallel benchmarks have been implemented with each framework. These benchmarks are created to cover certain scenarios, like regular memory access or irregular computation. The conclusions show some highlights, like the fact that some frameworks (OpenMP, Cilk Plus) are better for quickly transforming a sequential code, others (TBB) have a small footprint which is ideal for small problems, and others (OpenCL) are suited for heterogeneous architectures but require a very complex development process. The conclusions also show that vectorization support is more critical than multitasking for achieving efficiency in those problems where this approach fits.
Article
In software evolution, stability is defined as the ability of a module to remain largely unchanged when faced with newer requirements and/or changes in the environment. Although stability is an important long-term design characteristic for hardware systems, it has not been studied deeply for software systems. Stability is directly related to software evolvability and maintainability, and it affects the software evolution process. A model based on software version differences is presented to measure the evolutionary stability of software modules. This model represents and normalizes two distances: the source code and the structure distances, between two versions of an evolving software module. As a case study based on this model, the evolutionary stability of Linux and FreeBSD modules is compared. The results of the study show that the evolutionary stability of a software module in Linux and FreeBSD depends more on its function type and less on the system environment.
Article
Full-text available
This research is a longitudinal study of change processes. It links changes in the product line architecture of a large telecommunications equipment supplier with the company’s customers, inner context, and eight line card products over a six-year period. There are three important time-related constructs in this study: the time it takes to develop a new product line release; the frequency with which a metric is collected; and the frequency at which financial results and metrics related to the customer layer are collected and made available. Data collection has been organized by product release. The original goal of this research is to study the economic impact of market repositioning on the product line and to identify metrics that can be used to record changes in the product line. We later look at the product line evolution vis-à-vis the changes in the products that form the product line. Our results show that there is no relationship between the size of the code added to the product line and the number of designers required to develop and test it, and that there is a positive relationship between designer turnover and the impact of change.
Conference Paper
Prediction of software defects works well within projects as long as there is a sufficient amount of data available to train any models. However, this is rarely the case for new software projects and for many companies. So far, only a few studies have focused on transferring prediction models from one project to another. In this paper, we study cross-project defect prediction models on a large scale. For 12 real-world applications, we ran 622 cross-project predictions. Our results indicate that cross-project prediction is a serious challenge, i.e., simply using models from projects in the same domain or with the same process does not lead to accurate predictions. To help software engineers choose models wisely, we identified factors that do influence the success of cross-project predictions. We also derived decision trees that can provide early estimates for precision, recall, and accuracy before a prediction is attempted.
Conference Paper
People are the most important pillar of the software development process. It is critical to understand how they interact with each other and how these interactions affect the quality of the end product in terms of defects. In this research we propose to include a new set of metrics, a.k.a. social network metrics on issue repositories, in predicting defects. Social network metrics on issue repositories have not been used before to predict the defect proneness of a software product. To validate our hypotheses we used two datasets, development data of IBM Rational Team Concert (RTC) and Drupal, to conduct our experiments. The results of the experiments revealed that, compared to other sets of metrics such as churn metrics, using social network metrics on issue repositories either considerably decreases high false alarm rates without compromising the detection rates or considerably increases low prediction rates without compromising low false alarm rates. Therefore we recommend that practitioners collect social network metrics on issue repositories, since people-related information is a strong indicator of past patterns in a given team.
Article
Context: Source code revision control systems contain vast amounts of data that can be exploited for various purposes. For example, the data can be used as a base for estimating future code maintenance effort in order to plan software maintenance activities. Previous work has extensively studied the use of metrics extracted from object-oriented source code to estimate future coding effort. In comparison, the use of other types of metrics for this purpose has received significantly less attention. Objective: This paper applies machine learning techniques to unveil predictors of yearly cumulative code churn of software projects on the basis of metrics extracted from revision control systems. Method: The study is based on a collection of object-oriented code metrics, XML code metrics, and organisational metrics. Several models are constructed with different subsets of these metrics. The predictive power of these models is analysed based on a dataset extracted from eight open-source projects. Results: The study shows that a code churn estimation model built purely with organisational metrics is superior to one built purely with code metrics. However, a combined model provides the highest predictive power. Conclusion: The results suggest that code metrics in general, and XML metrics in particular, are complementary to organisational metrics for the purpose of estimating code churn.
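The sketch below mirrors the comparison described in the abstract at a much smaller scale: two regressors for yearly cumulative code churn, one trained on code metrics only and one on code plus organisational metrics; all metric names and data are synthetic placeholders.

# Hedged sketch: compare churn-estimation models built from code-only
# versus code + organisational metrics. Synthetic data, not the paper's
# eight open-source projects.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 200
code = rng.normal(size=(n, 2))                      # e.g. class size, coupling
org = rng.normal(size=(n, 2))                       # e.g. number of authors, commit count
churn = 3 * org[:, 0] + code[:, 1] + rng.normal(size=n)   # yearly cumulative churn

for name, X in [("code only", code), ("code + organisational", np.hstack([code, org]))]:
    r2 = cross_val_score(RandomForestRegressor(random_state=0), X, churn, cv=5, scoring="r2")
    print(f"{name:>22}: mean R^2 = {r2.mean():.2f}")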
Article
Full-text available
Many software systems are developed in a number of consecutive releases. In each release, not only is new code added but existing code is often modified. In this study we show that the modified code can be an important source of faults. Faults are widely recognized as one of the major cost drivers in software projects. Therefore, we look for methods that improve fault detection in the modified code. We propose and evaluate a number of prediction models that increase the efficiency of fault detection. To build and evaluate our models we use data collected from two large telecommunication systems produced by Ericsson. We evaluate the performance of our models by applying them both to a different release of the system than the one they are built on and to a different system. The performance of our models is compared to the performance of the theoretical best model, a simple model based on size, as well as to analyzing the code in a random order (not using any model). We find that the use of our models provides a significant improvement over not using any model at all and over using a simple model based on class size. The gain offered by our models corresponds to 38–57% of the theoretical maximum gain.
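One way to read the reported 38–57% figure is as a normalized gain over a no-model baseline; the small sketch below shows one such calculation, though the authors' exact definition may differ and the numbers here are purely illustrative.

# Illustrative calculation: the fraction of the theoretically possible
# improvement (over inspecting code in random order) achieved by a model.
def normalized_gain(model_value, random_value, optimal_value):
    return (model_value - random_value) / (optimal_value - random_value)

# e.g. share of faults found after inspecting 20% of the modified code
random_order = 0.20    # no model: faults found roughly proportional to code inspected
with_model = 0.55      # using the prediction model
theoretical = 0.90     # theoretical best ordering

print(f"gain: {normalized_gain(with_model, random_order, theoretical):.0%} of the theoretical maximum")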
Article
Full-text available
Software reliability reflects a customer's view of the products we build and test, as it is usually measured in terms of failures experienced during regular system use. But our testing strategy is often based on early product measures, since we cannot measure failures until the software is placed in the field. The author shows us that such measurement is not effective at predicting the likely reliability of the delivered software.
Article
Full-text available
For 25 years, software researchers have proposed improving software development and maintenance with new practices whose effectiveness is rarely, if ever, backed up by hard evidence. We suggest several ways to address the problem, and we challenge the community to invest in being more scientific.
Article
As our understanding of the software development process improves, it is becoming clear that embedded faults are not placed in the software by random processes known only to nature. The location of these faults is closely related to measurable software complexity attributes. This paper describes the processes whereby software complexity metrics may be used to identify regions of software that are fault prone. This information is then exploited to develop a model of dynamic program complexity for the identification of failure prone software. Based on this new look at dynamic program complexity and software failures, three areas of interest in software reliability will be examined. First, a new statistical testing methodology will be defined, together with the specific objectives for software test under statistical testing procedures. Second, the specific characteristics of software for use in software reliability modeling will be discussed. The use of dynamic complexity and failure models will be incorporated into a software reliability model. Finally, once the characteristics of software faults and failures are modeled, two degrees of freedom in the software design process are identified, representing an initial step in the mathematical specification of software design objectives. Supporting experience from an ongoing research project on the Space Shuttle Primary Avionics Software System will be discussed.
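One common way to make static complexity "dynamic" is to weight each module's complexity by how often it is exercised under a given operational profile; the sketch below follows that idea, which is in the spirit of the abstract but not necessarily the paper's exact formulation, and uses made-up module names and values.

# Hedged sketch: expected ("functional") complexity under an execution
# profile, i.e. static module complexity weighted by execution probability.
def functional_complexity(static_complexity, execution_profile):
    return sum(static_complexity[m] * p for m, p in execution_profile.items())

static_complexity = {"parser": 42.0, "scheduler": 77.0, "io": 18.0}
nominal_profile = {"parser": 0.6, "scheduler": 0.1, "io": 0.3}
stress_profile = {"parser": 0.1, "scheduler": 0.8, "io": 0.1}

print("nominal profile:", functional_complexity(static_complexity, nominal_profile))
print("stress profile: ", functional_complexity(static_complexity, stress_profile))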
Article
Predictive models that incorporate a functional relationship of program error measures with software complexity metrics, and with metrics based on factor analysis of empirical data, are developed. Specific techniques for assessing these regression models are presented. Within the framework of regression analysis, the authors examine two separate means of exploring the connection between complexity and errors. First, the regression models are formed from the raw complexity metrics. Essentially, these models confirm a known relationship between program lines of code and program errors. The second methodology involves the regression of complexity factor measures and measures of errors. These complexity factors are orthogonal measures of complexity from an underlying complexity domain model. From this more global perspective, it is believed that there is a relationship between program errors and complexity domains of program structure and size (volume). Further, the strength of this relationship suggests that predictive models are indeed possible for the determination of program errors from these orthogonal complexity domains.
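The second methodology can be approximated with principal components standing in for the orthogonal complexity factors; the sketch below regresses synthetic error counts on components extracted from correlated metrics and is only an approximation of the factor-analytic approach described above.

# Hedged sketch: reduce correlated complexity metrics to orthogonal
# components, then regress error measures on those components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 300
loc = rng.normal(size=n)                                    # standardized lines of code
metrics = np.column_stack([
    loc,
    0.9 * loc + rng.normal(scale=0.3, size=n),              # strongly size-correlated metric
    rng.normal(size=n),                                      # a structure-related metric
])
errors = 2.0 * loc + 0.5 * metrics[:, 2] + rng.normal(size=n)

factors = PCA(n_components=2).fit_transform(metrics)        # orthogonal "complexity domains"
model = LinearRegression().fit(factors, errors)
print("R^2 on orthogonal factors:", round(model.score(factors, errors), 2))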
Article
Software measurement is a key element in the evolving software engineering discipline. This paper seeks to identify some of the principal problems surrounding the measurement process and the metrics themselves. Metrics are shown in various modeling contexts. Measures of static software complexity are explored together with measures of dynamic program complexity. These data are then used to identify regions of software that are fault prone. This information is then exploited to develop a model of dynamic program complexity for the identification of failure prone software. Empirical experiences with the practice of measurement from an ongoing research project on the Space Shuttle Primary Avionics Software System are employed to augment the discussion.
Article
Predicting the number of faults is not always necessary to guide quality development; it may be enough to identify the most troublesome modules. Predicting the quality of modules lets developers focus on potential problems and make improvements earlier in development, when it is more cost-effective. In such cases, classification models rather than regression models work very well. As a case study, this article applies discriminant analysis to identify fault-prone modules in a sample representing about 1.3 million lines of code from a very large telecommunications system. We developed two models using design product metrics based on call graphs and control flow graphs. One model used only these metrics; the other included reuse information as well. Both models had excellent fit. However, the model that included reuse data had substantially better predictive accuracy. We thus learned that information about reuse can be a significant input to software quality models for improving the accuracy of predictions.
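The sketch below imitates the case study's comparison on synthetic data: a linear discriminant classifier for fault-prone modules fitted once with design metrics only and once with an added reuse indicator; nothing here is drawn from the telecommunications system itself.

# Hedged sketch: discriminant analysis for fault-proneness, with and
# without a reuse indicator. Synthetic stand-in data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 400
design = rng.normal(size=(n, 2))                    # e.g. call-graph and control-flow metrics
reused = rng.integers(0, 2, size=n)                 # 1 = module reused from an earlier release
fault_prone = ((design[:, 0] + design[:, 1] - 1.5 * reused + rng.normal(size=n)) > 0).astype(int)

for name, X in [("design metrics only", design),
                ("design + reuse", np.column_stack([design, reused]))]:
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, fault_prone, cv=5).mean()
    print(f"{name:>20}: accuracy = {acc:.2f}")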
Conference Paper
A major portion of our recent research is to refine a software process model for a major Storage Technology Corporation (STK) development program. The work is centered around the precise measurement of software development outcomes for the accurate reliability assessment of the program and subsequent projects. Three major measurement initiatives were established. First, tools and processes for the static measurement of the source code have been installed and made operational at STK. The measurement process has been automated and made transparent to the programming staff. Second, tools and processes have been developed for the management and accurate measurement of software faults. No useful software measurement system may be developed without a precise understanding of software faults and their etiology. Finally, tools have been developed and processes initiated that will demonstrate the feasibility and necessity of dynamic software measurement processes. The STK program includes the development of a software profiler to obtain measures of program dynamics.
Conference Paper
Explores problems in the measurement of evolving software systems. As these systems change through successive builds, the complexity characteristics of the individual modules that make up the system also change. A methodology is presented that extends the notion of software complexity domains across sequential builds. Changes to software systems are then measured on these attribute domains to provide leading indicators of potential problems introduced by the changes. Also, the notion of establishing a measurement baseline is presented. This permits the comparison of a sequence of successive software builds with one another. A specific software measurement example is presented using measurement data from the Space Shuttle Primary Avionics Software System.
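A minimal way to picture this is to express each build's module complexities relative to a baseline build and accumulate the change between successive builds; the sketch below does exactly that with made-up module names and values, not the Shuttle data.

# Hedged sketch: complexity "churn" between two successive builds,
# measured relative to a baseline build.
baseline = {"nav": 30.0, "guidance": 55.0, "display": 20.0}     # baseline-build complexity

builds = [
    {"nav": 32.0, "guidance": 55.0, "display": 26.0},           # build k
    {"nav": 31.0, "guidance": 63.0, "display": 26.0},           # build k+1
]

def relative(build, baseline):
    return {m: v / baseline[m] for m, v in build.items()}

prev = relative(builds[0], baseline)
curr = relative(builds[1], baseline)
churn = {m: round(abs(curr[m] - prev[m]), 3) for m in baseline}
print("per-module churn vs. baseline:", churn)
print("total churn:", round(sum(churn.values()), 3))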
Article
The use of the statistical technique of discriminant analysis as a tool for the detection of fault-prone programs is explored. A principal-components procedure was employed to reduce simple multicollinear complexity metrics to uncorrelated measures on orthogonal complexity domains. These uncorrelated measures were then used to classify programs into alternate groups, depending on the metric values of the program. The criterion variable for group determination was a quality measure of faults or changes made to the programs. The discriminant analysis was conducted on two distinct data sets from large commercial systems. The basic discriminant model was constructed from deliberately biased data to magnify differences in metric values between the discriminant groups. The technique was successful in classifying programs with a relatively low error rate. While the use of linear regression models has produced models of limited value, this procedure shows great promise for use in the detection of program modules with potential for faults.
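The two steps described above can be strung together as a single pipeline, as in the sketch below: principal components stand in for the orthogonal complexity measures, followed by a linear discriminant classifier; the data is synthetic rather than from the commercial systems studied.

# Hedged sketch: multicollinear metrics -> orthogonal components ->
# discriminant classification into fault-prone / not fault-prone.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
n = 300
size = rng.normal(size=n)
metrics = np.column_stack([size,
                           size + rng.normal(scale=0.2, size=n),
                           rng.normal(size=n)])
faulty = (size + metrics[:, 2] + rng.normal(size=n) > 0).astype(int)

clf = make_pipeline(PCA(n_components=2), LinearDiscriminantAnalysis())
print("cross-validated accuracy:", round(cross_val_score(clf, metrics, faulty, cv=5).mean(), 2))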