Article

... In this paper, we analyzed a recent work by Wiese et al. [3] that proposes co-change prediction models for source files using contextual information from version control systems and issue repositories, and chose it as our baseline empirical study. We conceptually replicated the baseline in [3] by extracting more than 10 years of development data from two projects, Apache Derby and Apache CXF, and calculating commit- and developer-related features. We then built co-change prediction models for a subset of files based on their ranking over commits and their associated co-changed files. Our study differs from the original study [3] in terms of the experimental protocol for selecting co-changed file pairs, model construction, and performance evaluation. ...
... Hence, a co-change is a change made to one software entity as a result of a change made to another entity, performed to avoid defects and poor performance. Undetected co-changes may raise the cost of, and the time consumed by, software maintenance [3]. ...

... In some cases, co-changes can be detected by analyzing the software system's structure to reveal directly related software entities, e.g., when Class X uses an instance of Class Y [3] [4]. In other cases, there is no direct relationship among the software entities. ...

... Wiese et al. [3] have proposed a prediction model for each pair of software entities, based on relevant association rules. Those rules are produced from contextual information extracted from issue tracking systems, developers' communications, and commit metadata. ...
Conference Paper
Failure to propagate software changes properly during the maintenance process is one of the main causes of defects and poor software performance. It also increases the time spent manually searching for related changes. In addition, incomplete changes increase the cost of the maintenance process, since highly paid senior developers must be hired to provide consultations on maintaining the software system. In this paper, we present an approach called Change Propagation Path (CPP), a data mining method that aims to help developers predict complementary software changes and perform changes correctly. The CPP approach applies frequent pattern analysis to historical data stored within software repositories. We designed a web-based tool called Wide Assisting and Leading (WALead) and conducted an experimental study as a proof of concept and to validate the proposed approach. The WALead tool supports developers remotely and on any platform. The tool was evaluated in terms of its effects on the maintenance process and to demonstrate the feasibility of the CPP approach.
... Out of these 3 transactions, 2 also included changes to the class C. Therefore, the support for the logical dependency A → C will be 2. By its own nature, support is a symmetric metric, so the A → C dependency also implies A ← C. The support value of a given rule determines how evident the rule is (Wiese et al. 2017). ...

... In other words, the confidence is directional and determines the strength of the consequence of a given (directional) logical dependency. The confidence value is the strength of a given association rule (Wiese et al. 2017). ...
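The two snippets above describe how support and confidence are derived from change transactions. As a minimal sketch (not taken from any of the cited papers), assuming each transaction is the set of files changed in one commit and using illustrative file names:

```python
from itertools import combinations
from collections import Counter

def support_confidence(transactions):
    """Compute support and confidence for file-pair co-change rules.

    `transactions` is a list of sets, each holding the files changed
    together in one commit (transaction).
    """
    file_counts = Counter()
    pair_counts = Counter()
    for changed in transactions:
        file_counts.update(changed)
        for a, b in combinations(sorted(changed), 2):
            pair_counts[(a, b)] += 1  # unordered pair: support is symmetric

    confidence = {}
    for (a, b), support in pair_counts.items():
        # Confidence is directional: conf(A -> B) = support(A, B) / count(A)
        confidence[(a, b)] = support / file_counts[a]
        confidence[(b, a)] = support / file_counts[b]
    return pair_counts, confidence

# Example: three transactions change A, two of which also change C.
transactions = [{"A.java", "C.java"}, {"A.java", "C.java"}, {"A.java", "B.java"}]
pairs, conf = support_confidence(transactions)
print(pairs[("A.java", "C.java")])  # support = 2
print(conf[("A.java", "C.java")])   # conf(A -> C) = 2/3
```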
Article
Full-text available
Software systems continuously evolve to accommodate new features, and interoperability relationships between artifacts point to increasingly relevant software change impacts. During maintenance, developers must ensure that related entities are updated to be consistent with these changes. Studies in the static change impact analysis domain have identified that a combination of source code and lexical information outperforms using each one when adopted independently. However, the extraction of lexical information, and the measurement of how loosely or closely related two software artifacts are considering the semantic information embedded in their comments and identifiers, has been carried out using somewhat complex information retrieval (IR) techniques. The interplay between software semantic and change relationship strengths has also not been extensively studied. This work aims to fill both gaps by comparing the effectiveness of measuring semantic coupling of OO software classes using (i) simple identifier-based techniques and (ii) the word corpora of the entire classes in a software system. Afterwards, we empirically investigate the interplay between semantic and change coupling. The empirical results show that: (1) identifier-based methods have more computational efficiency but cannot always be used interchangeably with corpora-based methods of computing semantic coupling of classes and (2) there is no correlation between semantic and change coupling. Furthermore, we found that (3) there is a directional relationship between the two, as over 70% of the semantic dependencies are also linked by change coupling but not vice versa.
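As a rough illustration of the corpus-based flavour of semantic coupling discussed above, each class can be represented by the words drawn from its identifiers and comments and compared to other classes via the cosine similarity of TF-IDF vectors. This is a generic sketch under assumed inputs, not the study's exact pipeline; the class names and corpora are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Each class is reduced to the word corpus of its identifiers and comments.
class_corpora = {
    "OrderService": "order total price compute invoice customer",
    "InvoicePrinter": "invoice print layout customer total",
    "ImageCache": "image cache evict bitmap memory",
}

names = list(class_corpora)
tfidf = TfidfVectorizer().fit_transform(class_corpora.values())
similarity = cosine_similarity(tfidf)

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"semantic coupling {names[i]} - {names[j]}: {similarity[i, j]:.2f}")
```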
... On the other hand, the API-domain labels for "Latex" and "Open Office Document" appeared only in five Java files, while "Security" appeared in only six files. Future research can investigate co-occurrence prediction techniques (e.g., [56]) in this context. ...
Preprint
Full-text available
Open Source Software projects add labels to open issues to help contributors choose tasks. However, manually labeling issues is time-consuming and error-prone. Current automatic approaches for creating labels are mostly limited to classifying issues as a bug/non-bug. In this paper, we investigate the feasibility and relevance of labeling issues with the domain of the APIs required to complete the tasks. We leverage the issues' description and the project history to build prediction models, which resulted in precision up to 82% and recall up to 97.8%. We also ran a user study (n=74) to assess these labels' relevancy to potential contributors. The results show that the labels were useful to participants in choosing tasks, and the API-domain labels were selected more often than the existing architecture-based labels. Our results can inspire the creation of tools to automatically label issues, helping developers to find tasks that better match their skills.
... On the other hand, the API-domain labels for "Latex" and "Open Office Document" appeared only in five Java files, while "Security" appeared in only six files. Future research can investigate co-occurrence prediction techniques (e.g., [56]) in this context. ...
Conference Paper
Full-text available
Open Source Software projects add labels to open issues to help contributors choose tasks. However, manually labeling issues is time-consuming and error-prone. Current automatic approaches for creating labels are mostly limited to classifying issues as a bug/non-bug. In this paper, we investigate the feasibility and relevance of labeling issues with the domain of the APIs required to complete the tasks. We leverage the issues’ description and the project history to build prediction models, which resulted in precision up to 82% and recall up to 97.8%. We also ran a user study (n=74) to assess these labels’ relevancy to potential contributors. The results show that the labels were useful to participants in choosing tasks, and the API-domain labels were selected more often than the existing architecture-based labels. Our results can inspire the creation of tools to automatically label issues, helping developers to find tasks that better match their skills.
... On the other hand, the API-domain labels for "Latex" and "Open Office Document" appeared only in five Java files, while "Security" appeared in only six files. Future research can investigate co-occurrence prediction techniques (e.g., [55]) in this context. ...
Preprint
Full-text available
Open Source Software projects add labels to open issues to help contributors choose tasks. However, manually labeling issues is time-consuming and error-prone. Current automatic approaches for creating labels are mostly limited to classifying issues as a bug/non-bug. In this paper, we investigate the feasibility and relevance of labeling issues with the domain of the APIs required to complete the tasks. We leverage the issues' description and the project history to build prediction models, which resulted in precision up to 82% and recall up to 97.8%. We also ran a user study (n=74) to assess these labels' relevancy to potential contributors. The results show that the labels were useful to participants in choosing tasks, and the API-domain labels were selected more often than the existing architecture-based labels. Our results can inspire the creation of tools to automatically label issues, helping developers to find tasks that better match their skills.
... In previous work (Wiese et al. 2015, 2016), we conducted an exploratory study with two projects by using random forest classifiers trained with contextual information from past changes to improve the co-change prediction. We relied on the concept of change coupling to select the pairs of files most likely to co-change and to compare our prediction model to an association rule model (Oliva and Gerosa 2015b). ...
Article
Full-text available
Models that predict software artifact co-changes have been proposed to assist developers in altering a software system and they often rely on coupling. However, developers have not yet widely adopted these approaches, presumably because of the high number of false recommendations. In this work, we conjecture that the contextual information related to software changes, which is collected from issues (e.g., issue type and reporter), developers’ communication (e.g., number of issue comments, issue discussants and words in the discussion), and commit metadata (e.g., number of lines added, removed, and modified), improves the accuracy of co-change prediction. We built customized prediction models for each co-change and evaluated the approach on 129 releases from a curated set of 10 Apache Software Foundation projects. Comparing our approach with the widely used association rules as a baseline, we found that contextual information models and association rules provide a similar number of co-change recommendations, but our models achieved a significantly higher F-measure. In particular, we found that contextual information significantly reduces the number of false recommendations compared to the baseline model. We conclude that contextual information is an important source for supporting change prediction and may be used to warn developers when they are about to miss relevant artifacts while performing a software change.
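To make the approach above concrete, the following is a hedged sketch of a per-file-pair co-change classifier trained on contextual features. The feature names, the synthetic data, and the random forest learner stand in for the paper's setup and are illustrative assumptions only:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(42)
n = 400  # commits touching file A of a hypothetical pair (A, B)
data = pd.DataFrame({
    "issue_comments":      rng.integers(0, 20, n),
    "issue_discussants":   rng.integers(1, 8, n),
    "words_in_discussion": rng.integers(0, 500, n),
    "lines_added":         rng.integers(0, 300, n),
    "lines_removed":       rng.integers(0, 200, n),
})
# Label: did the coupled file B change in the same commit? (synthetic rule)
data["co_changed"] = ((data["lines_added"] > 150)
                      | (data["issue_comments"] > 12)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    data.drop(columns="co_changed"), data["co_changed"],
    test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("F-measure:", round(f1_score(y_test, clf.predict(X_test)), 3))
```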
... Antoniol et al. built a classifier using machine learning techniques to classify issues into two classes: bugs and non-bugs [4]. Wiese et al. used issues as contextual data to improve the co-change prediction model of software systems, i.e., a model that helps developers become aware of artifacts that will change together with the artifact they are working on [39]. Weiss et al. employed a nearest neighbors technique to automatically predict the fixing effort of issues to facilitate issue assignment and maintenance scheduling [38]. ...
Conference Paper
In a software system's development lifecycle, engineers make numerous design decisions that subsequently cause architectural change in the system. Previous studies have shown that, more often than not, these architectural changes are unintentional by-products of continual software maintenance tasks. The result of inadvertent architectural changes is accumulation of technical debt and deterioration of software quality. Despite their important implications, there is a relative shortage of techniques, tools, and empirical studies pertaining to architectural design decisions. In this paper, we take a step toward addressing that scarcity by using the information in the issue and code repositories of open-source software systems to investigate the cause and frequency of such architectural design decisions. Furthermore, building on these results, we develop a predictive model that is able to identify the architectural significance of newly submitted issues, thereby helping engineers to prevent the adverse effects of architectural decay. The results of this study are based on the analysis of 21,062 issues affecting 301 versions of 5 large open-source systems for which the code changes and issues were publicly accessible.
Article
Predictive models are one of the most important techniques that are widely applied in many areas of software engineering. There have been a large number of primary studies that apply predictive models and report well-performing results in various research domains, including software requirements, software design and development, testing and debugging, and software maintenance. This paper is a first attempt to systematically organize knowledge in this area by surveying a body of 421 papers on predictive models published between 2009 and 2020. We describe the key models and approaches used, classify the different models, summarize the range of key application areas, and analyze research results. Based on our findings, we also propose a set of current challenges that still need to be addressed in future work and provide a proposed research road map for these opportunities.
Article
Full-text available
Software is a very important part of implementing advanced information systems, such as AI and IoT, based on the latest hardware of the Fourth Industrial Revolution. In particular, decision making for software upgrade is one of the essential processes for solving problems when upgrading information systems. However, most decision-making studies for this purpose have been conducted only from the perspective of IT professionals and management. Moreover, software upgrades can be influenced by various layers of decision makers, so further research is needed. Therefore, it is necessary to investigate which factors are required and how they affect software upgrade decision making at the various layers of an organization. For this purpose, decision factors for software upgrade are identified through a literature review in this study. Additionally, the priority, degree of influence, and relationships between the factors are analyzed using the AHP and DEMATEL techniques at the organizational levels of users, managers, and IT professionals. The results show that the priority, weight values, and causal relationships of decision factors differed considerably among the users, managers, and IT professionals who constitute the organizational levels. The managers, as leaders, first considered the benefits for the organization, such as ROI. The users tended to first consider their work efficiency and the changes caused by the software upgrade. Finally, the IT professionals considered ROI, budget, and compatibility from the perspective of both managers and users. Therefore, based on the results of this study, the relevant information for each organizational level can be presented more clearly to support systematic and symmetrical decision making for software upgrade.
Article
Full-text available
The recent notable emergence of a body of research in requirements management on the one hand and benefits realisation on the other has contributed to addressing a growing need for improved performance in Architecture, Engineering and Construction (AEC) projects. However, front end design (FED), one of the vital processes in the project life cycle and delivery, has attracted limited research to date within this understanding. This paper aims to map current evidence on requirements management in facilitating benefits realisation from an FED perspective, in order to bring about an updated and unified position on requirements management and its impact on design decision making. A systematic review of the literature covering the last ten years (2008-2018) aims first to build understanding and support identification of these emergent conceptual positions, and secondly to underscore essential requirements and their categorisations that impact on design discourse in FED. One hundred sixty-one peer-reviewed journal papers in the areas of benefits realisation and/or requirements management and/or FED are identified based on predetermined inclusion and exclusion criteria. Thirty-six requirements, broadly grouped into nine major categories, are identified as important in influencing use case changes relevant to design decision making. Following analysis, this research finds little evidence supporting an integrated requirements management practice and understanding to support design decision making. The research further finds bias in the current research discourse towards four requirements categories (technical, economics, governance and environment) and 14 requirements, dominated by three (strategic values, collaboration and project governance) with over an 80% share of the literature. The 14 least-covered requirements, such as "flow of spaces, social status/aspiration, mobility and integrated design" among others, account for less than 10% of the literature. The authors argue for new research to bridge this gap, highlight the essential role of requirements management, and broaden understanding to improve benefits realisation, particularly for FED processes.
Conference Paper
Full-text available
Community-based Open Source Software (OSS) projects are usually self-organized and dynamic, receiving contributions from distributed volunteers. Newcomers are important to the survival, long-term success, and continuity of these communities. However, newcomers face many barriers when making their first contribution to an OSS project, leading in many cases to dropouts. Therefore, a major challenge for OSS projects is to provide ways to support newcomers during their first contribution. In this paper, we propose and evaluate FLOSScoach, a portal created to support newcomers to OSS projects. FLOSScoach was designed based on a conceptual model of barriers created in our previous work. To evaluate the portal, we conducted a study with 65 students, relying on qualitative data from diaries, self-efficacy questionnaires, and the Technology Acceptance Model. The results indicate that FLOSScoach played an important role in guiding newcomers and in lowering barriers related to the orientation and contribution process, whereas it was not effective in lowering technical barriers. We also found that FLOSScoach is useful, easy to use, and increased newcomers' confidence to contribute. Our results can help project maintainers decide which points need more attention in order to help OSS project newcomers overcome entry barriers.
Article
Full-text available
As a software project ages, its source code is modified to add new features, restructure existing ones, and fix defects. These source code changes often induce changes in the build system, i.e., the system that specifies how source code is translated into deliverables. However, since developers are often not familiar with the complex and occasionally archaic technologies used to specify build systems, they may not be able to identify when their source code changes require accompanying build system changes. This can cause build breakages that slow development progress and impact other developers, testers, or even users. In this paper, we mine the source and test code changes that required accompanying build changes in order to better understand this co-change relationship. We build random forest classifiers using language-agnostic and language-specific code change characteristics to explain when code-accompanying build changes are necessary based on historical trends. Case studies of the Mozilla C++ system, the Lucene and Eclipse open source Java systems, and the IBM Jazz proprietary Java system indicate that our classifiers can accurately explain when build co-changes are necessary with an AUC of 0.60-0.88. Unsurprisingly, our highly accurate C++ classifiers (AUC of 0.88) derive much of their explanatory power from indicators of structural change (e.g., was a new source file added?). On the other hand, our Java classifiers are less accurate (AUC of 0.60-0.78) because roughly 75% of Java build co-changes do not coincide with changes to the structure of a system, but rather are instigated by concerns related to release engineering, quality assurance, and general build maintenance.
Article
Full-text available
While mining version control systems, researchers noticed that some artifacts frequently change together throughout software development. When a certain artifact co-changes repeatedly with another, we say that the former is change coupled to the latter. Researchers have found a series of applications for change coupling in software engineering. For instance, building on the idea that artifacts that changed together in the past are likely to change together in the future, researchers developed effective change prediction mechanisms. In this chapter, we describe the concept of change coupling in more detail and present different techniques to detect it. In particular, we provide ready-to-use code you can leverage as a starting step to detect change couplings in your own projects. In the last part of the chapter, we describe some of the main applications of change coupling analysis.
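In the spirit of the chapter summarized above (though not its actual code), here is a small self-contained sketch that mines change coupling from a Git history: it groups the files touched by each commit into a transaction and counts how often each pair of files changes together. The repository path is a placeholder:

```python
import subprocess
from collections import Counter
from itertools import combinations

def mine_change_coupling(repo_path, min_cochanges=2):
    """Count how often file pairs change in the same commit."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:%H"],
        capture_output=True, text=True, check=True).stdout

    transactions, current = [], set()
    for line in log.splitlines():
        # A 40-character hex line marks the start of a new commit.
        if len(line) == 40 and all(c in "0123456789abcdef" for c in line):
            if current:
                transactions.append(current)
            current = set()
        elif line.strip():
            current.add(line.strip())
    if current:
        transactions.append(current)

    coupling = Counter()
    for files in transactions:
        for pair in combinations(sorted(files), 2):
            coupling[pair] += 1
    return {pair: n for pair, n in coupling.items() if n >= min_cochanges}

# Example usage (path is illustrative):
# print(mine_change_coupling("/path/to/repo"))
```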
Article
Full-text available
Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without a clear understanding of the biases and a corresponding identification of chance or base-case levels of the statistic. Using these measures, a system that performs worse in the objective sense of Informedness can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance, namely Informedness, and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally, we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
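For the dichotomous case, Informedness and Markedness reduce to simple combinations of the confusion-matrix rates: Informedness = Recall + Inverse Recall - 1 and Markedness = Precision + Inverse Precision - 1. A short worked example with assumed counts:

```python
def informedness_markedness(tp, fp, fn, tn):
    """Dichotomous Informedness and Markedness from a 2x2 confusion matrix."""
    recall = tp / (tp + fn)              # true positive rate
    inverse_recall = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)           # positive predictive value
    inverse_precision = tn / (tn + fn)   # negative predictive value
    informedness = recall + inverse_recall - 1
    markedness = precision + inverse_precision - 1
    return informedness, markedness

# Example with assumed counts: 100 actual positives, 100 actual negatives.
print(informedness_markedness(tp=90, fp=40, fn=10, tn=60))
```

Note that a trivial classifier that always predicts the majority class scores zero Informedness even though its Accuracy can look respectable, which is exactly the bias the abstract above warns about.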
Article
Full-text available
Context: Previous work that used prediction models in Software Engineering included few social metrics as predictors, even though many researchers argue that Software Engineering is a social activity. Even when social metrics were considered, they were classified as part of other dimensions, such as process, history, or change. Moreover, few papers report the individual effects of social metrics. Thus, it is not clear yet which social metrics are used in prediction models and what the results of their use are in different contexts. Objective: To identify, characterize, and classify social metrics included in prediction models reported in the literature. Method: We conducted a mapping study (MS) using a snowballing citation analysis. We built an initial seed list by adapting the search strings of two previous systematic reviews on software prediction models. After that, we conducted backward and forward citation analysis using the initial seed list. Finally, we visited the profile of each distinct author identified in the previous steps and contacted each author who published more than 2 papers to ask for additional candidate studies. Results: We identified 48 primary studies and 51 social metrics. We organized the metrics into nine categories, which were divided into three groups: communication, project, and commit-related. We also mapped the applications of each group of metrics, indicating their positive or negative effects. Conclusions: This mapping may support researchers and practitioners in building their prediction models considering more social metrics.
Conference Paper
Full-text available
Code-based metrics such as coupling and cohesion are used to measure a system's structural complexity. But dealing with large systems, those consisting of several million lines, at the code level faces many problems. An alternative approach is to concentrate on the system's building blocks, such as programs or modules, as the unit of examination. We present an approach that uses information in the release history of a system to uncover logical dependencies and change patterns among modules. We have developed the approach by working with 20 releases of a large Telecommunications Switching System. We use release information such as version numbers of programs, modules, and subsystems, together with change reports, to discover common change behavior (i.e. change patterns) of modules. Our approach identifies logical coupling among modules in such a way that potential structural shortcomings can be identified and further examined, pointing to restructuring or reengineering opportunities.
Conference Paper
Full-text available
Software evolves with continuous source-code changes. These code changes usually need to be understood by software engineers when performing their daily development and maintenance tasks. However, despite its high importance, such change-understanding practice has not been systematically studied. Such lack of empirical knowledge hinders attempts to evaluate this fundamental practice and improve the corresponding tool support. To address this issue, in this paper, we present a large-scale quantitative and qualitative study at Microsoft. The study investigates the role of understanding code changes during software-development process, explores engineers' information needs for understanding changes and their requirements for the corresponding tool support. The study results reinforce our beliefs that understanding code changes is an indispensable task performed by engineers in software-development process. A number of insufficiencies in the current practice also emerge from the study results. For example, it is difficult to acquire important information needs such as a change's completeness, consistency, and especially the risk imposed by it on other software components. In addition, for understanding a composite change, it is valuable to decompose it into sub-changes that are aligned with individual development issues; however, currently such decomposition lacks tool support.
Conference Paper
Full-text available
Design degradation has long been assessed by means of structural analyses applied on successive versions of a software system. More recently, repository mining techniques have been developed in order to uncover rich historical information about software projects. In this paper, we leverage such information and propose an approach to assess design degradation that is programming-language agnostic and relies almost exclusively on commit metadata. Our approach currently focuses on the assessment of two particular design smells: rigidity and fragility. Rigidity refers to designs that are difficult to change due to ripple effects, and fragility refers to designs that tend to break in different areas every time a change is performed. We conducted an evaluation of our approach in the project Apache Maven 1 and the results indicated that our approach is feasible and that the project suffered from increasing fragility.
Conference Paper
Full-text available
Modelling and understanding bugs has been the focus of much of the Software Engineering research today. However, organizations are interested in more than just bugs. In particular, they are more concerned about managing risk, i.e., the likelihood that a code or design change will cause a negative impact on their products and processes, regardless of whether or not it introduces a bug. In this paper, we conduct a year-long study involving more than 450 developers of a large enterprise, spanning more than 60 teams, to better understand risky changes, i.e., changes for which developers believe that additional attention is needed in the form of careful code or design reviewing and/or more testing. Our findings show that different developers and different teams have their own criteria for determining risky changes. Using factors extracted from the changes and the history of the files modified by the changes, we are able to accurately identify risky changes with a recall of more than 67%, and a precision improvement of 87% (using developer specific models) and 37% (using team specific models), over a random model. We find that the number of lines and chunks of code added by the change, the bugginess of the files being changed, the number of bug reports linked to a change and the developer experience are the best indicators of change risk. In addition, we find that when a change has many related changes, the reliability of developers in marking risky changes is negatively affected. Our findings and models are being used today in practice to manage the risk of software projects.
Article
Full-text available
Previous studies have shown that social factors of software engineering influence software quality. Communication and development networks represent the interactions among software developers. We explored the statistical relationships between file change proneness and a set of metrics extracted from issue tracker and version control system data to find the relative importance of each metric in understanding the evolution of file changes in the Rails project. Using hierarchical analysis, we found that code churn, number of past changes, and number of developers explain the evolution of changes in the Rails project better than Social Network Analysis (SNA) metrics. Considering the relative importance of each predictor, we got the same results. We also conducted a factor analysis and found that social metrics contribute to explaining a group of files different from those explained by process metrics.
Conference Paper
Full-text available
For a large and evolving software system, the project team could receive many bug reports over a long period of time. It is important to achieve a quantitative understanding of bug-fixing time. The ability to predict bug-fixing time can help a project team better estimate software maintenance efforts and better manage software projects. In this paper, we perform an empirical study of bug-fixing time for three CA Technologies projects. We propose a Markov-based method for predicting the number of bugs that will be fixed in future. For a given number of defects, we propose a method for estimating the total amount of time required to fix them based on the empirical distribution of bug-fixing time derived from historical data. For a given bug report, we can also construct a classification model to predict slow or quick fix (e.g., below or above a time threshold). We evaluate our methods using real maintenance data from three CA Technologies projects. The results show that the proposed methods are effective.
Conference Paper
Full-text available
Receiver Operating Characteristic curves have recently received considerable attention as mechanisms for comparing different algorithms or experimental results. However, the common approach of comparing Area Under the Curve has come in for some criticism, with alternatives such as Area Under Kappa and the H-measure being proposed as alternative measures. However, these measures have their own idiosyncrasies and neglect certain advantages of ROC analysis that do not carry over to the proposed approach. In addition, they suffer from the general desire for a one-size-fits-all mentality as opposed to a Pareto approach. That is, we want a single number to optimize, rather than a suite of numbers to trade off. The starting point for all of this research is the inadequacy and bias of traditional measures such as Accuracy, Recall, Precision, and F-measure: these should never be used singly or as a group, and ROC analysis is a very good alternative to them if used correctly, i.e., treated as a graph rather than boiled down to a single number. Other measures that seek to remove the bias in traditional measures include Krippendorf's Alpha, Scott's Pi, Powers' Informedness and Markedness, as well as a great many variant Kappa statistics. The original Kappa statistics were intrinsically dichotomous but the family has been well generalized to allow for multiple classes and multiple sources of labels. We discuss the proper and improper use of ROC curves, the issues with AUC and Kappa, and make a recommendation as to the approach to take in comparing experimental algorithms or other kinds of tests.
Conference Paper
Full-text available
Defect prediction techniques could potentially help us to focus quality-assurance efforts on the most defect-prone files. Modern statistical tools make it very easy to quickly build and deploy prediction models. Software metrics are at the heart of prediction models; understanding how and especially why different types of metrics are effective is very important for successful model deployment. In this paper we analyze the applicability and efficacy of process and code metrics from several different perspectives. We build many prediction models across 85 releases of 12 large open source projects to address the performance, stability, portability and stasis of different sets of metrics. Our results suggest that code metrics, despite widespread use in the defect prediction literature, are generally less useful than process metrics for prediction. Second, we find that code metrics have high stasis; they don't change very much from release to release. This leads to stagnation in the prediction models, leading to the same files being repeatedly predicted as defective; unfortunately, these recurringly defective files turn out to be comparatively less defect-dense.
Article
Full-text available
Correcting software defects accounts for a significant amount of resources in a software project. To make best use of testing efforts, researchers have studied statistical models to predict in which parts of a software system future defects are likely to occur. By studying the mathematical relations between predictor variables used in these models, researchers can form an increased understanding of the important connections between development activities and software quality. Predictor variables used in past top-performing models are largely based on source code-oriented metrics, such as lines of code or number of changes. However, source code is the end product of numerous interlaced and collaborative activities carried out by developers. Traces of such activities can be found in the various repositories used to manage development efforts. In this paper, we develop statistical models to study the impact of social interactions in a software project on software quality. These models use predictor variables based on social information mined from the issue tracking and version control repositories of two large open-source software projects. The results of our case studies demonstrate the impact of metrics from four different dimensions of social interaction on post-release defects. Our findings show that statistical models based on social information have a similar degree of explanatory power as traditional models. Furthermore, our results demonstrate that social information does not substitute, but rather augments traditional source code-based metrics used in defect prediction models.
Article
Full-text available
Context Software systems continuously change for various reasons, such as adding new features, fixing bugs, or refactoring. Changes may either increase the source code complexity and disorganization, or help to reducing it. Aim This paper empirically investigates the relationship of source code complexity and disorganization—measured using source code change entropy—with four factors, namely the presence of refactoring activities, the number of developers working on a source code file, the participation of classes in design patterns, and the different kinds of changes occurring on the system, classified in terms of their topics extracted from commit notes. Method We carried out an exploratory study on an interval of the life-time span of four open source systems, namely ArgoUML, Eclipse-JDT, Mozilla, and Samba, with the aim of analyzing the relationship between the source code change entropy and four factors: refactoring activities, number of contributors for a file, participation of classes in design patterns, and change topics. Results The study shows that (i) the change entropy decreases after refactoring, (ii) files changed by a higher number of developers tend to exhibit a higher change entropy than others, (iii) classes participating in certain design patterns exhibit a higher change entropy than others, and (iv) changes related to different topics exhibit different change entropy, for example bug fixings exhibit a limited change entropy while changes introducing new features exhibit a high change entropy. Conclusions Results provided in this paper indicate that the nature of changes (in particular changes related to refactorings), the software design, and the number of active developers are factors related to change entropy. Our findings contribute to understand the software aging phenomenon and are preliminary to identifying better ways to contrast it.
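The change entropy referred to above is, in essence, the Shannon entropy of the distribution of changes across files in a given period. A minimal sketch with made-up change counts:

```python
from math import log2

def change_entropy(changes_per_file, normalize=True):
    """Shannon entropy of the distribution of changes over files."""
    total = sum(changes_per_file.values())
    probs = [n / total for n in changes_per_file.values() if n > 0]
    entropy = -sum(p * log2(p) for p in probs)
    if normalize and len(probs) > 1:
        entropy /= log2(len(probs))  # scale to [0, 1]
    return entropy

# A focused bug fix touching mostly one file has low entropy ...
print(change_entropy({"Parser.java": 9, "ParserTest.java": 1}))
# ... while a change scattered evenly across files has high entropy.
print(change_entropy({"Parser.java": 3, "Lexer.java": 3, "Ast.java": 2, "Ui.java": 2}))
```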
Article
Full-text available
Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor. Keywords: Defect prediction, Source code metrics, Change metrics
Conference Paper
Full-text available
When changing a source code entity (e.g., a function), developers must ensure that the change is propagated to related entities to avoid the introduction of bugs. Accurate change propagation is essential for the successful evolution of complex software systems. Techniques and tools are needed to support developers in propagating changes. Several heuristics have been proposed in the past for change propagation. Research shows that heuristics based on the change history of a project outperform heuristics based on the dependency graph. However, these heuristics are static and therefore not the answer to the dynamic nature of software projects. These heuristics need to adapt to the dynamic nature of software projects and must adjust themselves to the peculiarities of each changed entity. In this paper we propose adaptive change propagation heuristics. These heuristics are metaheuristics that combine various previously researched heuristics to improve the overall performance (precision and recall) of change propagation heuristics. Through an empirical case study, using four large open source systems: GCC (a compiler), FreeBSD (an operating system), PostgreSQL (a database), and GCluster (a clustering framework), we demonstrate that our adaptive change propagation heuristics show a 57% statistically significant improvement over the top-performing static change propagation heuristics.
Article
Full-text available
Background: The accurate prediction of where faults are likely to occur in code can help direct test effort, reduce costs and improve the quality of software. Objective: We investigate how the context of models, the independent variables used and the modelling techniques applied influence the performance of fault prediction models. Method: We used a systematic literature review to identify 208 fault prediction studies published from January 2000 to December 2010. We synthesise the quantitative and qualitative results of 36 studies which report sufficient contextual and methodological information according to the criteria we develop and apply. Results: The models that perform well tend to be based on simple modelling techniques such as Naïve Bayes or Logistic Regression. Combinations of independent variables have been used by models that perform well. Feature selection has been applied to these combinations when models are performing particularly well. Conclusion: The methodology used to build models seems to be influential to predictive performance. Although there are a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and which report their context, methodology and performance comprehensively.
Conference Paper
Full-text available
Bug fixing accounts for a large amount of the software maintenance resources. Generally, bugs are reported, fixed, verified and closed. However, in some cases bugs have to be re-opened. Re-opened bugs increase maintenance costs, degrade the overall user-perceived quality of the software and lead to unnecessary rework by busy practitioners. In this paper, we study and predict re-opened bugs through a case study on the Eclipse project. We structure our study along 4 dimensions: (1) the work habits dimension (e.g., the weekday on which the bug was initially closed), (2) the bug report dimension (e.g., the component in which the bug was found), (3) the bug fix dimension (e.g., the amount of time it took to perform the initial fix) and (4) the team dimension (e.g., the experience of the bug fixer). Our case study on the Eclipse Platform 3.0 project shows that the comment and description text, the time it took to fix the bug, and the component the bug was found in are the most important factors in determining whether a bug will be re-opened. Based on these dimensions we create decision trees that predict whether a bug will be re-opened after its closure. Using a combination of our dimensions, we can build explainable prediction models that can achieve 62.9% precision and 84.5% recall when predicting whether a bug will be re-opened.
Conference Paper
Full-text available
In software evolution research, logical coupling has extensively been used to recover the hidden dependencies between source code artifacts. They would otherwise be lost because of the file-based nature of current versioning systems. Previous research has dealt with low-level couplings between files, leading to an explosion of data to be analyzed, or has abstracted the logical couplings to module level, leading to a loss of detailed information. In this paper we propose a visualization-based approach which integrates both file-level and module-level logical coupling information. This not only facilitates an in-depth analysis of the logical couplings at all granularity levels, it also leads to a precise characterization of the system modules in terms of their logical coupling dependencies.
Conference Paper
Full-text available
Ownership is a key aspect of large-scale software development. We examine the relationship between different ownership measures and software failures in two large software projects: Windows Vista and Windows 7. We find that in all cases, measures of ownership such as the number of low-expertise developers, and the proportion of ownership for the top owner have a relationship with both pre-release faults and post-release failures. We also empirically identify reasons that low-expertise developers make changes to components and show that the removal of low-expertise contributions dramatically decreases the performance of contribution based defect prediction. Finally we provide recommendations for source code change policies and utilization of resources such as code inspections based on our results.
Conference Paper
Full-text available
Change impact analysis aims at identifying software artifacts being affected by a change. In the past, this problem has been addressed by approaches relying on static, dynamic, and textual analysis. Recently, techniques based on historical analysis and association rules have been explored. This paper proposes a novel change impact analysis method based on the idea that the mutual relationships between software objects can be inferred with a statistical learning approach. We use the bivariate Granger causality test, a multivariate time series forecasting approach used to verify whether past values of a time series are useful for predicting future values of another time series. Results of a preliminary study performed on the Samba daemon show that change impact relationships inferred with the Granger causality test are complementary to those inferred with association rules. This opens the road towards the development of an eclectic impact analysis approach conceived by combining different techniques.
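As a hedged sketch of the bivariate test mentioned above, one can check whether the change history of one file helps forecast the change history of another using statsmodels; the weekly change counts are fabricated and the setup is illustrative rather than the paper's:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Weekly number of commits touching file B and file A (illustrative data).
changes_b = np.array([3, 1, 4, 2, 5, 1, 0, 3, 2, 4, 1, 2, 3, 0, 2, 4])
changes_a = np.array([0, 2, 1, 3, 1, 4, 1, 0, 2, 1, 3, 0, 2, 2, 1, 3])

# Column order matters: this tests whether the second column (changes_a)
# Granger-causes the first column (changes_b).
series = np.column_stack([changes_b, changes_a])
results = grangercausalitytests(series, maxlag=2)

# p-value of the F-test at lag 1; a small value suggests past changes of A
# carry information about future changes of B.
print(results[1][0]["ssr_ftest"][1])
```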
Conference Paper
Full-text available
In this paper we present a comparative analysis of the predictive power of two different sets of metrics for defect prediction. We choose one set of product-related and one set of process-related software metrics and use them for classifying Java files of the Eclipse project as defective or defect-free. Classification models are built using three common machine learners: logistic regression, Naïve Bayes, and decision trees. To allow different costs for prediction errors we perform cost-sensitive classification, which proves to be very successful: >75% of files correctly classified, a recall of >80%, and a false positive rate
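The cost-sensitive classification mentioned above can be illustrated generically by weighting the defective class more heavily than the clean class. This sketch uses synthetic data and an assumed 5:1 cost ratio, not the paper's exact setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic, imbalanced data standing in for file metrics (class 1 = defective).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.85, 0.15],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Missing a defective file (false negative) is assumed 5x as costly as a
# false alarm, hence the asymmetric class weights.
clf = LogisticRegression(max_iter=1000, class_weight={0: 1, 1: 5})
clf.fit(X_train, y_train)
print("recall on defective class:", recall_score(y_test, clf.predict(X_test)))
```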
Article
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
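A short illustration of the bagging procedure described above, using scikit-learn on synthetic data (BaggingClassifier defaults to decision trees, the kind of unstable predictor for which bagging helps most):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           random_state=1)

single_tree = DecisionTreeClassifier(random_state=1)
# Train 100 trees on bootstrap replicates and aggregate by plurality vote.
bagged_trees = BaggingClassifier(n_estimators=100, random_state=1)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```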
Conference Paper
Links between issue reports and their corresponding commits in version control systems are often missing. However, these links are important for measuring the quality of a software system, predicting defects, and many other tasks. Several approaches have been designed to solve this problem by automatically linking bug reports to source code commits via comparison of textual information in commit messages and bug reports. Yet, the effectiveness of these techniques is oftentimes suboptimal when commit messages are empty or contain minimum information; this particular problem makes the process of recovering traceability links between commits and bug reports particularly challenging. In this work, we aim at improving the effectiveness of existing bug linking techniques by utilizing rich contextual information. We rely on a recently proposed approach, namely ChangeScribe, which generates commit messages containing rich contextual information by using code summarization techniques. Our approach then extracts features from these automatically generated commit messages and bug reports, and inputs them into a classification technique that creates a discriminative model used to predict if a link exists between a commit message and a bug report. We compared our approach, coined as RCLinker (Rich Context Linker), to MLink, which is an existing state-of-the-art bug linking approach. Our experiment results on bug reports from six software projects show that RCLinker outperforms MLink in terms of F-measure by 138.66%.
Conference Paper
Open source software is commonly portrayed as a meritocracy, where decisions are based solely on their technical merit. However, literature on open source suggests a complex social structure underlying the meritocracy. Social work environments such as GitHub make the relationships between users and between users and work artifacts transparent. This transparency enables developers to better use information such as technical value and social connections when making work decisions. We present a study on open source software contribution in GitHub that focuses on the task of evaluating pull requests, which are one of the primary methods for contributing code in GitHub. We analyzed the association of various technical and social measures with the likelihood of contribution acceptance. We found that project managers made use of information signaling both good technical contribution practices for a pull request and the strength of the social connection between the submitter and project manager when evaluating pull requests. Pull requests with many comments were much less likely to be accepted, moderated by the submitter's prior interaction in the project. Well-established projects were more conservative in accepting pull requests. These findings provide evidence that developers use both technical and social information when evaluating potential contributions to open source software projects.
Conference Paper
Developers are often faced with a natural language change request (such as a bug report) and tasked with identifying all code elements that must be modified in order to fulfill the request (e.g., fix a bug or implement a new feature). In order to accomplish this task, developers frequently and routinely perform change impact analysis. This formal demonstration paper presents ImpactMiner, a tool that implements an integrated approach to software change impact analysis. The proposed approach estimates an impact set using an adaptive combination of static textual analysis, dynamic execution tracing, and mining software repositories techniques. ImpactMiner is available from our online appendix http://www.cs.wm.edu/semeru/ImpactMiner/
Conference Paper
Aim - The goal of this study is to investigate the relationships between the communication structure of a software team and the resulting architecture of the software developed, under the perspective of Conway's Law. Method - A quasi-experiment was designed in an industrial context in which the results of two teams were compared. One team worked using an agile approach based on Scrum, with daily meetings and frequent daily communication. The other team used a more traditional approach based on a hierarchical command-and-control style of management, with very limited communication between team members. We observed both teams during the project, interviewed participants, and carried out a focus group to collect and compare impressions of team members at the end of the projects. Results - We found large differences between the architectural designs of the two teams, as predicted. The hierarchical team presented the best results with respect to efficiency and efficacy of development. The agile team produced a simpler solution, but its system design was more coupled than the hierarchical team's. Conclusion - Our findings provided partial support for Conway's Law, but some surprising results were found.
Conference Paper
Software development depends on many factors, including technical, human and social aspects. Due to the complexity of this dependence, a unifying framework must be defined and for this purpose we adopt the complex networks methodology. We use a data-driven approach based on a large collection of open source software projects extracted from online project development platforms. The preliminary results presented in this article reveal that the network perspective yields key insights into the sustainability of software development.
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
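A brief sketch of the procedure described above using scikit-learn's random forest: each tree is grown on a bootstrap sample with a random subset of features considered at every split, and the internal variable importance estimates are reported. Synthetic data is used purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# 200 trees, each split considering a random subset of sqrt(n_features) features.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=7)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
print("variable importance:", forest.feature_importances_.round(3))
```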
Article
The paper presents an approach that combines conceptual and evolutionary techniques to support change impact analysis in source code. Conceptual couplings capture the extent to which domain concepts and software artifacts are related to each other. This information is derived using Information Retrieval based analysis of textual software artifacts that are found in a single version of software (e.g., comments and identifiers in a single snapshot of source code). Evolutionary couplings capture the extent to which software artifacts were co-changed. This information is derived from analyzing patterns, relationships, and relevant information of source code changes mined from multiple versions in software repositories. The premise is that such combined methods provide improvements to the accuracy of impact sets compared to the two individual approaches. A rigorous empirical assessment on the changes of the open source systems Apache httpd, ArgoUML, iBatis, KOffice, and jEdit is also reported. The impact sets are evaluated at the file and method levels of granularity for all the software systems considered in the empirical evaluation. The results show that a combination of conceptual and evolutionary techniques, across several cut-off points and periods of history, provides statistically significant improvements in accuracy over either of the two techniques used independently. Improvements in F-measure values of up to 14% (from 3% to 17%) over the conceptual technique in ArgoUML at the method granularity, and up to 21% over the evolutionary technique in iBatis (from 9% to 30%) at the file granularity were reported.
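As a toy illustration of the combination idea (not the paper's actual weighting, thresholds, or IR pipeline), a conceptual similarity score and an evolutionary co-change score for candidate impact files can be blended into a single ranking; the scores and file names are invented:

```python
def combined_impact_ranking(conceptual, evolutionary, weight=0.5):
    """Blend two normalized coupling scores for files coupled to a changed file."""
    candidates = set(conceptual) | set(evolutionary)
    combined = {f: weight * conceptual.get(f, 0.0)
                   + (1 - weight) * evolutionary.get(f, 0.0)
                for f in candidates}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Couplings of other files to the changed file "Order.java" (made-up values).
conceptual = {"Invoice.java": 0.72, "Cart.java": 0.41, "Logger.java": 0.05}
evolutionary = {"Invoice.java": 0.60, "OrderTest.java": 0.80, "Logger.java": 0.30}

for file, score in combined_impact_ranking(conceptual, evolutionary):
    print(f"{file}: {score:.2f}")
```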
Article
Context: Software metrics may be used in fault prediction models to improve software quality by predicting fault location. Objective: This paper aims to identify software metrics and to assess their applicability in software fault prediction. We investigated the influence of context on metrics’ selection and performance. Method: This systematic literature review includes 106 papers published between 1991 and 2011. The selected papers are classified according to metrics and context properties. Results: Object-oriented metrics (49%) were used nearly twice as often compared to traditional source code metrics (27%) or process metrics (24%). Chidamber and Kemerer’s (CK) object-oriented metrics were most frequently used. According to the selected studies there are significant differences between the metrics used in fault prediction performance. Object-oriented and process metrics have been reported to be more successful in finding faults compared to traditional size and complexity metrics. Process metrics seem to be better at predicting post-release faults compared to any static code metrics. Conclusion: More studies should be performed on large industrial software systems to find metrics more relevant for the industry and to answer the question as to which metrics should be used in a given context.
Article
Fixing bugs is an important part of the software development process. An underlying aspect is the effectiveness of fixes: if a fair number of fixed bugs are reopened, it could indicate instability in the software system. To the best of our knowledge, there has been little prior work on understanding the dynamics of bug reopens. Towards that end, in this paper we characterize when bug reports are reopened, using the Microsoft Windows operating system project as an empirical case study. Our analysis is based on a mixed-methods approach. First, we categorize the primary reasons for reopens based on a survey of 358 Microsoft employees. We then reinforce these results with a large-scale quantitative study of Windows bug reports, focusing on factors related to bug report edits and relationships between the people involved in handling the bug. Finally, we build statistical models to describe the impact of various metrics on reopening bugs, ranging from the reputation of the opener to how the bug was found.
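The kind of statistical model described here can be sketched as a logistic regression relating bug-report factors to the probability of a reopen. The feature names and the randomly generated data below are hypothetical stand-ins; the study's actual metrics (opener reputation, report edits, how the bug was found, and so on) would take their place.

```python
# Hedged sketch: logistic regression over hypothetical bug-report features
# to estimate the probability that a fixed bug will be reopened.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.random(n),            # opener_reputation (hypothetical feature)
    rng.integers(0, 20, n),   # num_report_edits (hypothetical feature)
    rng.integers(0, 2, n),    # found_by_customer (hypothetical feature)
])
y = rng.integers(0, 2, n)     # reopened? (random labels, illustration only)

model = LogisticRegression().fit(X, y)
print("coefficients:", model.coef_)   # sign/magnitude hints at each factor's impact
print("P(reopen) for one report:", model.predict_proba([[0.3, 5, 1]])[0, 1])
```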
Article
The paper presents an adaptive approach to perform impact analysis from a given change request to source code. Given a textual change request (e.g., a bug report), a single snapshot (release) of source code, indexed using Latent Semantic Indexing, is used to estimate the impact set. Should additional contextual information be available, the approach configures the best-fit combination to produce an improved impact set. Contextual information includes the execution trace and an initial source code entity verified for change. Combinations of information retrieval, dynamic analysis, and data mining of past source code commits are considered. The research hypothesis is that these combinations help counter the precision or recall deficit of individual techniques and improve the overall accuracy. The tandem operation of the three techniques sets it apart from other related solutions. Automation along with the effective utilization of two key sources of developer knowledge, which are often overlooked in impact analysis at the change request level, is achieved. To validate our approach, we conducted an empirical evaluation on four open source software systems. A benchmark consisting of a number of maintenance issues, such as feature requests and bug fixes, and their associated source code changes was established by manual examination of these systems and their change history. Our results indicate that there are combinations formed from the augmented developer contextual information that show statistically significant improvement over standalone approaches.
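The information-retrieval step described here can be approximated with TF-IDF plus a truncated SVD (a common way to realize Latent Semantic Indexing) to rank source entities against a textual change request. The toy corpus and entity names below are assumptions, and the paper's additional combination with execution traces and mined commits is not shown.

```python
# LSI-style sketch: index source entities, then rank them against a change request.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

entities = {
    "Parser.java": "parse token grammar syntax tree",
    "Cache.java": "cache evict lru entry store",
    "Login.java": "user login password authenticate session",
}
change_request = "login fails with wrong password error"

names = list(entities)
tfidf = TfidfVectorizer().fit_transform(list(entities.values()) + [change_request])
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

scores = cosine_similarity(lsi[-1:], lsi[:-1])[0]   # query vs. each entity
for name, score in sorted(zip(names, scores), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")                   # ranked estimated impact set
```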
Conference Paper
When interacting with version control systems, developers often commit unrelated or loosely related code changes in a single transaction. When analyzing the version history, such tangled changes will make all changes to all modules appear related, possibly compromising the resulting analyses through noise and bias. In an investigation of five open-source Java projects, we found up to 15% of all bug fixes to consist of multiple tangled changes. Using a multi-predictor approach to untangle changes, we show that on average at least 16.6% of all source files are incorrectly associated with bug reports. We recommend better change organization to limit the impact of tangled changes.
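A single heuristic from the untangling idea can be sketched as grouping the files of one commit by path proximity. The paper's approach combines several such predictors (for example call-graph, data-dependency, and co-change voters); the single-voter grouping below, with its path-distance function and threshold, is only an illustrative assumption.

```python
# One-voter untangling sketch: split a commit's files into groups of
# "close" paths (shared package prefix).
from itertools import combinations

def path_distance(a, b):
    """1 - fraction of shared leading directories (0 = same package)."""
    pa, pb = a.split("/")[:-1], b.split("/")[:-1]
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return 1 - shared / max(len(pa), len(pb), 1)

def untangle(files, threshold=0.5):
    """Greedy single-link grouping: merge file groups closer than the threshold."""
    groups = [{f} for f in files]
    merged = True
    while merged:
        merged = False
        for g1, g2 in combinations(list(groups), 2):
            if any(path_distance(a, b) < threshold for a in g1 for b in g2):
                groups.remove(g1)
                groups.remove(g2)
                groups.append(g1 | g2)
                merged = True
                break
    return groups

commit = ["ui/form/Login.java", "ui/form/Button.java", "db/cache/Lru.java"]
print(untangle(commit))   # the two ui/form files grouped, the db file on its own
```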
Conference Paper
Coupling is a fundamental property of software systems, and numerous coupling measures have been proposed to support various development and maintenance activities. However, little is known about how developers actually perceive coupling, what mechanisms constitute coupling, and if existing measures align with this perception. In this paper we bridge this gap, by empirically investigating how class coupling - as captured by structural, dynamic, semantic, and logical coupling measures - aligns with developers' perception of coupling. The study has been conducted on three Java open-source systems - namely ArgoUML, JHotDraw and jEdit - and involved 64 students, academics, and industrial practitioners from around the world, as well as 12 active developers of these three systems. We asked participants to assess the coupling between the given pairs of classes and provide their ratings and some rationale. The results indicate that the peculiarity of the semantic coupling measure allows it to better estimate the mental model of developers than the other coupling measures. This is because, in several cases, the interactions between classes are encapsulated in the source code vocabulary, and cannot be easily derived by only looking at structural relationships, such as method calls.
Article
This introduction to the R package beanplot is a (slightly) modified version of Kampstra (2008), published in the Journal of Statistical Software. Boxplots and variants thereof are frequently used to compare univariate data. Boxplots have the disadvantage that they are not easy to explain to non-mathematicians, and that some information is not visible. A beanplot is an alternative to the boxplot for visual comparison of univariate data between groups. In a beanplot, the individual observations are shown as small lines in a one-dimensional scatter plot. Next to that, the estimated density of the distributions is visible and the average is shown. It is easy to compare different groups of data in a beanplot and to see if a group contains enough observations to make the group interesting from a statistical point of view. Anomalies in the data, such as bimodal distributions and duplicate measurements, are easily spotted in a beanplot. For groups with two subgroups (e.g., male and female), there is a special asymmetric beanplot. For easy usage, an implementation was made in R.
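The beanplot itself is an R package; as a rough Python analog of the same idea, the sketch below overlays a density shape, a short line per observation, and the group average for one (made-up, bimodal) sample using matplotlib.

```python
# Beanplot-like view in matplotlib: density (violin) + per-observation lines + mean.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 60), rng.normal(4, 0.5, 40)])  # bimodal sample

fig, ax = plt.subplots()
ax.violinplot([data], positions=[1], showextrema=False)   # estimated density
ax.hlines(data, 0.9, 1.1, color="black", lw=0.4)           # individual observations
ax.hlines(data.mean(), 0.8, 1.2, color="red", lw=2)        # group average
ax.set_xticks([1])
ax.set_xticklabels(["group A"])
plt.show()
```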
Article
Suppose you're a software team manager who's responsible for delivering a software product by a specific date, and your team uses a code integration system (referred to as a build in IBM Rational Jazz and in this article) to integrate its work before delivery. When the build fails, your team needs to spend extra time diagnosing the integration issue and reworking code. As the manager, you suspect that your team failed to communicate about a code dependency, which broke the build. Your team needs to quickly disseminate information about its interdependent work to achieve a successful integration build. How can you understand your team's communication? Social-network analysis can give you insight into the team's communication patterns that might have caused the build's failure.
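A hedged sketch of the kind of social-network analysis described here: build a graph of who communicated with whom about the work feeding a build, then look at its structure. The developer names and edges below are made up for illustration.

```python
# Team communication graph and a few structural measures (networkx).
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"),      # e.g., comments exchanged on the same work item
    ("bob", "carol"),
    ("dave", "erin"),      # dave and erin never talked to alice's subgroup
])

print("density:", nx.density(G))
print("connected components:", nx.number_connected_components(G))
print("betweenness centrality:", nx.betweenness_centrality(G))
# A disconnected graph or low density among owners of interdependent code is the
# kind of pattern that might point to a missed dependency before a broken build.
```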
Conference Paper
Coupling metrics capture the degree of interaction and relationships among source code elements in software systems. A vast majority of existing coupling metrics rely on structural information, which captures interactions such as usage relations between classes and methods or execute-after associations. However, these metrics lack the ability to identify conceptual dependencies, which, for instance, capture underlying relationships encoded by developers in identifiers and comments of source code classes. We propose a new coupling metric for object-oriented software systems, namely Relational Topic based Coupling (RTC) of classes, which uses Relational Topic Models (RTM), a generative probabilistic model, to capture latent topics in source code classes and the relationships among them. A case study on thirteen open source software systems is performed to compare the new measure with existing structural and conceptual coupling metrics. The case study demonstrates that the proposed metric not only captures new dimensions of coupling that are not covered by the existing coupling metrics, but can also be used to effectively support impact analysis.
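Relational Topic Models are not part of the common Python ML libraries, so the sketch below only approximates the underlying idea with plain LDA: infer a topic distribution per class from its identifiers and comments and use distribution similarity as a much simplified stand-in for topic-based coupling. The class corpus and topic count are illustrative assumptions.

```python
# Simplified topic-based coupling proxy: LDA topic distributions per class,
# compared pairwise (not RTM itself, which also models links between documents).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

classes = {
    "OrderService": "order checkout payment invoice total price",
    "PaymentGateway": "payment charge card gateway refund",
    "LogFormatter": "log format timestamp level appender",
}
names = list(classes)

counts = CountVectorizer().fit_transform([classes[n] for n in names])
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

coupling = cosine_similarity(topics)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} <-> {names[j]}: {coupling[i, j]:.2f}")
```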
Conference Paper
Software defect information, including links between bugs and committed changes, plays an important role in software maintenance such as measuring quality and predicting defects. Usually, the links are automatically mined from change logs and bug reports using heuristics such as searching for specific keywords and bug IDs in change logs. However, the accuracy of these heuristics depends on the quality of change logs. Bird et al. found that there are many missing links due to the absence of bug references in change logs. They also found that the missing links lead to biased defect information, and it affects defect prediction performance. We manually inspected the explicit links, which have explicit bug IDs in change logs and observed that the links exhibit certain features. Based on our observation, we developed an automatic link recovery algorithm, ReLink, which automatically learns criteria of features from explicit links to recover missing links. We applied ReLink to three open source projects. ReLink reliably identified links with 89% precision and 78% recall on average, while the traditional heuristics alone achieve 91% precision and 64% recall. We also evaluated the impact of recovered links on software maintainability measurement and defect prediction, and found the results of ReLink yields significantly better accuracy than those of traditional heuristics.
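The kinds of features such a link-recovery approach relies on can be illustrated with a small scoring function over a bug/commit pair: textual similarity, time proximity, and an assignee/committer match. ReLink learns its criteria and thresholds from the explicit links; the fixed weights, threshold, and toy records below are assumptions for illustration only.

```python
# Sketch of link-recovery features between a bug report and a commit.
from datetime import datetime, timedelta
from difflib import SequenceMatcher

bug = {
    "id": 42,
    "summary": "null pointer when saving empty form",
    "resolved": datetime(2020, 5, 1, 12, 0),
    "assignee": "alice",
}
commit = {
    "message": "fix null pointer on empty form save",
    "time": datetime(2020, 5, 1, 13, 30),
    "author": "alice",
}

def link_score(bug, commit):
    text_sim = SequenceMatcher(None, bug["summary"], commit["message"]).ratio()
    close_in_time = abs(commit["time"] - bug["resolved"]) < timedelta(days=2)
    same_person = bug["assignee"] == commit["author"]
    return text_sim + 0.5 * close_in_time + 0.5 * same_person

print("candidate link score:", round(link_score(bug, commit), 2))  # e.g., accept if >= 1.0
```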
Conference Paper
The key premise of an organization is to allow more efficient production, including production of high-quality software. To achieve that, an organization defines roles and reporting relationships. Therefore, changes in an organization's structure are likely to affect product quality. We propose and investigate a relationship between developer-centric measures of organizational change and the probability of customer-reported defects in the context of a large software project. We find that proximity to an organizational change is significantly associated with reductions in software quality. We also replicate results of several prior studies of software quality, supporting findings that code, change, and developer characteristics affect fault-proneness. In contrast to prior studies, we find that distributed development decreases quality. Furthermore, recent departures from the organization were associated with an increased probability of customer-reported defects, demonstrating that in the observed context organizational change reduces product quality.
Conference Paper
Software fails and fixing it is expensive. Research in failure prediction has been highly successful at modeling software failures. Few models, however, consider the key cause of failures in software: people. Understanding the structure of developer collaboration could explain a lot about the reliability of the final product. We examine this collaboration structure with the developer network derived from code churn information that can predict failures at the file level. We conducted a case study involving a mature Nortel networking product of over three million lines of code. Failure prediction models were developed using test and post-release failure data from two releases, then validated against a subsequent release. One model's prioritization revealed 58% of the failures in 20% of the files compared with the optimal prioritization that would have found 61% in 20% of the files, indicating that a significant correlation exists between file-based developer network metrics and failures.
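A file-level developer network like the one used here can be sketched by connecting developers who changed the same file and then using a file's local network measures as predictor features. The change records, metric choice, and threshold-free printout below are illustrative assumptions, not the study's actual model.

```python
# Developer network from churn: edges between developers who touched the same file;
# per-file network metrics could then feed a failure-prediction model.
from collections import defaultdict
from itertools import combinations
import networkx as nx

changes = [                          # (developer, file) pairs mined from churn (toy data)
    ("alice", "net/Socket.c"), ("bob", "net/Socket.c"),
    ("bob", "net/Buffer.c"), ("carol", "net/Buffer.c"),
    ("dave", "ui/Menu.c"),
]

devs_per_file = defaultdict(set)
for dev, path in changes:
    devs_per_file[path].add(dev)

G = nx.Graph()
for devs in devs_per_file.values():
    G.add_nodes_from(devs)
    G.add_edges_from(combinations(sorted(devs), 2))

centrality = nx.betweenness_centrality(G)
for path, devs in devs_per_file.items():
    print(path, "max contributor centrality:", max(centrality[d] for d in devs))
```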
Conference Paper
A critical factor in work group coordination, communication has been studied extensively. Yet, we are missing objective evidence of the relationship between successful coordination outcomes and communication structures. Using data from IBM's Jazz™ project, we study the communication structures of development teams with high coordination needs. We conceptualize coordination outcome as the result of a team's code integration build processes (successful or failed) and study team communication structures with social network measures. Our results indicate that developer communication plays an important role in the quality of software integrations. Although we found that no individual measure could indicate whether a build will fail or succeed, we leveraged the combination of communication structure measures into a predictive model that indicates whether an integration will fail. When used for five project teams, our predictive model yielded recall values between 55% and 75%, and precision values between 50% and 76%.