
A large-scale study of call graph-based impact prediction using mutation testing

Vincenzo Musco¹ · Martin Monperrus¹ · Philippe Preux¹

Published online: 26 July 2016
© Springer Science+Business Media New York 2016
Abstract In software engineering, impact analysis involves predicting the software elements (e.g., modules, classes, methods) potentially impacted by a change in the source code. Impact analysis is required to optimize the testing effort. In this paper, we propose an evaluation technique to predict impact propagation. Based on 10 open-source Java projects and 5 classical mutation operators, we create 17,000 mutants and study how the error they introduce propagates. This evaluation technique enables us to analyze impact prediction based on four types of call graph. Our results show that graph sophistication increases the completeness of impact prediction. However, and surprisingly to us, the most basic call graph gives the best trade-off between precision and recall for impact prediction.
Keywords Change impact analysis · Call graphs · Mutation testing
1 Introduction
Software continuously evolves through changes affecting a module, a class, or a function. A single change can impact the entire software package and may break many parts beyond the changed element; e.g., a change in a file-reading function can impact any class that uses it to read files. When modifying the source code, developers think about whether the change will introduce an error and whether the error will propagate and impact other modules. For …
✉ Vincenzo Musco
vincenzo.musco@inria.fr

Martin Monperrus
martin.monperrus@univ-lille1.fr

Philippe Preux
philippe.preux@univ-lille3.fr

¹ CRIStAL, INRIA, University of Lille, Villeneuve-d'Ascq, France
Software Qual J (2017) 25:921–950 · DOI 10.1007/s11219-016-9332-8
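As a rough illustration of the core idea, the following is a minimal sketch of call-graph-based impact prediction, not the authors' implementation: starting from a changed method, every transitive caller in the call graph is predicted as impacted. All class and method names are hypothetical.

```java
import java.util.*;

// Minimal sketch: predict impact as the set of transitive callers of a
// changed method in a (reverse) call graph. Illustrative only.
public class ImpactPredictor {

    // callers.get(m) = methods that call m (reverse call-graph edges)
    private final Map<String, List<String>> callers;

    public ImpactPredictor(Map<String, List<String>> callers) {
        this.callers = callers;
    }

    // Breadth-first traversal over reverse edges from the changed method.
    public Set<String> predictImpacted(String changedMethod) {
        Set<String> impacted = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changedMethod);
        while (!queue.isEmpty()) {
            String m = queue.poll();
            for (String caller : callers.getOrDefault(m, List.of())) {
                if (impacted.add(caller)) {
                    queue.add(caller);
                }
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        // readFile is called by Config.load and Report.render;
        // Config.load is called by App.main. (Hypothetical methods.)
        Map<String, List<String>> callers = Map.of(
            "FileUtil.readFile", List.of("Config.load", "Report.render"),
            "Config.load", List.of("App.main"));
        System.out.println(new ImpactPredictor(callers)
            .predictImpacted("FileUtil.readFile"));
        // -> [Config.load, Report.render, App.main]
    }
}
```

Under this reading, the four call-graph variants compared in the paper would differ in how the edges of such a graph are computed, rather than in the traversal itself.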
... To measure and evaluate the success of a code change prediction approach, ground-truth data is required. For instance, one study used mutation testing and failed test cases for ground truth and evaluation [10]. The drawback of using test cases is that test suites for real systems rarely achieve 100% coverage. ...
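To make the evaluation step concrete, here is a minimal, hypothetical sketch of scoring a predicted impact set against a mutation-based ground truth (the tests that fail on a mutant); all names are illustrative:

```java
import java.util.*;

// Sketch: score an impact prediction against a mutation-based ground truth.
// The "actual" set would come from the tests failing on a mutant; here it
// is hard-coded for illustration.
public class ImpactScore {
    static double precision(Set<String> predicted, Set<String> actual) {
        if (predicted.isEmpty()) return 0.0;
        Set<String> hit = new HashSet<>(predicted);
        hit.retainAll(actual);
        return (double) hit.size() / predicted.size();
    }

    static double recall(Set<String> predicted, Set<String> actual) {
        if (actual.isEmpty()) return 0.0;
        Set<String> hit = new HashSet<>(actual);
        hit.retainAll(predicted);
        return (double) hit.size() / actual.size();
    }

    public static void main(String[] args) {
        Set<String> predicted = Set.of("A.test1", "B.test2", "C.test3");
        Set<String> actual = Set.of("A.test1", "B.test2"); // failing tests
        System.out.printf("precision=%.2f recall=%.2f%n",
                precision(predicted, actual), recall(predicted, actual));
        // precision=0.67 recall=1.00
    }
}
```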
... Tarantula [20], SBI [30], Ochiai [6], and Jaccard [7] share the same basic insight: code elements executed mainly by failed tests are more suspicious. Mutation-based fault localization (MBFL), e.g., [17], [33], [35], [55], [56], aims to additionally consider mutated code in fault localization. Examples of MBFL include Metallaxis [37], [38] and MUSE [33]. ...
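For reference, the Tarantula and Ochiai suspiciousness formulas mentioned in this snippet are short enough to state directly; the following sketch uses the standard definitions with illustrative counts:

```java
// Standard SBFL suspiciousness formulas (Tarantula, Ochiai) for one
// statement s: ef/ep = failed/passed tests executing s; tf/tp = total
// failed/passed tests. Division-by-zero guards omitted for brevity.
public class Suspiciousness {
    static double tarantula(int ef, int ep, int tf, int tp) {
        double fRatio = (double) ef / tf;
        double pRatio = (double) ep / tp;
        return fRatio / (fRatio + pRatio);
    }

    static double ochiai(int ef, int ep, int tf) {
        return ef / Math.sqrt((double) tf * (ef + ep));
    }

    public static void main(String[] args) {
        // A statement executed by 3 of 4 failing and 1 of 6 passing tests.
        System.out.println(tarantula(3, 1, 4, 6)); // 0.818...
        System.out.println(ochiai(3, 1, 4));       // 3/sqrt(16) = 0.75
    }
}
```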
Preprint
Full-text available
In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependencies RL for program statements. Those two types of RL on the dynamic information in a code coverage matrix are combined with code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation, in which investigators analyze the crime scene (failed test cases and statements) and related persons (statements with dependencies), and at the same time examine the usual suspects who have committed a similar crime in the past (similar buggy code in the training data). For the code coverage information, DeepRL4FL first orders the test cases and marks error-exhibiting code statements, expecting that a model can recognize the patterns discriminating between faulty and non-faulty statements/methods. For dependencies among statements, the suspiciousness of a statement is assessed by taking into account its data dependencies to other statements in execution and data flows, in addition to the statement itself. Finally, the vector representations for the code coverage matrix, the data dependencies among statements, and the source code are combined and used as the input of a classifier built from a Convolutional Neural Network to detect buggy statements/methods. Our empirical evaluation shows that DeepRL4FL improves the top-1 results over the state-of-the-art statement-level FL baselines from 173.1% to 491.7%. It also improves the top-1 results over the existing method-level FL baselines from 15.0% to 206.3%.
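As a rough illustration of the dynamic input such approaches start from (a sketch, not the DeepRL4FL implementation), a binary coverage matrix can be built with failing tests ordered first:

```java
import java.util.*;

// Sketch: a binary coverage matrix with failing tests ordered first,
// the kind of dynamic input a learning-based fault localizer starts from.
// Test names and covered statement ids are illustrative.
public class CoverageMatrix {
    record TestRun(String name, boolean failed, Set<Integer> coveredStmts) {}

    public static void main(String[] args) {
        List<TestRun> runs = new ArrayList<>(List.of(
            new TestRun("t1", false, Set.of(1, 2)),
            new TestRun("t2", true, Set.of(2, 3)),
            new TestRun("t3", false, Set.of(1, 3))));
        // Failing tests first, so a model sees error-exhibiting rows on top.
        runs.sort(Comparator.comparing(r -> !r.failed()));
        int stmtCount = 3;
        for (TestRun r : runs) {
            StringBuilder row = new StringBuilder(r.failed() ? "F " : "P ");
            for (int s = 1; s <= stmtCount; s++)
                row.append(r.coveredStmts().contains(s) ? "1 " : "0 ");
            System.out.println(row + r.name());
        }
        // F 0 1 1 t2
        // P 1 1 0 t1
        // P 1 0 1 t3
    }
}
```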
... Musco et al. [35] used four types of call-graphs to predict the software elements that are likely to be impacted by a change in the software. However, they used mutation testing to assess the impact of a change in the source code. ...
Article
Full-text available
Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Being aware of the most error-prone parts of the program, one can efficiently allocate the limited amount of testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function-level JavaScript bug prediction model based on static source code metrics, with the addition of hybrid (static and dynamic) code analysis based metrics of the number of incoming and outgoing function calls (HNII and HNOI). Our motivation for this is that JavaScript is a highly dynamic scripting language for which static code analysis might be very imprecise; therefore, using purely static source code features for bug prediction might not be enough. Based on a study where we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2–10% increase in model performances (i.e., precision, recall, F-measure). Interestingly, replacing static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performances; however, using them all together yields the best results.
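A minimal sketch of the invocation metrics underlying NII/NOI, computed as in- and out-degrees of a call graph (illustrative; the study's hybrid HNII/HNOI additionally merge dynamically observed call edges):

```java
import java.util.*;

// Sketch: NII/NOI-style metrics as in/out-degrees in a call graph.
// Edges would come from static analysis, dynamic traces, or both (hybrid).
public class InvocationMetrics {
    public static void main(String[] args) {
        // caller -> callees (hypothetical functions)
        Map<String, List<String>> calls = Map.of(
            "f", List.of("g", "h"),
            "g", List.of("h"),
            "h", List.of());
        // Count incoming edges by inverting the caller->callee map.
        Map<String, Integer> incoming = new HashMap<>();
        calls.forEach((caller, callees) ->
            callees.forEach(callee -> incoming.merge(callee, 1, Integer::sum)));
        for (String fn : calls.keySet()) {
            System.out.printf("%s: NOI=%d NII=%d%n",
                fn, calls.get(fn).size(), incoming.getOrDefault(fn, 0));
        }
        // f: NOI=2 NII=0, g: NOI=1 NII=1, h: NOI=0 NII=2 (order may vary)
    }
}
```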
... The structural information can be extracted either through Abstract Syntax Trees (AST) or Static Call Graphs (SCG). We decide to use the SCG instead of the AST because the call graph encodes how methods and functions call each other (Musco et al., 2017) and builds a graph (nodes and edges) of the code's flow that shows the relationships between classes and methods in the source code. The AST, on the other hand, represents the hierarchical syntactic structure of the source code sequentially. ...
Preprint
Software design patterns are standard solutions to common problems in software design and architecture. Knowing that a particular module implements a design pattern is a shortcut to design comprehension. Manually detecting design patterns is a time-consuming and challenging task; therefore, researchers have proposed automatic design pattern detection techniques. However, these techniques show low performance for certain design patterns. In this work, we introduce a design pattern detection approach, DPD_F, that improves the performance over the state-of-the-art by using code features with machine learning classifiers to automatically train a design pattern detector. DPD_F creates a semantic representation of Java source code using the code features and the call graph, and applies the Word2Vec algorithm on the semantic representation to construct the word-space geometric model of the Java source code. DPD_F then builds a machine learning classifier trained on a labelled dataset and identifies software design patterns with over 80% Precision and over 79% Recall. Additionally, we have compared DPD_F with two existing design pattern detection techniques, namely FeatureMaps and MARPLE-DPD. Empirical results demonstrate that our approach outperforms these state-of-the-art approaches by approximately 35% and 15%, respectively, in terms of Precision. The run-time performance also supports the practical applicability of our classifier.
... MBFL [23–27] aims to additionally consider impact information for fault localization. Since code elements covered by failed/passed tests may not have any impact on the corresponding test outcomes, typical MBFL techniques use mutation testing [28, 45, 46] to simulate the impact of each code element for more precise fault localization. Here we mainly discuss the general MBFL techniques since the regression MBFL techniques (e.g., FIFL [27]) are not closely related to this work. ...
Conference Paper
Full-text available
Learning-based fault localization has been intensively studied recently. Prior studies have shown that traditional Learning-to-Rank techniques can help precisely diagnose fault locations using various dimensions of fault-diagnosis features, such as suspiciousness values computed by various off-the-shelf fault localization techniques. However, with the increasing dimensions of features considered by advanced fault localization techniques, it can be quite challenging for the traditional Learning-to-Rank algorithms to automatically identify effective existing/latent features. In this work, we propose DeepFL, a deep learning approach to automatically learn the most effective existing/latent features for precise fault localization. Although the approach is general, in this work, we collect various suspiciousness-value-based, fault-proneness-based and textual-similarity-based features from the fault localization, defect prediction and information retrieval areas, respectively. DeepFL has been studied on 395 real bugs from the widely used Defects4J benchmark. The experimental results show DeepFL can significantly outperform state-of-the-art TraPT/FLUCCS (e.g., localizing 50+ more faults within Top-1). We also investigate the impacts of deep model configurations (e.g., loss functions and epoch settings) and features. Furthermore, DeepFL is also surprisingly effective for cross-project prediction.
... A number of software development and maintenance tasks, such as program comprehension, change impact analysis, refactoring, and debugging, rely on the dependencies between program elements (variables, functions, classes, methods, etc.) [5], [6], [7], [8]. Therefore, the already complex maintenance and evolution tasks of legacy applications may be hindered when hidden dependencies are not discovered. ...
Conference Paper
Full-text available
J2EE applications tend to be multi-tier and multi-language applications. They rely on the J2EE platform and containers that offer infrastructure and architectural services to ensure distributed, secure, safe, and scalable executions. These mechanisms hide many program dependencies, which helps development but hinders maintenance, evolution, and re-engineering of J2EE applications. In this paper, we study (i) the J2EE specifications to extract a declarative specification of the dependencies that are inherent in the services offered and that are not visible in the user code that uses them. Then, we introduce (ii) a codification of the dependencies into rules, and (iii) a tool that supports the specification of those dependencies and their detection in J2EE applications. We validate our approach and tool on a sample of 10 J2EE applications. We also compare our tool against JRipples, a state-of-the-art tool for change-impact analysis tasks. Results show that our tool adds, on average, 15% more call dependencies, which would have been missed otherwise. On change impact analysis tasks, our tool outperforms JRipples in all 10 applications, especially for the early iterations of change propagation exploration.
Article
As the complexity of modern software systems increases, changes have become crucial to the software lifecycle. It is therefore important for software developers to analyze the changes that occur in a system and to prevent them from introducing errors. In this paper, mutation testing is examined as a form of software test analysis. Mutation tests have been implemented on open-source Java projects. In addition, for aviation projects, emphasis is placed on performing change impact analysis processes in compliance with DO-178C-based certification.
Conference Paper
Generating graphs that are similar to real ones is an open problem, and the notion of similarity is elusive and hard to formalize. In this paper, we focus on sparse digraphs and propose SDG, an algorithm that aims at generating graphs similar to real ones. Since real graphs evolve, and this evolution is important to study in order to understand the underlying dynamical system, we tackle the problem of generating series of graphs. We propose SEDGE, an algorithm meant to generate series of graphs similar to a real series; SEDGE is an extension of SDG. We consider graphs that are representations of software programs and show experimentally that our approach outperforms other existing approaches. Experiments show the performance of both algorithms.
Conference Paper
Full-text available
Observation-based slicing is a recently introduced, language-independent slicing technique based on the dependencies observable from program behaviour. Due to the well-known limits of dynamic analysis, we may only compute an under-approximation of the true observation-based slice. However, because the observation-based slice captures all dependence that can be observed, even such approximations can yield insight into the limitations of static slicing. For example, a static slice S that is strictly smaller than the corresponding observation-based slice is potentially unsafe. We present the results of three sets of experiments on 12 different programs, including benchmarks and larger programs, which investigate the relationship between static and observation-based slicing. We show that, in extreme cases, observation-based slices can find the true minimal static slice, where static techniques cannot. For more typical cases, our results illustrate the potential for observation-based slicing to highlight limitations in static slicers. Finally, we report on the sensitivity of observation-based slicing to test quality.
Conference Paper
Full-text available
Current slicing techniques cannot handle systems written in multiple programming languages. Observation-Based Slicing (ORBS) is a language-independent slicing technique capable of slicing multi-language systems, including systems which contain (third party) binary components. A potential slice obtained through repeated statement deletion is validated by observing the behaviour of the program: if the slice and original program behave the same under the slicing criterion, the deletion is accepted. The resulting slice is similar to a dynamic slice. We evaluate five variants of ORBS on ten programs of different sizes and languages showing that it is less expensive than similar existing techniques. We also evaluate it on bash and four other systems to demonstrate feasible large-scale operation in which a parallelised ORBS needs up to 82% less time when using four threads. The results show that an ORBS slicer is simple to construct, effective at slicing, and able to handle systems written in multiple languages without specialist analysis tools.
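The deletion loop at the heart of ORBS is easy to sketch (a schematic, not the published tool): repeatedly delete a line, rebuild and rerun, and keep the deletion only when the behaviour observed at the slicing criterion is unchanged. Here `observe` stands in for the compile-and-execute step:

```java
import java.util.*;
import java.util.function.Function;

// Schematic ORBS-style deletion loop. 'observe' stands in for
// compile-and-run: it maps a candidate program (list of lines) to the
// value observed at the slicing criterion (null if it fails to build/run).
public class OrbsSketch {
    static List<String> slice(List<String> program,
                              Function<List<String>, String> observe) {
        List<String> current = new ArrayList<>(program);
        String expected = observe.apply(current);
        boolean deleted = true;
        while (deleted) {                       // iterate to a fixed point
            deleted = false;
            for (int i = 0; i < current.size(); i++) {
                List<String> candidate = new ArrayList<>(current);
                candidate.remove(i);
                // Keep the deletion only if behaviour is preserved.
                if (Objects.equals(observe.apply(candidate), expected)) {
                    current = candidate;
                    deleted = true;
                    i--;                        // re-check the shifted line
                }
            }
        }
        return current;
    }
}
```

Iterating to a fixed point in this way yields a 1-minimal slice with respect to the criterion: no single remaining line can be deleted without changing the observed behaviour.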
Article
Full-text available
This paper presents SPOON, a library for the analysis and transformation of Java source code. SPOON enables Java developers to write a large range of domain-specific analyses and transformations in an easy and concise manner. SPOON analyses and transformations are written in plain Java. With SPOON, developers do not need to dive into parsing, hack a compiler infrastructure, or master a new formalism.
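For a flavour of the API, a short Spoon analysis might look like the following sketch (based on Spoon's documented Launcher API as we understand it; the input path is illustrative):

```java
import java.util.List;

import spoon.Launcher;
import spoon.reflect.CtModel;
import spoon.reflect.declaration.CtMethod;
import spoon.reflect.visitor.filter.TypeFilter;

// Sketch: build a Spoon model of a source tree and list its methods.
public class ListMethods {
    public static void main(String[] args) {
        Launcher launcher = new Launcher();
        launcher.addInputResource("src/main/java"); // illustrative path
        CtModel model = launcher.buildModel();
        // Query the model for all method declarations.
        List<CtMethod<?>> methods =
            model.getElements(new TypeFilter<>(CtMethod.class));
        for (CtMethod<?> m : methods) {
            System.out.println(m.getSignature());
        }
    }
}
```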
Conference Paper
Full-text available
Sensitivity analysis determines how a system responds to stimuli variations, which can benefit important software-engineering tasks such as change-impact analysis. We present SENSA, a novel dynamic-analysis technique and tool that combines sensitivity analysis and execution differencing to estimate the dependencies among statements that occur in practice. In addition to identifying dependencies, SENSA quantifies them to estimate how much or how likely a statement depends on another. Quantifying dependencies helps developers prioritize and focus their inspection of code relationships. To assess the benefits of quantifying dependencies with SENSA, we applied it to various statements across Java subjects to find and prioritize the potential impacts of changing those statements. We found that SENSA predicts the actual impacts of changes to those statements more accurately than static and dynamic forward slicing. Our SENSA prototype tool is freely available for download.
Conference Paper
Full-text available
Dependence analysis is a fundamental technique for program understanding and is widely used in software testing and debugging. However, there are a limited number of analysis tools available despite a wide range of research work in this field. In this paper, we present JavaPDG, a static analyzer for Java bytecode, which is capable of producing various graphical representations such as the system dependence graph, procedure dependence graph, control flow graph and call graph. As a program-dependence-graph based analyzer, JavaPDG performs both intra- and inter-procedural dependence analysis, and enables researchers to apply a wide range of program analysis techniques that rely on dependence analysis. JavaPDG provides a graphical viewer to browse and analyze the various graphs and a convenient JSON-based serialization format.
Conference Paper
Optimizing compilers for object-oriented languages apply static class analysis and other techniques to try to deduce precise information about the possible classes of the receivers of messages; if successful, dynamically-dispatched messages can be replaced with direct procedure calls and potentially further optimized through inline-expansion. By examining the complete inheritance graph of a program, which we call class hierarchy analysis, the compiler can improve the quality of static class information and thereby improve run-time performance. In this paper we present class hierarchy analysis and describe techniques for implementing this analysis effectively in both statically- and dynamically-typed languages and also in the presence of multi-methods. We also discuss how class hierarchy analysis can be supported in an interactive programming environment and, to some extent, in the presence of separate compilation. Finally, we assess the bottom-line performance improvement due to class hierarchy analysis alone and in combination with two other “competing” optimizations, profile-guided receiver class prediction and method specialization.
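A minimal sketch of the core resolution step of class hierarchy analysis (illustrative data structures, not the paper's implementation): the possible targets of a dynamically-dispatched call are the implementations of the method found in the receiver's static type and all of its subclasses:

```java
import java.util.*;

// Sketch of class hierarchy analysis: resolve a virtual call to every
// implementation in the static receiver type's subtree.
public class ChaSketch {
    // class -> direct subclasses, and class -> methods it declares
    static Map<String, List<String>> subclasses = Map.of(
        "Shape", List.of("Circle", "Square"),
        "Circle", List.of(), "Square", List.of());
    static Map<String, Set<String>> declares = Map.of(
        "Shape", Set.of("area"), "Circle", Set.of("area"), "Square", Set.of());

    static Set<String> possibleTargets(String receiverType, String method) {
        Set<String> targets = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(receiverType));
        while (!work.isEmpty()) {
            String c = work.poll();
            if (declares.getOrDefault(c, Set.of()).contains(method))
                targets.add(c + "." + method);
            work.addAll(subclasses.getOrDefault(c, List.of()));
        }
        return targets;
    }

    public static void main(String[] args) {
        // A call shape.area() on a Shape receiver may dispatch to:
        System.out.println(possibleTargets("Shape", "area"));
        // [Shape.area, Circle.area]  (Square inherits Shape.area)
    }
}
```

If the traversal finds exactly one target, the dynamically-dispatched message can be replaced with a direct call, which is the optimization the abstract describes.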
Article
Building is an integral part of the software development process. However, little is known about the compiler errors that occur in this process. In this paper, we present an empirical study of 26.6 million builds produced during a period of nine months by thousands of developers. We describe the workflow through which those builds are generated, and we analyze failure frequency, compiler error types, and resolution efforts to fix those compiler errors. The results provide insights into how the build process of a large organization works and pinpoint errors for which further developer support would be most effective.
Conference Paper
Recent studies have shown that the use of causal inference techniques for reducing confounding bias improves the effectiveness of statistical fault localization (SFL) at the level of program statements. However, with very large programs and test suites, the overhead of statement-level causal SFL may be excessive. Moreover, cost evaluations of statement-level SFL techniques are generally based on a questionable assumption: that software developers can consistently recognize faults when examining statements in isolation. To address these issues, we propose and evaluate a novel method-level SFL technique called MFL, which is based on causal inference methodology. In addition to reframing SFL at the method level, our technique incorporates a new algorithm for selecting covariates to use in adjusting for confounding bias. This algorithm attempts to ensure that such covariates satisfy the conditional exchangeability and positivity properties required for identifying causal effects with observational data. We present empirical results indicating that our approach is more effective than four method-level versions of well-known SFL techniques and that our confounder selection algorithm is superior to two alternatives.
Conference Paper
This paper deals with an approach based on the similarity of mutants, which is used to reduce the number of mutants to be executed. To calculate this similarity, the structure of the mutants is used: each mutant is converted into a hierarchical graph, which represents the program's flow, variables, and conditions. On the basis of this graph form, a special graph kernel is defined to calculate the similarity among programs. It is then used to predict whether a given test would detect a mutant or not, with the prediction carried out by a classification algorithm. This approach should help to lower the number of mutants that have to be executed. An experimental validation of the approach is also presented in this paper: an example of a program used in the experiments is described, and the results obtained, especially classification errors, are presented.
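One simple instance of such a graph kernel, given here purely for illustration (the paper defines its own kernel on hierarchical graphs), compares the node-label histograms of two program graphs:

```java
import java.util.*;

// Sketch: a node-label histogram kernel. Two program graphs are compared
// by the dot product of their label-count vectors; a crude but valid
// graph kernel that ignores all structure beyond node labels.
public class LabelHistogramKernel {
    static Map<String, Integer> histogram(List<String> nodeLabels) {
        Map<String, Integer> h = new HashMap<>();
        for (String l : nodeLabels) h.merge(l, 1, Integer::sum);
        return h;
    }

    static int kernel(List<String> g1, List<String> g2) {
        Map<String, Integer> h1 = histogram(g1), h2 = histogram(g2);
        int k = 0;
        for (var e : h1.entrySet())
            k += e.getValue() * h2.getOrDefault(e.getKey(), 0);
        return k;
    }

    public static void main(String[] args) {
        // Node labels of two mutants' flow graphs (illustrative).
        List<String> m1 = List.of("assign", "if", "call", "call");
        List<String> m2 = List.of("assign", "if", "call");
        System.out.println(kernel(m1, m2)); // 1*1 + 1*1 + 2*1 = 4
    }
}
```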
Article
Software change impact analysis (CIA) is a technique for identifying the effects of a change, or estimating what needs to be modified to accomplish a change. Since the 1980s, there have been many investigations of CIA, especially code-based CIA techniques. However, there have been very few surveys on this topic. This article tries to fill this gap: 30 papers that provide empirical evaluations of 23 code-based CIA techniques are identified, and their data is synthesized against four research questions. The study presents a comparative framework including seven properties which characterize the CIA techniques, and identifies key applications of CIA techniques in software maintenance. In addition, the need for further research is presented in the following areas: evaluating existing CIA techniques and proposing new ones under the proposed framework, developing more mature tools to support CIA, comparing current CIA techniques empirically with unified metrics and common benchmarks, and applying CIA more extensively and effectively in the software maintenance phase.