
KRAB Algorithm - A Revised Algorithm for Incremental Call Graph Generation

Authors:
• Incture Technologies Pvt Ltd

Rajsekhara Babu * (mrajasekharababu@vit.ac.in)
Krishnakumar V. (venkatasubramanian2011@vit.ac.in)
George Abraham (george.abraham2011@vit.ac.in)
Kiransinh Borasia (borasiakiransinh.ranjitsinh2011@vit.ac.in)
School of Computing Science and Engineering, VIT University, Vellore, India
Abstract: This paper presents the importance and implementation of an incremental call graph plugin. We propose
an algorithm for call graph construction that has better overall performance than the previously proposed
algorithm. In addition, the algorithm has been shown empirically to perform well on recursive code. The
algorithm also detects function skips and raises an exception when one occurs.
Keywords: Call Graph, Incremental Approach, Recursive Codes
I. INTRODUCTION
Call graph analysis is essential for understanding the execution of a program and for debugging it. Existing static
call graph construction recomputes the call graph of the entire code whenever a small modification is made. An
incremental approach instead computes the call graph of only the modified part and adds it as an increment to the
existing call graph. In this graph, each node represents a procedure and each edge from node A to node B indicates
that procedure A calls procedure B. The main benefit of call graphs is that they provide a basic program analysis
for human understanding. With static and dynamic call graphs, programmers can understand the execution of their
programs and debug them when required. One simple application of call graphs is finding procedures that are never
called. Call graphs can be dynamic or static. A dynamic call graph records the execution of a program and hence
describes exactly one run of the program, whereas a static call graph represents every possible run of the program.
By tracking a call graph, it may be possible to detect anomalies in program execution or code injection attacks.
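The graph described above can be illustrated with a short Python sketch. It is only an illustration, not the plugin's implementation: the procedure names and the helpers `never_called` and `update_procedure` are hypothetical. It stores the call graph as a dictionary of callee sets, finds procedures that are never called, and performs an incremental update that touches only the modified procedure.

```python
from typing import Dict, Iterable, Set

# Hypothetical call graph: each key is a procedure, each value is the set of
# procedures it calls (an edge A -> B means "A calls B").
CallGraph = Dict[str, Set[str]]


def never_called(graph: CallGraph, entry: str) -> Set[str]:
    """Return procedures that no procedure calls, excluding the entry point."""
    called: Set[str] = set()
    for callees in graph.values():
        called |= callees
    return {proc for proc in graph if proc not in called and proc != entry}


def update_procedure(graph: CallGraph, proc: str, callees: Iterable[str]) -> None:
    """Incremental step: replace only the outgoing edges of the modified procedure."""
    graph[proc] = set(callees)
    for callee in graph[proc]:
        graph.setdefault(callee, set())  # make sure every callee has a node


example: CallGraph = {
    "main": {"parse", "report"},
    "parse": {"read_token"},
    "read_token": set(),
    "report": set(),
    "legacy_dump": {"report"},  # defined, but nothing calls it
}
```

Calling `never_called(example, "main")` reports `legacy_dump`. After `update_procedure(example, "main", ["parse", "report", "legacy_dump"])`, only the edges leaving `main` are rebuilt and the rest of the graph is untouched.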
II. RELATED WORK
Many techniques are used to generate call graphs. They include Reachability Analysis [2][7], Inter-Procedural Class
Analysis [1], Class Hierarchy Analysis [3][7] and Fast Static Analysis [4].
Reachability is the ability to get from one vertex of a graph to another. A function A is said to be reachable from
function B if there is a call to A in the definition of B. Reachability Analysis is performed by applying the
algorithm recursively to every function that is called, directly or indirectly, from the entry point. It can be
improved by pre-computing the function names and storing them in a data structure (such as a hash table) through
which the dependencies can be linked easily.
In object-oriented programming languages (OOPL), the target procedure of a function call cannot always be resolved
just by examining the source code, because of polymorphism. The invoked function is strongly coupled with the object
of the class which invokes it. Therefore, for OOPL, inter-procedural data and flow analysis is important for
understanding the control flow between functions.
Figure 1 shows a class hierarchy in a real-world scenario. Here, Class A acts as the base class and all the other
classes are derived from it. This is called the inheritance graph. It gives the compiler an idea of how control moves
from class to class, thereby improving the run-time performance of the call graph algorithm.
Fast Static Analysis handles the virtual functions in a program, which are hard for program analysis tools to
resolve. It thereby removes a lot of overhead and speeds up the algorithm many times over. It is used in
object-oriented programming languages, where it is most effective.
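As a rough illustration of Reachability Analysis with a hash table, the sketch below keeps the call edges in a Python dictionary and collects every function reachable from an entry point. The function names are hypothetical, and an explicit worklist is used in place of recursion, which is a substitution made here for simplicity.

```python
from typing import Dict, List, Set


def reachable_from(calls: Dict[str, List[str]], entry: str) -> Set[str]:
    """Collect every function reachable from `entry` through call edges."""
    seen = {entry}
    worklist = [entry]
    while worklist:
        current = worklist.pop()
        # `calls` is a hash table (dict), so each lookup is O(1) on average.
        for callee in calls.get(current, []):
            if callee not in seen:  # also stops recursive calls from looping
                seen.add(callee)
                worklist.append(callee)
    return seen


calls = {
    "main": ["init", "run"],
    "init": [],
    "run": ["step", "run"],  # `run` is recursive
    "step": [],
    "debug_dump": ["step"],  # not reachable from main
}
unreachable = set(calls) - reachable_from(calls, "main")
```

Here `unreachable` contains only `debug_dump`, the one function that no path from `main` reaches.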
Figure 1. Inheritance Graph
III. APPROACH
Eclipse is mainly used as an IDE for Java programs. Since Java is an object-oriented programming language, Class
Hierarchy Analysis is the most suitable approach for an Eclipse IDE plugin. Using Class Hierarchy Analysis, we can
resolve the ambiguities that arise from both inheritance and polymorphism, so it serves a dual purpose.
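To make the idea concrete, here is a minimal sketch of how Class Hierarchy Analysis can resolve the possible targets of a polymorphic call. The hierarchy, the method names and the helper functions are hypothetical and are not taken from Figure 1. For a call on a receiver of static type T, every subclass of T is considered, and the method is looked up along each subclass's inheritance chain.

```python
from typing import Dict, Optional, Set

# Hypothetical hierarchy: class -> parent (None marks the base class).
PARENT: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "A", "D": "B"}
# Methods declared (or overridden) directly in each class.
DECLARES: Dict[str, Set[str]] = {
    "A": {"draw", "area"},
    "B": {"draw"},
    "C": set(),
    "D": {"area"},
}


def subclasses(cls: str) -> Set[str]:
    """Return `cls` together with all of its direct and indirect subclasses."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def lookup(cls: Optional[str], method: str) -> Optional[str]:
    """Walk up the inheritance chain to the implementation `cls` would run."""
    while cls is not None:
        if method in DECLARES[cls]:
            return f"{cls}.{method}"
        cls = PARENT[cls]
    return None


def cha_targets(static_type: str, method: str) -> Set[str]:
    """Possible targets of `receiver.method()` when receiver has `static_type`."""
    targets = set()
    for cls in subclasses(static_type):
        target = lookup(cls, method)
        if target is not None:
            targets.add(target)
    return targets
```

With this hierarchy, a call to `draw` on a receiver of static type `A` may dispatch to `A.draw` or `B.draw`, so the call graph gets one edge per possible target.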
Classical Approach
Figure 2 shows the classical algorithm for call graph generation using the Class Hierarchy method. This algorithm is
efficient in most cases, but has two serious drawbacks. Firstly, its performance drops drastically when the system
has recursive calls. The data structure used in this algorithm creates an instance for every recursive call made by
the function. This puts added stress on the graph data structure, thereby increasing the execution time. Secondly,
once a function completes its execution, the graph has to backtrack to its parent. In the generic algorithm, the data
structure does not take care of this and leaves it to the system. This results in more processor overhead and
consequently more execution time.
Since we are dealing with an incremental approach, the data structure components need pointer variables that allow
easy traversal when the user makes real-time changes to the structure of the program. Without these pointers,
backtracking becomes a time-consuming task.
Figure 2. Classical Algorithm
KRAB Algorithm (Our Approach)
Figure 3 gives the pseudo code for the algorithm that we propose. This algorithm addresses both the recursion issue
and the time taken by backtracking. For this purpose, we build the graph data structure with the help of a stack. The
stack keeps a note of every function call encountered and automatically places a pointer from the current node to
the parent node as a predecessor the moment the callee method returns control to the caller method.
In this algorithm, we push every method call (in the form of a node) invoked by a specific object onto a temporary
stack. Control then traverses the called function and checks for any method calls in it. If there are calls, it first
checks whether each is a recursive or a non-recursive call. If it is a recursive call, the node is pointed to itself
rather than creating a new node instance. However, in order to verify the node traversal, every recursive call is
also pushed onto the stack. When the function returns control, two activities are performed. First, a logical pointer
link is generated from the current method to the parent method. This creates a predecessor pointer from the current
method to the parent method. Second, control is transferred from the current method to the parent method by popping
the stack, since in this algorithm the node at the top of the stack always has control.
Figure 3. Proposed Algorithm (KRAB Algorithm)
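The stack-based construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pseudo code of Figure 3: the input is assumed to be a flat list of ("call", name) and ("return", name) events, and the names `Node`, `SkipError` and `build_call_graph` are hypothetical. Predecessors are kept as a set because a method may have several callers, whereas the text describes a single predecessor pointer.

```python
from typing import Dict, Iterable, List, Set, Tuple


class SkipError(Exception):
    """Raised when a call is skipped, i.e. calls and returns do not match."""


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.callees: Set[str] = set()       # outgoing call edges
        self.predecessors: Set[str] = set()  # filled in when control returns


def build_call_graph(events: Iterable[Tuple[str, str]]) -> Dict[str, Node]:
    nodes: Dict[str, Node] = {}
    stack: List[Node] = []  # the top of the stack is the method in control
    for kind, name in events:
        if kind == "call":
            if name not in nodes:
                nodes[name] = Node(name)
            node = nodes[name]  # a recursive call reuses the existing node
            if stack:
                # For a recursive call the caller is `node` itself, so this
                # records a self-edge instead of creating a new instance.
                stack[-1].callees.add(name)
            stack.append(node)  # recursive calls are pushed as well
        elif kind == "return":
            if not stack or stack[-1].name != name:
                raise SkipError(f"unexpected return from {name}")
            current = stack.pop()  # control goes back to the parent
            if stack and stack[-1] is not current:
                current.predecessors.add(stack[-1].name)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    if stack:
        # Assumed skip check: the stack must be empty after the last pop.
        raise SkipError(f"{len(stack)} call(s) never returned")
    return nodes


trace = [
    ("call", "main"),
    ("call", "fact"), ("call", "fact"), ("call", "fact"),
    ("return", "fact"), ("return", "fact"), ("return", "fact"),
    ("return", "main"),
]
graph = build_call_graph(trace)
```

For this trace the graph has only two nodes: `fact` carries a self-edge for its recursion and has `main` as its predecessor, no matter how deep the recursion goes.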
IV. ANALYSIS
In the classical approach, building a standard class hierarchy takes Θ(n^2) time on average, since this is the time
required to construct the graph data structure. Finding the target methods (step 7) takes Θ(n) time. Finding the
methods that need to be reprocessed also takes Θ(n) on average. The search cost is taken as Θ(n) because a linear
search is needed, as the given set of values may not be sorted. This gives the quadratic function f(n):
f(n) = n^2 + n + n
     = n^2 + 2n                                   [1]
Hence, the classical algorithm carries out at least n^2 + 2n iterations, where n is the number of function calls
present in a particular code.
For the KRAB Algorithm, we can neglect the time required for the stack push and pop operations and concentrate on the
critical operations, namely those in the while loop.
With n function calls, the while loop runs at least n times. In the best case, there are no nested function calls,
which gives a time complexity of Ω(n), since the while loop makes a single pass over the n calls and ends. In the
worst case, every function contains nested function calls, which gives a time complexity of O(n^2), since the while
loop has to execute n times within every function call due to nesting. In the average case, some functions have
nested calls while others do not. Let us assume that log n of the calls have nested calls. This gives k(n) as:
k(n) = n log n   (average case)                   [2]
     = n^2       (worst case)                     [3]
     = n         (best case)                      [4]
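One way to read the three cases in equations [2] to [4] is as a single cost model. As an assumption that the paper does not state explicitly, suppose each call with nested calls costs n loop iterations, every other call costs one iteration, and m is the number of calls with nested calls. The three cases then follow from the choice of m:

```latex
\begin{align*}
k(n) &= m \cdot n + (n - m) \cdot 1 \\
m = 0 &\;\Rightarrow\; k(n) = n && \text{(best case, [4])} \\
m = \log n &\;\Rightarrow\; k(n) = n \log n + n - \log n = \Theta(n \log n) && \text{(average case, [2])} \\
m = n &\;\Rightarrow\; k(n) = n^2 && \text{(worst case, [3])}
\end{align*}
```

Under this reading, equation [2] keeps only the dominant n log n term of the average case.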
Table 1 shows the sample inputs that we used to compare our algorithm with the classical algorithm. The first column
gives the number of function calls n that may occur in a particular code, and the remaining columns give the results
of the analysis. These data are plotted as bar graphs in Figures 4 to 7.
n       f(n)       kw(n)      ka(n)       kb(n)
200     40400      40000      1059.664    200
400     160800     160000     2396.587    400
600     361200     360000     3838.16     600
800     641600     640000     5347.693    800
1000    1002000    1000000    6907.76     1000
Table 1. Performance Analysis
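The entries of Table 1 can be checked against equations [1] to [4] with a few lines of Python. The function names below are hypothetical. The use of the natural logarithm for the average case is an inference from the tabulated values, which it matches to within about 0.01.

```python
import math


def f(n: int) -> int:
    """Classical algorithm, equation [1]."""
    return n ** 2 + 2 * n


def k_worst(n: int) -> int:
    """KRAB worst case, equation [3]."""
    return n ** 2


def k_avg(n: int) -> float:
    """KRAB average case, equation [2], assuming the natural logarithm."""
    return n * math.log(n)


def k_best(n: int) -> int:
    """KRAB best case, equation [4]."""
    return n


rows = [(n, f(n), k_worst(n), k_avg(n), k_best(n)) for n in (200, 400, 600, 800, 1000)]
for n, classical, worst, avg, best in rows:
    print(f"{n:>5} {classical:>8} {worst:>8} {avg:>10.3f} {best:>5}")
```

For n = 1000 this gives 1,002,000 iterations for the classical algorithm against roughly 6,908 for the KRAB average case.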
Figure 4. Classical Algorithm
As Figure 4 shows, the classical algorithm has a complexity of O(n^2) in the best, worst and average cases. Hence it
is very inefficient for creating call graphs.
Figure 5. KRAB Algorithm (Worst Case)
The worst case for the KRAB Algorithm occurs when every function call in the program has nested function calls within
it. The worst-case complexity is O(n^2), which is the same as the average complexity of the classical algorithm.
Figure 6. KRAB Algorithm (Average Case)
For the average case, we have assumed that some of the function calls are either recursive or non-nested while others
have nested function calls. In such cases, the algorithm exhibits comparatively better complexity. Suppose log n of
the n method calls have nested method calls; the complexity is then O(n log n). As the graph shows, this is a large
improvement over the classical algorithm. With 1000 function calls, the classical algorithm requires more than one
million iterations, while the KRAB algorithm requires only about 6,900.
Figure 7. KRAB Algorithm (Best Case)
The best case assumes that every function call is either a recursive call or a non-nested function call. The KRAB
algorithm has a best-case complexity of O(n), which is a huge improvement, since the number of iterations equals the
number of function calls.
V. CONCLUSION
In this paper we have proposed an algorithm for incremental call graph analysis and analyzed its efficiency against
the classical algorithm. The KRAB Algorithm has several advantages over the classical algorithm, as well as some
drawbacks. The use of a stack makes it easy to keep track of program control. The time required to compute the
predecessor of the current node is small and is equivalent to the time required to pop the stack. The algorithm works
more efficiently on programs which have recursive method calls. Its iteration count is lower than that of the
classical algorithm in every case analyzed, although in the worst case both are O(n^2). It is capable of identifying
any skips that may occur during the execution of the program: if a skip occurs, the stack is not empty after the last
pop operation, and an exception is thrown to the coder. Even though the KRAB algorithm is more efficient than the
classical algorithm, it requires extra space of order n during its first iteration for the temporary stack. Its
performance reduces when virtual functions are used in the programs. Another limitation of the KRAB algorithm is that
it does not support auto-boxing of function calls on primitive types. The Class Hierarchy technique requires the
classes to be named, so handling anonymous inner classes in the code becomes a tedious task.
VI. FUTURE WORK
This algorithm is still in its infancy and requires a large number of improvements. It needs to be extended to
support virtual functions and auto-boxing. Support for virtual functions can be achieved by integrating Class
Hierarchy Analysis with Fast Static Analysis, which provides excellent results when virtual functions are present.
Hence, a hybrid of these two analyses should help improve the stability of the algorithm. Auto-boxing is used by most
programmers since it is easy to use and reduces the lines of code. Therefore, compatibility with auto-boxing is an
essential point that will be addressed during the improvement.
REFERENCES
[1] D. Grove and C. Chambers, "A Framework for Call Graph Construction Algorithms", ACM Transactions on Programming Languages and Systems, Vol. 23, No. 6, November 2001
[2] A. Srivastava, "Unreachable Procedures in Object-oriented Programming", WRL Research Report, August 1993
[3] J. Dean, D. Grove and C. Chambers, "Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis", 1994
[4] D. F. Bacon and P. F. Sweeney, "Fast Static Analysis of C++ Virtual Function Calls", ACM Conference, ACM, October 1996
[5] U. Ismail, "Incremental Call Graph Construction for the Eclipse IDE", University of Waterloo Technical Report, 2009
[6] W. Zhang and B. Ryder, "Constructing Accurate Application Call Graphs For Java To Model Library Callbacks"
[7] F. Tip and J. Palsberg, "Scalable Propagation-based Call Graph Construction Algorithms"