All content in this area was uploaded by M. Rajasekhara Babu on Sep 17, 2015
KRAB Algorithm - A Revised Algorithm for Incremental Call Graph Generation
Rajsekhara Babu*, Krishnakumar V., George Abraham, Kiransinh Borasia
School of Computing Science and Engineering, VIT University, Vellore, India
mrajasekharababu@vit.ac.in, venkatasubramanian2011@vit.ac.in, george.abraham2011@vit.ac.in, borasiakiransinh.ranjitsinh2011@vit.ac.in
Abstract: This paper presents the importance and implementation of an incremental call graph plugin. We propose an algorithm for call graph construction that has better overall performance than the previously proposed algorithm. In addition, the algorithm has been shown empirically to perform excellently on recursive code. The algorithm also checks for skipped function calls and reports them by throwing an exception.
Keywords: Call Graph, Incremental Approach, Recursive Codes
I. INTRODUCTION
Call graph analysis is essential for understanding the execution of a program and for debugging. The existing static call graph representation requires recomputation over the entire code whenever a small modification is made. An innovative alternative is an incremental approach, which computes the call graph of only the modified part and adds it as an increment to the existing call graph. In a call graph, each node represents a procedure, and an edge from node A to node B indicates that procedure A calls procedure B. The main benefit of call graphs is that they provide a basic program analysis for human understanding. Both static and dynamic call graphs help programmers understand the execution of their programs and aid them in debugging when required. One simple application of call graphs is finding procedures that are never called. Call graphs can be dynamic or static. A dynamic call graph records the execution of a program and hence exactly describes one run of the program, whereas a static call graph represents every possible run of the program. By tracking a call graph, it may be possible to detect anomalies in program execution or code injection attacks.
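As a minimal illustration of the call graph described above, the following Python sketch stores a static call graph as a hash map from each procedure to the set of procedures it calls, and finds the procedures that are never called. The function name `never_called`, the input format and the procedure names are hypothetical, and the sketch assumes that a procedure calling only itself still counts as never called.

```python
def never_called(call_graph: dict[str, set[str]], entry: str) -> set[str]:
    """Return procedures that no *other* procedure calls.

    `call_graph` maps each procedure to the set of procedures it calls
    (an edge A -> B means A calls B). The entry point is exempt, and a
    procedure that only calls itself still counts as never called.
    """
    called: set[str] = set()
    for caller, callees in call_graph.items():
        called.update(c for c in callees if c != caller)
    return set(call_graph) - called - {entry}


graph = {
    "main": {"parse", "run"},
    "parse": set(),
    "run": {"run"},        # direct recursion
    "legacy": {"parse"},   # defined but never called
}
print(never_called(graph, "main"))  # {'legacy'}
```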
II. RELATED WORK
Many techniques are used to generate call graphs, including Reachability Analysis [2][7], Interprocedural Class Analysis [1], Class Hierarchy Analysis [3][7] and Fast Static Analysis [4]. Reachability is the ability to get from one vertex of a graph to another. A function A is said to be reachable from a function B if there is a call to A in the definition of B. Reachability Analysis is performed by applying the analysis recursively to every function that is called, starting from the entry point. It can be improved by precomputing the function names and storing them in a data structure (such as a hash table) through which the dependencies can easily be linked. In object-oriented programming languages, the target procedure of a function call cannot always be resolved just by examining the source code, because of polymorphism: the function actually invoked depends on the class of the object on which it is invoked. Therefore, for object-oriented programming languages (OOPLs), interprocedural data-flow analysis is important to understand the control flow between functions. Figure 1 shows a class hierarchy in a real-world scenario. Here, Class A is the base class and all the other classes are derived from Class A. Such a graph is called an inheritance graph. It gives the compiler an idea of how transitions occur from class to class, thereby improving the run-time performance of the call graph algorithm. Fast Static Analysis handles virtual function calls, which are extremely hard for programs to analyze statically. By doing so, it removes much of the overhead and speeds up the algorithm considerably. It is used mainly for object-oriented programming languages, where it is most effective.
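The reachability analysis described above can be sketched as follows, assuming that the call sites of each function have already been precomputed into a hash table (a Python `dict`) keyed by function name. This is an illustrative sketch, not the algorithm of [2] or [7]: it uses an explicit worklist instead of recursion, which visits the same functions without risking deep recursion. The name `reachable` and the example functions are hypothetical.

```python
from collections import deque


def reachable(call_sites: dict[str, list[str]], entry: str) -> set[str]:
    """Return every function reachable from `entry` through call sites."""
    seen = {entry}
    worklist = deque([entry])
    while worklist:
        fn = worklist.popleft()
        # Hash-table lookup of the precomputed call sites of `fn`.
        for callee in call_sites.get(fn, []):
            if callee not in seen:  # each function is processed only once
                seen.add(callee)
                worklist.append(callee)
    return seen


call_sites = {"main": ["a"], "a": ["b", "a"], "b": [], "dead": ["b"]}
live = reachable(call_sites, "main")
print(sorted(live))                    # ['a', 'b', 'main']
print(sorted(set(call_sites) - live))  # ['dead']
```

Functions left outside the reachable set, such as `dead` here, are candidates for the "never called" procedures mentioned in the introduction.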
Figure 1. Inheritance Graph
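To make the idea of resolving calls over an inheritance graph concrete, the following sketch applies Class Hierarchy Analysis to a hierarchy shaped like Figure 1, with Class A as the base class. The classes B, C and D, the method `draw` and all function names are hypothetical, and single inheritance (as for Java classes) is assumed. For a call `x.draw()` where `x` has static type T, every class in the subtree rooted at T is a possible run-time class, and each contributes the implementation it declares or inherits.

```python
from typing import Optional


def subclasses(children: dict[str, list[str]], cls: str) -> list[str]:
    """`cls` together with all of its direct and indirect subclasses."""
    result = [cls]
    for child in children.get(cls, []):
        result.extend(subclasses(children, child))
    return result


def lookup(parent: dict[str, Optional[str]], declares: dict[str, set[str]],
           cls: Optional[str], method: str) -> Optional[str]:
    """Walk up the superclass chain to the class that implements `method`."""
    while cls is not None:
        if method in declares.get(cls, set()):
            return cls
        cls = parent.get(cls)
    return None


def cha_targets(parent: dict[str, Optional[str]], declares: dict[str, set[str]],
                static_type: str, method: str) -> set[str]:
    """Possible targets of `x.method()` when `x` has static type `static_type`."""
    children: dict[str, list[str]] = {}
    for cls, sup in parent.items():
        if sup is not None:
            children.setdefault(sup, []).append(cls)
    targets: set[str] = set()
    for cls in subclasses(children, static_type):
        impl = lookup(parent, declares, cls, method)
        if impl is not None:
            targets.add(f"{impl}.{method}")
    return targets


parent = {"A": None, "B": "A", "C": "A", "D": "B"}
declares = {"A": {"draw"}, "B": {"draw"}, "C": set(), "D": set()}
print(sorted(cha_targets(parent, declares, "A", "draw")))  # ['A.draw', 'B.draw']
print(sorted(cha_targets(parent, declares, "C", "draw")))  # ['A.draw']
```

A call through a variable of static type A therefore gets two call-graph edges, while a call through type C resolves to a single target.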
III. APPROACH
Eclipse is mainly used as an IDE for Java programs. Since Java is an object-oriented programming language, Class Hierarchy Analysis is the most suitable approach for an Eclipse IDE plugin. Using Class Hierarchy Analysis, we can easily resolve the ambiguities that arise from inheritance and polymorphism, thereby addressing both with a single technique.
Classical Approach
Figure 2 shows the classical algorithm for call graph generation using the Class Hierarchy method. This algorithm is efficient in most cases but has two serious drawbacks. First, its performance degrades drastically when the program contains recursive calls. The data structure used in this algorithm creates a new instance for every recursive call made by a function. This puts additional load on the graph data structure, thereby increasing the execution time. Second, once a function completes its execution, the traversal has to backtrack to the function's parent. In the generic algorithm, the data structure does not take care of this and leaves it to the system, which results in more processor overhead and consequently more execution time.
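Figure 2 is not reproduced here, so the following is only a hypothetical sketch of the first drawback, not the classical algorithm itself. It counts node instances in a structure that, like the one described above, creates a new instance for every call occurrence. Because such a structure cannot stop on recursion by itself, the run-time recursion depth is simulated with an assumed cut-off `max_depth`. With one node per method, the same program needs only as many nodes as it has methods.

```python
def per_call_instances(bodies: dict[str, list[str]], entry: str, max_depth: int) -> int:
    """Count node instances when every call occurrence gets its own node.

    `bodies` maps each method to the calls in its body. Recursion is cut
    off at `max_depth`, standing in for the depth reached at run time.
    """
    def size(method: str, depth: int) -> int:
        if depth == max_depth:
            return 1
        return 1 + sum(size(callee, depth + 1) for callee in bodies.get(method, []))

    return size(entry, 0)


# A doubly recursive method, for example a naive Fibonacci.
bodies = {"main": ["fib"], "fib": ["fib", "fib"]}
print(per_call_instances(bodies, "main", 10))  # 1024
print(len(bodies))                             # 2 (one node per method)
```

Even at a modest recursion depth of 10, the per-call structure holds 1024 instances for a program with only two methods.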
Since we are dealing with an incremental approach, the components of the data structure need pointer variables that allow easy traversal when the user makes real-time changes to the structure of the program. In the classical algorithm these pointers are absent, which makes backtracking a time-consuming task.
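As a hedged illustration of why such pointers help in an incremental setting, the sketch below keeps, for every method, both its callees and its callers (the predecessor pointers). When the user edits one method, only that method's outgoing edges are recomputed and the affected predecessor sets are patched, instead of rebuilding the whole graph. The class `IncrementalCallGraph` and its methods are hypothetical and are not taken from the paper or from any Eclipse API.

```python
class IncrementalCallGraph:
    """Call graph with successor and predecessor links (hypothetical sketch)."""

    def __init__(self) -> None:
        self.callees: dict[str, set[str]] = {}  # method -> methods it calls
        self.callers: dict[str, set[str]] = {}  # method -> predecessor pointers

    def update_method(self, method: str, new_callees: list[str]) -> None:
        """Re-analyse only `method` after the user edits its body."""
        old = self.callees.get(method, set())
        new = set(new_callees)
        for removed in old - new:
            self.callers[removed].discard(method)  # drop stale predecessor link
        for added in new - old:
            self.callers.setdefault(added, set()).add(method)
        self.callees[method] = new
        self.callers.setdefault(method, set())

    def callers_of(self, method: str) -> set[str]:
        """Follow predecessor pointers without scanning the whole graph."""
        return set(self.callers.get(method, set()))


g = IncrementalCallGraph()
g.update_method("main", ["parse", "run"])
g.update_method("run", ["log"])
g.update_method("main", ["run"])  # the user deletes the call to parse
print(g.callers_of("parse"))      # set()
print(g.callers_of("run"))        # {'main'}
```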
Figure 2. Classical Algorithm
KRAB Algorithm – Our Approach
Figure 3 gives the pseudocode for the algorithm we propose. This algorithm addresses both the recursion-related issue and the time taken by backtracking. For this purpose, we build the graph data structure with the help of a stack. The stack keeps a note of every function call encountered and automatically places a pointer from the current node to the parent node, as its predecessor, the moment the callee method returns control to the caller method.
In this algorithm, we push every method call (in the form of a node) invoked by a specific object onto a temporary stack. The control then traverses the called function and checks it for method calls. If there are calls, it first checks whether each one is a recursive or a non-recursive call. For a recursive call, the node points to itself rather than a new node instance being created. However, to verify the node traversal, every recursive call is also pushed onto the stack. When a function returns control, two activities are performed. First, a logical link is created from the current method to the parent method, which serves as the predecessor pointer of the current method. Second, control is transferred from the current method to the parent method by popping the stack, since in this algorithm the node at the top of the stack always has control.
Figure 3. Proposed Algorithm – KRAB Algorithm
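Figure 3 itself is not reproduced here. The following Python sketch follows the steps described above under stated assumptions: each method's body is given as a hypothetical list of the calls it makes, one node is kept per method, and a call to a method that already has a node (a recursive call, including mutual recursion, or a method already processed) reuses that node instead of creating a new instance. Such a call is still pushed and then popped, so that its return is verified. The names `Node` and `krab_build` are illustrative only; this is not the authors' implementation.

```python
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class Node:
    """One call-graph node per method, never one per call occurrence."""
    name: str
    callees: set[str] = field(default_factory=set)       # outgoing call edges
    predecessors: set[str] = field(default_factory=set)  # links back to callers


def krab_build(bodies: dict[str, list[str]], entry: str) -> dict[str, Node]:
    """Build a call graph from `entry` using an explicit stack.

    `bodies` maps each method to the calls in its body, in order (a
    hypothetical input format). The frame on top of the stack always
    belongs to the method that currently has control.
    """
    nodes = {entry: Node(entry)}
    stack: list[tuple[str, Iterator[str]]] = [(entry, iter(bodies.get(entry, [])))]
    while stack:
        current, pending_calls = stack[-1]
        callee = next(pending_calls, None)
        if callee is None:
            # The current method returns: pop it and record its parent
            # (the new top of the stack) as a predecessor.
            stack.pop()
            if stack:
                nodes[current].predecessors.add(stack[-1][0])
            continue
        nodes[current].callees.add(callee)
        if callee in nodes:
            # Recursive call (a self-loop for direct recursion) or an
            # already-processed method: reuse the existing node, but still
            # push the call so that the matching pop verifies its return.
            stack.append((callee, iter(())))
        else:
            nodes[callee] = Node(callee)
            stack.append((callee, iter(bodies.get(callee, []))))
    return nodes


graph = krab_build({"main": ["fact", "log"], "fact": ["fact", "log"], "log": []}, "main")
print(sorted(graph["fact"].callees))       # ['fact', 'log']
print(sorted(graph["fact"].predecessors))  # ['fact', 'main']
```

The recursive method `fact` gets a single node with a self-loop, and its predecessors are filled in purely by stack pops, without a separate backtracking pass.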
IV. ANALYSIS
In the classical approach, building a standard class hierarchy takes Θ(n²) time on average, since this is the time required to construct a graph data structure. Finding the target methods (step 7) takes Θ(n) time, and finding the methods that need to be reprocessed takes Θ(n) on average. Searching is taken as Θ(n) because a linear search must be performed, as the values may not be sorted. This gives the quadratic cost function f(n):
$$f(n) = n^2 + n + n = n^2 + 2n \qquad [1]$$
Hence, the classical algorithm requires at least n² + 2n iterations, where n is the number of function calls in the code.
For the KRAB algorithm, we can neglect the time required for the stack's push and pop operations, each of which takes constant time, and concentrate on the critical operations, namely those in the while loop.
Here, if there are n function calls, the while loop runs n times. In the best case, there are no nested function calls, which gives a time complexity of Ω(n), since the loop body executes only once for each call. In the worst case, every function contains nested function calls, which gives a time complexity of O(n²), since the while loop has to execute n times within every function call because of the nesting. In the average case, some functions have nested calls while others do not. Let us assume that log n of the calls have nested calls. This gives k(n) as:
$$k(n) = n \log n \qquad \text{(average case)} \quad [2]$$
$$k(n) = n^2 \qquad \text{(worst case)} \quad [3]$$
$$k(n) = n \qquad \text{(best case)} \quad [4]$$
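The step from the average-case assumption to equation [2] can be spelled out as follows, assuming, as the text does, that each of the log n calls with nested calls costs n loop iterations and that each of the remaining n − log n calls costs a single iteration:

```latex
k(n) = \underbrace{n \log n}_{\text{calls with nested calls}}
     + \underbrace{(n - \log n) \cdot 1}_{\text{all other calls}}
     = n \log n + n - \log n = \Theta(n \log n)
```

Keeping only the dominant term gives k(n) = n log n, as in equation [2]. The ka(n) column of Table 1 is consistent with the natural logarithm, for example 200 ln 200 ≈ 1059.66.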
Table 1 shows a selected sample of inputs that we used to compare our algorithm with the classical algorithm. The first column gives the number of function calls that may occur in a particular piece of code; the remaining columns give the corresponding numbers of iterations from equations [1] to [4]. These data are plotted as bar graphs in Figures 4 to 7.
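Function calls (n) | f(n), classical | kw(n), KRAB worst case | ka(n), KRAB average case | kb(n), KRAB best case
200 | 40400 | 40000 | 1059.664 | 200
400 | 160800 | 160000 | 2396.587 | 400
600 | 361200 | 360000 | 3838.16 | 600
800 | 641600 | 640000 | 5347.693 | 800
1000 | 1002000 | 1000000 | 6907.76 | 1000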
Table 1. Performance Analysis
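The following short script is a sketch for checking Table 1: it evaluates equations [1] to [4] for the same inputs. It assumes that log in equation [2] is the natural logarithm, which is what the ka(n) column is consistent with. The computed values agree with the table to within 0.01, with the small differences appearing only in the last decimal place.

```python
import math


def f(n: int) -> int:
    """Classical algorithm, equation [1]."""
    return n * n + 2 * n


def k_worst(n: int) -> int:
    """KRAB worst case, equation [3]."""
    return n * n


def k_avg(n: int) -> float:
    """KRAB average case, equation [2], using the natural logarithm."""
    return n * math.log(n)


def k_best(n: int) -> int:
    """KRAB best case, equation [4]."""
    return n


if __name__ == "__main__":
    for n in range(200, 1001, 200):
        print(f"{n:>5} {f(n):>8} {k_worst(n):>8} {k_avg(n):>10.3f} {k_best(n):>5}")
```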
Figure 4. Classical Algorithm
As Figure 4 shows, the classical algorithm has a complexity of O(n²) in the best, worst and average cases. Hence it is very inefficient for creating call graphs.
Figure 5. KRAB Algorithm (Worst Case)
The worst case for the KRAB algorithm occurs when every function call in the program has nested function calls within it. The worst-case complexity is O(n²), the same as the average-case complexity of the classical algorithm.
Figure 6. KRAB Algorithm (Average Case)
For the average case, we assume that some function calls are either recursive or non-nested, while others have nested function calls. In such cases, the algorithm exhibits comparatively better complexity. If log n out of n method calls have nested method calls, the complexity is O(n log n). As the graph shows, this is an excellent improvement over the classical algorithm: for 1000 function calls, the classical algorithm requires more than one million iterations, while the KRAB algorithm requires only about 6900.
Figure 7. KRAB Algorithm (Best Case)
The best case assumes that every function call is either a recursive call or a non-nested function call. The KRAB algorithm has a best-case complexity of O(n), a large improvement, since the number of iterations equals the number of function calls.
V. CONCLUSION
In this paper we have proposed an algorithm for incremental call graph analysis and compared its efficiency with that of the classical algorithm. The KRAB algorithm has many advantages over the classical algorithm as well as some drawbacks. The use of a stack makes it easy to keep track of program control. The time required to compute the predecessor of the current node is small: it equals the time required to pop the stack. The algorithm works more efficiently on programs that have recursive method calls. It requires fewer iterations than the classical algorithm in every case, although in the worst case both are O(n²). It can also identify any skips that occur during the execution of the program: if a skip occurs, the stack is not empty after the last pop operation, and an exception is thrown to the programmer. Even though the KRAB algorithm is more efficient than the classical algorithm, it requires O(n) extra space during its first iteration for the temporary stack. Its performance degrades when programs use virtual functions. Another limitation is that it does not support autoboxing in function calls on primitive types. Finally, the Class Hierarchy technique requires classes to be named, so handling anonymous inner classes in the code becomes a tedious task.
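The skip check described above can be sketched as follows, assuming the traversal is available as a hypothetical sequence of `("call", method)` and `("return", method)` events. Every call is pushed and every return pops the stack; if the stack is not empty after the last event, some call was skipped and an exception is raised. Rejecting a return that does not match the top of the stack is an extra assumption of this sketch, and `CallSkipError` and `check_trace` are illustrative names.

```python
class CallSkipError(Exception):
    """Raised when the call/return sequence leaves the stack unbalanced."""


def check_trace(events: list[tuple[str, str]]) -> None:
    stack: list[str] = []
    for kind, method in events:
        if kind == "call":
            stack.append(method)
        elif kind == "return":
            # Extra assumption: a return must match the method on top.
            if not stack or stack[-1] != method:
                raise CallSkipError(f"unexpected return from {method!r}")
            stack.pop()
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    if stack:
        # The check from the text: the stack is not empty after the last pop.
        raise CallSkipError(f"calls never returned: {stack}")


check_trace([("call", "main"), ("call", "f"), ("return", "f"), ("return", "main")])
try:
    check_trace([("call", "main"), ("call", "f"), ("return", "f")])
except CallSkipError as err:
    print(err)  # calls never returned: ['main']
```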
VI. FUTURE WORK
This algorithm is still in its infancy and requires many improvements. It will be extended to support virtual functions and autoboxing. Support for virtual functions can be achieved by integrating Class Hierarchy Analysis with Fast Static Analysis, which provides excellent results when virtual functions are present. Hence, a hybrid of these two analysis methods should help improve the stability of the algorithm. Autoboxing is widely used by programmers because it is easy to use and reduces the lines of code. Therefore, support for autoboxing is an essential point to be addressed in future improvements.
REFERENCES
[1] D. Grove and C. Chambers, "A Framework for Call Graph Construction Algorithms", ACM Transactions on Programming Languages and Systems, ACM, Vol. 23, No. 6, November 2001
[2] A. Srivastava, "Unreachable Procedures in Object-oriented Programming", WRL Research Report, August 1993
[3] J. Dean, D. Grove and C. Chambers, "Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis", 1994
[4] D. F. Bacon and P. F. Sweeney, "Fast Static Analysis of C++ Virtual Function Calls", ACM Conference, ACM, October 1996
[5] U. Ismail, "Incremental Call Graph Construction for the Eclipse IDE", University of Waterloo Technical Report, 2009
[6] W. Zhang and B. Ryder, "Constructing Accurate Application Call Graphs For Java To Model Library Callbacks"
[7] F. Tip and J. Palsberg, "Scalable Propagation-based Call Graph Construction Algorithms"