Fig 1 - uploaded by Manuel Fähndrich
A concrete heap and corresponding abstraction. 


Source publication
Article
Full-text available
Modern programming environments provide extensive support for inspecting, analyzing, and testing programs based on the algorithmic structure of a program. Unfortunately, support for inspecting and understanding runtime data structures during execution is typically much more limited. This paper provides a general purpose technique for abstracting an...

Similar publications

Article
Full-text available
Computational thinking represents a collection of structured problem solving skills that cross-cut educational disciplines. There is significant future value in introducing these skills as early as practical in students' academic careers. Over the past three years, we have developed, piloted, and evaluated a series of K-12 outreach modules designed...
Article
Full-text available
Many social scientists conduct research on increasing the participation of women in computing, yet it is computer scientists who must find ways of implementing those findings into concrete actions. Technology for Community is an undergraduate computer science course taught at the University of Colorado at Boulder in which students work with local c...

Citations

... Some tools support users by displaying the path to the GC roots, while other approaches assist users by displaying the code that has allocated the objects. Visualization approaches [2,46,73,76,97,102,131] aggregating the object graph (e.g., based on its dominator tree [69,75,116]) are useful to analyze the heap's composition. A user following the top-down approach first selects a GC root or a heap object that keeps alive many other objects. ...
... Most state-of-the-art tools rely heavily on the visualization of data using (tree) tables. Yet, ample scientific work exists on more advanced features for memory visualization [2,46,73,76,97,102,131]. AntTracks thus already provides a graph-based visualization of the aggregated object graph to inspect the paths to the GC roots. ...
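The snippets above describe tools that help users by displaying the path from a heap object back to its GC roots. As an illustrative sketch (not any specific tool's implementation), a shortest such reference chain can be found with a breadth-first search from the roots over the object graph:

```python
from collections import deque

def path_from_gc_root(graph, roots, target):
    """Find a shortest chain of references from some GC root to `target`.
    `graph` maps an object id to the ids it references; `roots` are the
    GC roots (e.g., static fields or thread-local variables)."""
    parent = {r: None for r in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:   # walk the parent links back to a root
                path.append(node)
                node = parent[node]
            return path[::-1]         # root ... target
        for ref in graph.get(node, []):
            if ref not in parent:     # not yet discovered
                parent[ref] = node
                queue.append(ref)
    return None  # target is unreachable, i.e., garbage
```

Such a path answers the leak-analysis question of *why* an accumulating object is still alive: every link on the chain is a reference that prevents collection.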
Article
Memory analysis tools are essential for finding and fixing anomalies in the memory usage of software systems (e.g., memory leaks). Although numerous tools are available, hardly any empirical studies exist on their usefulness for developers in typical usage scenarios. Instead, most evaluations are limited to reporting performance metrics. We thus conducted a study to empirically assess the usefulness of the interactive memory analysis tool AntTracks Analyzer. Specifically, we first report findings from assessing the tool using a cognitive walkthrough, guided by the Cognitive Dimensions of Notations Framework. We then present the results of a qualitative user study involving 14 subjects who used AntTracks to detect and resolve memory anomalies. We report lessons learned from the study and implications for developers of interactive memory analysis tools. We hope that our results will help researchers and developers of memory analysis tools in defining, selecting, and improving tool capabilities.
... We use runtime sampling to produce a range of statistically meaningful measurements of the heaps produced by the DaCapo [4] benchmark, a well-known suite of real-world object-oriented programs selected to represent a range of application domains and programming idioms; we provide evidence that components, as described in [18,21], usefully and closely correspond to the roles that developers assign to objects; and we identify a small number of idiomatic sharing patterns that describe the majority of sharing that occurs in practice. ...
... In this work, we hypothesize that developers often think of objects in terms of the roles they play in the programs [21]. These roles implicitly aggregate objects into conceptually related sets. ...
... We formalize what "stored together" means in Section 3. We use these two structural indistinguishability principles to partition the concrete heap into conceptual components. Figure 1 shows the output of HeapDbg [21], a conceptual component visualization tool; it illustrates how we apply these principles to the concrete heap of a program that manipulates arithmetic expression trees. Figure 1(a) shows a concrete heap snapshot as computed by the sampling framework for a simple program that manipulates expression trees. ...
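The snippet above describes partitioning the concrete heap into conceptual components via structural indistinguishability. A minimal sketch of the idea (a crude stand-in for the principles in [21], not the actual HeapDbg algorithm) groups objects by their type and by the field labels through which they are referenced:

```python
def abstract_heap(types, edges):
    """Partition a concrete heap into abstract nodes. `types` maps each
    object id to its type name; `edges` is a list of (src, field, dst)
    pointers. Objects merge when they have the same type and are referenced
    through the same set of field labels (a crude indistinguishability)."""
    incoming = {obj: set() for obj in types}
    for _, field, dst in edges:
        incoming[dst].add(field)
    # indistinguishability key: (type, incoming field labels)
    key = {obj: (types[obj], frozenset(incoming[obj])) for obj in types}
    groups = {}
    for obj, k in key.items():
        groups.setdefault(k, set()).add(obj)
    # lift concrete pointers to edges between abstract nodes
    abstract_edges = {(key[src], field, key[dst]) for src, field, dst in edges}
    return groups, abstract_edges
```

On a linked list, all interior nodes collapse into one abstract node with a `next` self-edge, so the abstraction stays finite no matter how long the concrete list grows.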
Conference Paper
Full-text available
A large gap exists between the wide range of admissible heap structures and those that programmers actually build. To understand this gap, we empirically study heap structures and their sharing relations in real-world programs. Our goal is to characterize these heaps. Our study rests on a heap abstraction that uses structural indistinguishability principles to group objects that play the same role. Our results shed light on the prevalence of recursive data structures, aggregation, and the sharing patterns that occur in programs. We find, for example, that real-world heaps are dominated by atomic shapes (79% on average) and that the majority of sharing occurs via common programming idioms. In short, the heap is, in practice, a simple structure constructed out of a small number of simple structures. Our findings imply that garbage collection and program analysis may achieve a high return by focusing on simple heap structures.
... Our benchmark suite consists primarily of direct C# ports of commonly used Java benchmarks from Jolden [9], the db and raytracer programs from SPEC JVM98 [28], and the luindex and lusearch programs from the DaCapo suite [9]. Additionally we have analyzed the heap abstraction code, runabs, from [19]. In practice we translate the .Net assemblies to a simplified IR (intermediate representation) which allows us to remove most .Net specific idioms from the core analysis and to perform specialized analyses on the standard libraries [20]. ...
... -The introduction of thread-level parallelization, as in [18], to obtain a 3× speedup for bh on our quad-core machine. -Data structure reorganization to improve memory usage and automated leak detection, as in [19], to obtain over a 25% reduction in live memory in raytrace. -The computation of ownership information for object fields, as in [14], identifying ownership properties for 22% of the fields and unique memory reference properties for 47% of the fields in lusearch. ...
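The abstract above reports that real-world heaps are dominated by simple, atomic shapes. As an illustrative sketch (not the study's actual classifier), the shape of the region reachable from a root can be classified by a DFS that distinguishes a re-visit of a node on the current path (a cycle) from a re-visit elsewhere (sharing, i.e., a DAG):

```python
def classify_shape(graph, root):
    """Classify the structure reachable from `root` as 'tree', 'dag',
    or 'cyclic'. `graph` maps a node to the nodes it references."""
    visited, on_path = set(), set()
    shape = "tree"
    def dfs(node):
        nonlocal shape
        if node in on_path:       # back-edge: node is on the current path
            shape = "cyclic"
            return
        if node in visited:       # cross-edge: node reached via two paths
            if shape != "cyclic":
                shape = "dag"
            return
        visited.add(node)
        on_path.add(node)
        for child in graph.get(node, []):
            dfs(child)
        on_path.discard(node)
    dfs(root)
    return shape
```

Under this classification, the paper's finding would mean most reachable regions come back as `"tree"`, with sharing and cycles confined to a few recognizable idioms.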
Conference Paper
The computational cost and precision of a shape-style heap analysis is highly dependent on the way method calls are handled. This paper introduces a new approach to analyzing method calls that leverages the fundamental object-oriented programming concepts of encapsulation and invariants. The analysis consists of a novel partial context-sensitivity heuristic and a new take on cutpoints that, in practice, provide large improvements in interprocedural analysis performance while having minimal impact on the precision of the results. The interprocedural analysis has been implemented for .Net bytecode and an existing abstract heap model. Using this implementation we evaluate both the runtime cost and the precision of the results on a number of well-known benchmarks and real-world programs. Our experimental evaluations show that, despite the use of partial context-sensitivity heuristics, the static analysis is able to precisely approximate the ideal analysis results. Further, the results show that the interprocedural analysis heuristics and the approach to cutpoints used in this work are critical in enabling the analysis of large real-world programs (over 30K bytecodes in less than 65 seconds, using less than 130 MB of memory) that could not be analyzed with previous approaches.
... A major challenge in constructing such a hybrid analysis is the question: are strong updates a fundamental component of a shape-style analysis, or is it possible to compute precise shape, sharing, and related information with an analysis that uses simpler and more efficient points-to style transfer functions? Recent empirical work on the structure and behavior of the heap in modern object-oriented programs has shed light on how heap structures are constructed [41,3], the configuration of the pointers and objects in them [5], and their invariant structural properties [31,2,1]. These results affirm several common assumptions about how object-oriented programs are designed and how the heap structures in them behave. ...
... In particular, [41,3,5] demonstrate that object-oriented programs exhibit extensive mostly-functional behavior: they make extensive use of final (or quiescing) fields, stationary fields, and copy construction, and when fields are updated the new target is frequently a newer (often freshly allocated) object. The results in [31,2,1] provide insight into which heuristics can effectively group sections of the heap based on how they are used in the program, how prevalent the use of library containers is, and what sorts of structures are built. The results show that, in practice, object-oriented programs tend to organize heap objects into well-defined groups based on their roles in the program, avoid linked pointer structures in favor of library-provided containers, and exhibit connectivity and sharing properties between groups of objects that are relatively simple and stable throughout the execution of the program. ...
... The normal form leverages the idea that invariants can be broken locally (within a basic block or method call), where subtle details are critical to program behavior, but should be restored before and after these local regions. The basis for the normal form, and for the selection of which properties are important to preserve, comes from studies of the runtime heap structures produced by object-oriented programs [31,2]. Thus we know that, in general, these definitions are well suited to capturing the fundamental structural properties of the heap that are of interest, while simplifying the structure of abstract heaps and discarding superfluous detail. ...
Article
This paper introduces a new hybrid memory analysis, Structural Analysis, which combines an expressive shape-analysis style abstract domain with efficient and simple points-to style transfer functions. Using data from empirical studies on runtime heap structures and the programmatic idioms used in modern object-oriented languages, we construct a heap analysis with the following characteristics: (1) it can express a rich set of structural, shape, and sharing properties which are not provided by a classic points-to analysis and which are useful for optimization and error-detection applications; (2) it uses efficient, weakly-updating, set-based transfer functions which enable the analysis to be more robust and scalable than a shape analysis; and (3) it can be used as the basis for a scalable interprocedural analysis that produces precise results in practice. The analysis has been implemented for .Net bytecode, and using this implementation we evaluate both the runtime cost and the precision of the results on a number of well-known benchmarks and real-world programs. Our experimental evaluations show that the domain defined in this paper is capable of precisely expressing the majority of the connectivity, shape, and sharing properties that occur in practice and that, despite the use of weak updates, the static analysis is able to precisely approximate the ideal results. The analysis is capable of analyzing large real-world programs (over 30K bytecodes) in less than 65 seconds while using less than 130 MB of memory. In summary, this work presents a new type of memory analysis that advances the state of the art with respect to expressive power, precision, and scalability, and represents a new area of study on the relationships between and combination of concepts from shape and points-to analyses.
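The contrast between the weakly-updating, set-based transfer functions described above and the strong updates of a classic shape analysis can be sketched as follows (an illustrative model with hypothetical names, not the paper's actual abstract domain):

```python
def assign_field(points_to, receivers, field, new_targets, strong):
    """Transfer function for the statement `x.f = y`. `points_to` maps
    (abstract object, field) to a set of possible targets; `receivers`
    is what `x` may point to. A strong update (sound only when `x`
    denotes exactly one concrete object) overwrites the old targets;
    a weak update conservatively unions the new targets in."""
    for obj in receivers:
        cell = points_to.setdefault((obj, field), set())
        if strong and len(receivers) == 1:
            cell.clear()          # kill the old points-to facts
        cell.update(new_targets)  # add the new ones (always)
    return points_to
```

The weak variant is cheaper and monotone (sets only grow), which is what makes the hybrid analysis scalable; the paper's result is that, for the properties measured, this rarely costs precision in practice.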
Chapter
Knowing the shapes of dynamic data structures is key when formally reasoning about pointer programs. While modern shape analysis tools employ symbolic execution and machine learning to infer shapes, they often assume well-structured C code or programs written in an idealised language. In contrast, our Data Structure Investigator (DSI) tool for program comprehension analyses concrete executions and handles even C programs with complex coding styles.
Conference Paper
Complex software systems often suffer from performance problems caused by memory anomalies such as memory leaks. While the proliferation of objects is rather easy to detect using state-of-the-art memory monitoring tools, extracting a leak's root cause, i.e., identifying the objects that keep the accumulating objects alive, is still poorly supported. Most state-of-the-art tools rely on the dominator tree of the object graph and thus only support single-object ownership analysis. Multi-object ownership analysis, e.g., when the leaking objects are contained in multiple collections, is not possible by merely relying on the dominator tree. We present an efficient approach to continuously collect GC root information (e.g., static fields or thread-local variables) in a trace-based memory monitoring tool, as well as algorithms that use this information to calculate the transitive closure (i.e., all reachable objects) and the GC closure (i.e., objects that are kept alive) for arbitrary heap object groups. These closures make it possible to derive various metrics for heap object groups that can guide the user during memory leak analysis. We implemented our approach in AntTracks, an offline memory monitoring tool, and demonstrate its usefulness by comparing it with other widely used tools for memory leak detection such as the Eclipse Memory Analyzer. Our evaluation shows that collecting GC root information introduces only about 1% overhead, in terms of both run time and trace file size.
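The two closures described above can be sketched with plain graph reachability (a simplified model of the idea, not the AntTracks implementation): the transitive closure is everything reachable from the group, and the GC closure is what the group keeps alive, i.e., what becomes unreachable from the GC roots once the group's objects are excluded from traversal.

```python
def closures(graph, roots, group):
    """For a group of heap objects, compute the transitive closure (all
    objects reachable from the group) and the GC closure (objects that
    become unreachable from the GC roots once the group is excluded)."""
    def reach(starts, blocked=frozenset()):
        seen, stack = set(), list(starts)
        while stack:
            node = stack.pop()
            if node in seen or node in blocked:
                continue
            seen.add(node)
            stack.extend(graph.get(node, []))
        return seen
    transitive = reach(group)
    alive_without_group = reach(roots, blocked=frozenset(group))
    return transitive, transitive - alive_without_group
```

Because the GC closure subtracts everything that stays reachable via paths around the group, it naturally handles the multi-object ownership case, e.g., leaking objects held by several collections, where a dominator-tree view falls short.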
Chapter
Programs written in modern object-oriented programming languages heavily use dynamically allocated objects on the heap. Therefore, dynamic program analysis techniques, such as memory leak diagnosis and automatic debugging, depend on various kinds of information derived from the heap. Identifying the differences between two heaps is one of the most important such tasks and is provided by many free and commercial problem-diagnosing tools widely used in industry. However, existing heap-differentiating tools usually leverage a single kind of information about an object, e.g., its address, allocation site, or access path in the heap object graph. A single kind of information usually has disadvantages and can only provide an imprecise result, which cannot satisfy the requirements of other high-level dynamic analyses. We have observed that the disadvantages of one kind of information can, in many situations, be remedied by another. This paper presents PHD, a precise heap differentiating tool for Java programs that uses objects’ spatial information (i.e., access path) and temporal information (i.e., execution index), both derived from the execution. To practically collect execution indices, we implemented PHD on an industrial-strength Java virtual machine, so it can be seamlessly integrated into production environments. Furthermore, we conducted case studies using PHD for three different dynamic analysis tasks on real-world applications such as the Eclipse Compiler for Java, Apache Derby, and the Apache FTP Server.
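PHD's core idea of combining a spatial key with a temporal key can be sketched as follows (a toy model; PHD's actual execution indexing is far more involved): identify each object by the pair (access path, execution index) and diff two snapshots on those pairs.

```python
def diff_heaps(before, after):
    """Each snapshot maps an object id to (access_path, execution_index):
    where the object sits in the heap graph and when it was allocated.
    Objects match across snapshots only when both keys agree; unmatched
    pairs are reported as freed (only in `before`) or new (only in `after`)."""
    before_keys = set(before.values())
    after_keys = set(after.values())
    return before_keys - after_keys, after_keys - before_keys
```

The temporal key is what distinguishes two *different* objects that happen to occupy the same access path in the two snapshots, which a path-only or address-only diff would spuriously treat as the same object.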
Conference Paper
Optimizing memory management is a major challenge of embedded systems programming, as memory is scarce. Further, embedded systems often have heterogeneous memory architectures, complicating the task of memory allocation during both compilation and migration. However, new opportunities for addressing these challenges have been created by the recent emergence of managed runtimes for embedded systems. By imposing structure on memory, these systems have opened the doors for new techniques for analyzing and optimizing memory usage within embedded systems. This paper presents GEM (Graphs of Embedded Memory), a tool which capitalizes on the structure that managed runtime systems provide in order to build memory graphs which facilitate memory analysis and optimization. At GEM's core are a set of fundamental graph transformations which can be layered to support a wide range of use cases, including interactive memory visualization, de-duplication of objects and code, compilation for heterogeneous memory architectures, and transparent migration. Moreover, since the same underlying infrastructure supports all of these orthogonal functionalities, they can easily be applied together to complement each other.
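One of the graph transformations the abstract mentions is de-duplication of objects. A minimal sketch (assuming immutable, hashable object contents; not GEM's actual transformation) collapses objects with identical contents into one canonical copy and returns a remapping through which references in the memory graph can be rewritten:

```python
def deduplicate(objects):
    """`objects` maps an object id to its (hashable) contents. Returns a
    remapping from each id to the canonical id holding identical contents;
    edges in the memory graph can then be redirected through the remap,
    after which the non-canonical copies become garbage."""
    canonical = {}  # contents -> first id seen with those contents
    remap = {}
    for oid, contents in objects.items():
        remap[oid] = canonical.setdefault(contents, oid)
    return remap
```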
Conference Paper
As the complexity of malware grows, so does the necessity of employing program structuring mechanisms during development. While control flow structuring is often obfuscated, the dynamic data structures employed by the program are typically untouched. We report on work in progress that exploits this weakness to identify dynamic data structures present in malware samples for the purposes of aiding reverse engineering and constructing malware signatures, which may be employed for malware classification. Using a prototype implementation, which combines the type recovery tool Howard and the identification tool Data Structure Investigator (DSI), we analyze data structures in Carberp and AgoBot malware. Identifying their data structures proves to be a challenging problem. To tackle this, we propose a new type recovery for binaries based on machine learning, which uses Howard's types to guide the search and DSI's memory abstraction for hypothesis evaluation.