Figure 4
Source publication
Modern programming environments provide extensive support for inspecting,
analyzing, and testing programs based on the algorithmic structure of a
program. Unfortunately, support for inspecting and understanding runtime data
structures during execution is typically much more limited. This paper provides
a general purpose technique for abstracting an...
Contexts in source publication
Context 1
... #19 represents a large amount of memory, ∼107K objects representing nearly half of the total live heap. By expanding the node #19 we get the graph (Figure 4) representing the internal structure of the dominator reduced node. This reveals node ($48), abstracting a region of ∼18K Face objects, node ($23), abstracting a region of ∼18K Point[], and node ($49), abstracting a region of ∼72K Point objects. ...
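The dominator reduction above rests on a standard construction: computing immediate dominators over the object graph, so that each abstract node is credited with all the memory it solely retains. Below is a minimal sketch of that computation, assuming a toy {node: successors} dictionary encoding and hypothetical node names; it uses the iterative algorithm of Cooper, Harvey, and Kennedy and is not the tool's implementation.

```python
from collections import defaultdict

def immediate_dominators(graph, root):
    """Iterative Cooper-Harvey-Kennedy dominators.
    graph: {node: [successor, ...]}, all nodes reachable from root."""
    preds = defaultdict(list)
    for n, succs in graph.items():
        for s in succs:
            preds[s].append(n)
    # Reverse post-order: every node's DFS parent precedes it.
    order, seen = [], set()
    def dfs(n):
        seen.add(n)
        for s in graph[n]:
            if s not in seen:
                dfs(s)
        order.append(n)
    dfs(root)
    order.reverse()
    index = {n: i for i, n in enumerate(order)}

    def intersect(a, b):  # walk both up the idom tree until they meet
        while a != b:
            while index[a] > index[b]:
                a = idom[a]
            while index[b] > index[a]:
                b = idom[b]
        return a

    idom = {root: root}
    changed = True
    while changed:
        changed = False
        for n in order[1:]:  # skip root
            ps = [p for p in preds[n] if p in idom]
            new = ps[0]
            for p in ps[1:]:
                new = intersect(p, new)
            if idom.get(n) != new:
                idom[n] = new
                changed = True
    return idom

# A diamond: "c" is shared, so only the root retains it.
heap = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(immediate_dominators(heap, "root"))
# {'root': 'root', 'b': 'root', 'a': 'root', 'c': 'root'}
```

Folding each object into its immediate dominator is what lets a node such as #19 report a retained size of ∼107K objects.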
Context 2
... At first glance it may not be clear how to reduce the overhead of these Point objects. However, turning on the over-factored highlighting or inspecting the injectivity information in Figure 4 provides additional guidance. The edge from node $23 to node $49, representing all the pointers stored in the arrays, is shown as a normal edge, not shaded and wide. ...
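The injectivity information referred to here boils down to a simple property of an abstract edge: the concrete pointers it summarizes are injective when no two source slots reference the same target object, i.e., the targets are unshared. A minimal sketch of that check, assuming a hypothetical (slot, target) pair encoding:

```python
def is_injective(pointers):
    """pointers: iterable of (source_slot, target_id) pairs summarized
    by one abstract edge. Injective means no target is shared."""
    seen = set()
    for _slot, target in pointers:
        if target in seen:
            return False  # some target referenced by two slots
        seen.add(target)
    return True

# Every array slot holds a distinct Point -> injective (unshared):
assert is_injective([(0, "p1"), (1, "p2"), (2, "p3")])
assert not is_injective([(0, "p1"), (1, "p1")])  # sharing
```

An injective, unshared edge like the one from $23 to $49 is what flags the Point objects as over-factored: since each is referenced from exactly one array slot, they are candidates for being inlined into their arrays.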
Similar publications
Computational thinking represents a collection of structured problem solving skills that cross-cut educational disciplines. There is significant future value in introducing these skills as early as practical in students' academic careers. Over the past three years, we have developed, piloted, and evaluated a series of K-12 outreach modules designed...
Many social scientists conduct research on increasing the participation of women in computing, yet it is computer scientists who must find ways of implementing those findings into concrete actions. Technology for Community is an undergraduate computer science course taught at the University of Colorado at Boulder in which students work with local c...
Citations
... Some tools support users by displaying the path to the GC roots, while other approaches assist users by displaying the code that has allocated the objects. Visualization approaches [2,46,73,76,97,102,131] aggregating the object graph (e.g., based on its dominator tree [69,75,116]) are useful to analyze the heap's composition. A user following the top-down approach first selects a GC root or a heap object that keeps alive many other objects. ...
... Most state-of-the-art tools rely heavily on the visualization of data using (tree) tables. Yet, ample scientific work exists on more advanced features for memory visualization [2,46,73,76,97,102,131]. AntTracks thus already provides a graph-based visualization of the aggregated object graph to inspect the paths to the GC roots. ...
Memory analysis tools are essential for finding and fixing anomalies in the memory usage of software systems (e.g., memory leaks). Although numerous tools are available, hardly any empirical studies exist on their usefulness for developers in typical usage scenarios. Instead, most evaluations are limited to reporting performance metrics. We thus conducted a study to empirically assess the usefulness of the interactive memory analysis tool AntTracks Analyzer. Specifically, we first report findings from assessing the tool using a cognitive walkthrough, guided by the Cognitive Dimensions of Notations Framework. We then present the results of a qualitative user study involving 14 subjects who used AntTracks to detect and resolve memory anomalies. We report lessons learned from the study and implications for developers of interactive memory analysis tools. We hope that our results will help researchers and developers of memory analysis tools in defining, selecting, and improving tool capabilities.
... We use runtime sampling to produce a range of statistically meaningful measurements of the heaps produced by the DaCapo [4] benchmark, a well-known suite that encompasses real-world object-oriented programs selected to represent a range of application domains and programming idioms;
- we provide evidence that components, as described in [18,21], usefully and closely correspond to the roles that developers assign to objects; and
- we identify a small number of idiomatic sharing patterns that describe the majority of sharing that occurs in practice. ...
... In this work, we hypothesize that developers often think of objects in terms of the roles they play in the programs [21]. These roles implicitly aggregate objects into conceptually related sets. ...
... We formalize what "stored together" means in Section 3. We use these two structural indistinguishability principles to partition the concrete heap into conceptual components. Figure 1 shows the output of HeapDbg [21], a conceptual component visualization tool; it illustrates how we apply these principles to the concrete heap of a program that manipulates arithmetic expression trees. Figure 1(a) shows a concrete heap snapshot as computed by the sampling framework for a simple program that manipulates expression trees. ...
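The partitioning principle can be made concrete with a small sketch: group each object by its type together with the set of (owner type, field) locations in which it is stored. This is only an illustration of structural indistinguishability under an assumed data model, not HeapDbg's exact equivalence relation:

```python
from collections import defaultdict

def components(objects, references):
    """objects: {obj_id: type_name}
    references: iterable of (src_id, field, dst_id) heap pointers.
    Groups objects that have the same type and are 'stored together',
    i.e., reached through the same (owner type, field) locations."""
    stored_at = defaultdict(set)
    for src, field, dst in references:
        stored_at[dst].add((objects[src], field))
    groups = defaultdict(list)
    for obj, ty in objects.items():
        key = (ty, frozenset(stored_at[obj]))
        groups[key].append(obj)
    return groups

# Toy expression tree: the Const leaves stored via the same field collapse.
objs = {1: "Plus", 2: "Plus", 3: "Const", 4: "Const", 5: "Const"}
refs = [(1, "left", 3), (1, "right", 2), (2, "left", 4), (2, "right", 5)]
for key, members in components(objs, refs).items():
    print(key, members)
```

Objects 3 and 4 end up in one component (Consts stored in a Plus's left field), mirroring how conceptually related sets emerge from purely structural information.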
A large gap exists between the wide range of admissible heap structures and those that programmers actually build. To understand this gap, we empirically study heap structures and their sharing relations in real-world programs. Our goal is to characterize these heaps. Our study rests on a heap abstraction that uses structural indistinguishability principles to group objects that play the same role. Our results shed light on the prevalence of recursive data structures, aggregation, and the sharing patterns that occur in programs. We find, for example, that real-world heaps are dominated by atomic shapes (79% on average) and that the majority of sharing occurs via common programming idioms. In short, the heap is, in practice, a simple structure constructed out of a small number of simple structures. Our findings imply that garbage collection and program analysis may achieve a high return by focusing on simple heap structures.
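One way to make the shape categories concrete is to classify a single heap region from its internal edges as atomic, tree, DAG, or cyclic. The sketch below does exactly that; the region encoding is assumed, and the study's precise definitions may differ:

```python
def classify_shape(nodes, edges):
    """nodes: set of object ids in one region;
    edges: list of (src, dst) pointers internal to the region."""
    if not edges:
        return "atomic"  # isolated objects, no internal pointers
    succs = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for s, d in edges:
        succs[s].append(d)
        indeg[d] += 1
    # Cycle check: iterative DFS with white/grey/black coloring.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    for start in nodes:
        if color[start] != WHITE:
            continue
        stack = [(start, iter(succs[start]))]
        color[start] = GREY
        while stack:
            n, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                color[n] = BLACK
                stack.pop()
            elif color[nxt] == GREY:
                return "cyclic"  # back edge found
            elif color[nxt] == WHITE:
                color[nxt] = GREY
                stack.append((nxt, iter(succs[nxt])))
    return "dag" if any(v > 1 for v in indeg.values()) else "tree"

print(classify_shape({1, 2, 3}, []))                        # atomic
print(classify_shape({1, 2, 3}, [(1, 2), (1, 3)]))          # tree
print(classify_shape({1, 2, 3}, [(1, 2), (1, 3), (2, 3)]))  # dag
print(classify_shape({1, 2}, [(1, 2), (2, 1)]))             # cyclic
```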
... Our benchmark suite consists primarily of direct C# ports of commonly used Java benchmarks from Jolden [9], the db and raytracer programs from SPEC JVM98 [28], and the luindex and lusearch programs from the DaCapo suite [9]. Additionally, we have analyzed the heap abstraction code, runabs, from [19]. In practice, we translate the .Net assemblies to a simplified IR (intermediate representation), which allows us to remove most .Net-specific idioms from the core analysis and to perform specialized analyses on the standard libraries [20]. ...
...
- The introduction of thread-level parallelization, as in [18], to obtain a 3× speedup for bh on our quad-core machine.
- Data structure reorganization to improve memory usage and automated leak detection, as in [19], to obtain over a 25% reduction in live memory in raytrace.
- The computation of ownership information for object fields, as in [14], identifying ownership properties for 22% of the fields and unique memory reference properties for 47% of the fields in lusearch. ...
The computational cost and precision of a shape-style heap analysis are highly dependent on the way method calls are handled. This paper introduces a new approach to analyzing method calls that leverages the fundamental object-oriented programming concepts of encapsulation and invariants. The analysis consists of a novel partial context-sensitivity heuristic and a new take on cutpoints that, in practice, provide large improvements in interprocedural analysis performance while having minimal impact on the precision of the results.
The interprocedural analysis has been implemented for .Net bytecode and an existing abstract heap model. Using this implementation we evaluate both the runtime cost and the precision of the results on a number of well-known benchmarks and real-world programs. Our experimental evaluations show that, despite the use of partial context-sensitivity heuristics, the static analysis is able to precisely approximate the ideal analysis results. Further, the results show that the interprocedural analysis heuristics and the approach to cutpoints used in this work are critical in enabling the analysis of large real-world programs (over 30K bytecodes in less than 65 seconds, using less than 130 MB of memory) that could not be analyzed with previous approaches.
... A major challenge in constructing such a hybrid analysis is the question: are strong updates a fundamental component of a shape-style analysis, or is it possible to compute precise shape, sharing, etc. information with an analysis that uses simpler and more efficient points-to style transfer functions? Recent empirical work on the structure and behavior of the heap in modern object-oriented programs has shed light on how heap structures are constructed [41,3], the configuration of the pointers and objects in them [5], and their invariant structural properties [31,2,1]. These results affirm several common assumptions about how object-oriented programs are designed and how the heap structures in them behave. ...
... In particular, [41,3,5] demonstrate that object-oriented programs exhibit extensive mostly-functional behaviors: they make extensive use of final (or quiescing) fields, stationary fields, and copy construction, and when fields are updated the new target is frequently a newer (often freshly allocated) object. The results in [31,2,1] provide insight into what heuristics can be used to effectively group sections of the heap based on how they are used in the program, how prevalent the use of library containers is, and what sorts of structures are built. The results show that, in practice, object-oriented programs tend to organize objects on the heap into well-defined groups based on their roles in the program, that they avoid the use of linked pointer structures in favor of library-provided containers, and that connectivity and sharing properties between groups of objects are relatively simple and stable throughout the execution of the program. ...
... The normal form leverages the idea that locally (within a basic block or method call) invariants can be broken and subtle details are critical to program behavior, but that before and after these local regions the invariants should be restored. The basis for the normal form, and the selection of which properties are important to preserve, comes from studies of the runtime heap structures produced by object-oriented programs [31,2]. Thus we know that, in general, these definitions are well suited to capturing the fundamental structural properties of the heap that are of interest, while simplifying the structure of abstract heaps and discarding superfluous details. ...
This paper introduces a new hybrid memory analysis, Structural Analysis, which combines an expressive shape-analysis-style abstract domain with efficient and simple points-to style transfer functions. Using data from empirical studies on runtime heap structures and the programmatic idioms used in modern object-oriented languages, we construct a heap analysis with the following characteristics: (1) it can express a rich set of structural, shape, and sharing properties which are not provided by a classic points-to analysis and which are useful for optimization and error detection applications; (2) it uses efficient, weakly-updating, set-based transfer functions which enable the analysis to be more robust and scalable than a shape analysis; and (3) it can be used as the basis for a scalable interprocedural analysis that produces precise results in practice.
The analysis has been implemented for .Net bytecode, and using this implementation we evaluate both the runtime cost and the precision of the results on a number of well-known benchmarks and real-world programs. Our experimental evaluations show that the domain defined in this paper is capable of precisely expressing the majority of the connectivity, shape, and sharing properties that occur in practice and that, despite the use of weak updates, the static analysis is able to precisely approximate the ideal results. The analysis is capable of analyzing large real-world programs (over 30K bytecodes) in less than 65 seconds and using less than 130 MB of memory. In summary, this work presents a new type of memory analysis that advances the state of the art with respect to expressive power, precision, and scalability, and opens a new area of study on the relationships between, and the combination of, concepts from shape and points-to analyses.
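The contrast between the weakly-updating transfer functions described here and the strong updates of a classic shape analysis is easiest to see on the store `x.f = y`. A minimal sketch, assuming a hypothetical abstract heap that maps (node, field) pairs to sets of target nodes:

```python
def strong_update(heap, x, f, y_targets):
    """Shape-analysis style: overwrite the old targets. Sound only when
    the abstract node x denotes exactly one concrete object."""
    heap[(x, f)] = set(y_targets)  # old targets discarded

def weak_update(heap, x, f, y_targets):
    """Points-to style: accumulate targets. Always sound, no case
    splits needed, and cheap to compute."""
    heap.setdefault((x, f), set()).update(y_targets)  # old targets kept

h = {("n1", "f"): {"n2"}}
weak_update(h, "n1", "f", {"n3"})
print(h)  # {('n1', 'f'): {'n2', 'n3'}} -- both targets remain
```

The design bet of the paper is that, given how regular real-world heaps are, the cheaper weak update loses little precision in practice.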
Understanding and optimizing the memory usage of software applications is a difficult task, usually involving the analysis of large amounts of complex, memory-related data. Over the years, numerous software visualizations have been proposed to help developers analyze the memory usage information of their programs.
This article reports a systematic literature review of published works centered on software visualizations for analyzing the memory consumption of programs. We systematically selected 46 articles and categorized them based on the tasks supported, data collected, visualization techniques, evaluations conducted, and prototype availability. As a result, we introduce a taxonomy based on these five dimensions to identify the main challenges of visualizing memory consumption and opportunities for improvement. Despite the effort put into evaluating visualizations, we find that most articles lack evidence regarding how these visualizations perform in practice. We also highlight that few prototypes are available to developers willing to adopt a visualization for memory consumption analysis. Additionally, we describe a number of research areas that are worth exploring.
Knowing the shapes of dynamic data structures is key when formally reasoning about pointer programs. While modern shape analysis tools employ symbolic execution and machine learning to infer shapes, they often assume well-structured C code or programs written in an idealised language. In contrast, our Data Structure Investigator (DSI) tool for program comprehension analyses concrete executions and handles even C programs with complex coding styles. Our current research on memory safety develops ways for DSI to synthesise inductive shape predicates in separation logic. In the context of trusted computing, we investigate how the inferred predicates can be employed to generate runtime checks for securely communicating dynamic data structures across trust boundaries. We also explore to what extent these predicates, together with additional information extracted by DSI, can be used within general program verifiers such as VeriFast. This paper accompanies a talk at the ISoLA 2018 track “A Broader View on Verification: From Static to Runtime and Back”. It introduces DSI, highlights the above use cases, and sketches our approach for synthesising inductive shape predicates.
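For reference, the standard singly linked list-segment predicate is representative of the inductive shape predicates in separation logic mentioned above (a textbook definition, not DSI output):

```latex
\[
\mathrm{ls}(x, y) \;\triangleq\;
  (x = y \land \mathrm{emp})
  \;\lor\;
  \big(\exists v, n.\; x \mapsto (v, n) \ast \mathrm{ls}(n, y)\big)
\]
```

A list segment from x to y is either empty (x already equals y and no heap is owned) or a head cell x holding a value v and a next pointer n, separately conjoined with a segment from n to y. Synthesizing such a predicate from concrete executions is what allows a verifier or a runtime check to describe unboundedly many heap configurations with one definition.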
Complex software systems often suffer from performance problems caused by memory anomalies such as memory leaks. While the proliferation of objects is rather easy to detect using state-of-the-art memory monitoring tools, extracting a leak's root cause, i.e., identifying the objects that keep the accumulating objects alive, is still poorly supported. Most state-of-the-art tools rely on the dominator tree of the object graph and thus only support single-object ownership analysis. Multi-object ownership analysis, e.g., when the leaking objects are contained in multiple collections, is not possible by merely relying on the dominator tree. We present an efficient approach to continuously collect GC root information (e.g., static fields or thread-local variables) in a trace-based memory monitoring tool, as well as algorithms that use this information to calculate the transitive closure (i.e., all reachable objects) and the GC closure (i.e., the objects that are kept alive) for arbitrary heap object groups. These closures allow us to derive various metrics for heap object groups that can be used to guide the user during memory leak analysis. We implemented our approach in AntTracks, an offline memory monitoring tool, and demonstrate its usefulness by comparing it with other widely used tools for memory leak detection such as the Eclipse Memory Analyzer. Our evaluation shows that collecting GC root information during tracing introduces about 1% overhead, in terms of both run time and trace file size.
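Both closures can be phrased directly as graph reachability: the transitive closure of a group is everything reachable from it, while the GC closure is everything that becomes unreachable from the GC roots once the group is removed. A minimal sketch, assuming a {node: successors} encoding of the object graph (not AntTracks' implementation):

```python
def reachable(graph, starts, blocked=frozenset()):
    """All nodes reachable from starts without passing through blocked."""
    seen, stack = set(), [s for s in starts if s not in blocked]
    while stack:
        n = stack.pop()
        if n in seen or n in blocked:
            continue
        seen.add(n)
        stack.extend(graph.get(n, ()))
    return seen

def transitive_closure(graph, group):
    return reachable(graph, group)

def gc_closure(graph, roots, group):
    """Objects kept alive only via the group: live now, dead if the
    group were removed from the graph."""
    live_now = reachable(graph, roots)
    live_without_group = reachable(graph, roots, blocked=frozenset(group))
    return live_now - live_without_group

# 'leak' is reachable only through 'cacheA', so removing the cache frees it.
g = {"root": ["cacheA", "x"], "cacheA": ["leak"], "x": []}
print(gc_closure(g, ["root"], ["cacheA"]))  # {'cacheA', 'leak'}
```

Because the group may contain several collections, this handles exactly the multi-object ownership case that a single dominator tree cannot.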
Programs written in modern object-oriented programming languages heavily use dynamically allocated objects on the heap. Therefore, dynamic program analysis techniques, such as memory leak diagnosing and automatic debugging, depend on various kinds of information derived from the heap. Identifying the differences between two heaps is one of the most important of these tasks and is provided by many free and commercial problem-diagnosing tools that are widely used in industry. However, existing heap differencing tools usually leverage a single kind of information about an object, e.g., its address, allocation site, or access path in the heap object graph. Any single kind of information has disadvantages and can thus provide only an imprecise result, which cannot satisfy the requirements of other high-level dynamic analyses. We have observed that, in many situations, the disadvantages of one kind of information can be remedied by another. This paper presents PHD, a precise heap differencing tool for Java programs, which uses objects' spatial information (i.e., access paths) and temporal information (i.e., execution indices), both derived from the execution. To practically collect execution indices, we implemented PHD on an industrial-strength Java virtual machine, so it can be seamlessly integrated into production environments. Furthermore, we conducted case studies using PHD for three different dynamic analysis tasks on real-world applications such as the Eclipse Compiler for Java, Apache Derby, and Apache FTP Server.
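The core idea, that spatial and temporal information compensate for each other, can be sketched as matching objects across two snapshots by the pair (access path, execution index) rather than by raw object identity, which changes between runs. The snapshot format below is hypothetical, and PHD's actual matching is more sophisticated:

```python
def diff_heaps(before, after):
    """before/after: {object_id: (access_path, exec_index)}.
    Objects match when both their access path (spatial) and their
    execution-index-like allocation stamp (temporal) agree."""
    invert = lambda snap: {v: k for k, v in snap.items()}
    b, a = invert(before), invert(after)
    added   = [a[k] for k in a.keys() - b.keys()]
    removed = [b[k] for k in b.keys() - a.keys()]
    return added, removed

before = {1: ("root.cache[0]", 17), 2: ("root.cfg", 3)}
after  = {5: ("root.cache[0]", 17), 6: ("root.cache[1]", 42),
          7: ("root.cfg", 3)}
print(diff_heaps(before, after))  # ([6], []) -- ids differ across runs,
# but the (path, index) pairs identify the same logical objects
```

Using either key alone would misclassify: paths confuse distinct objects that rotate through the same slot, while allocation stamps alone cannot say where an object now lives; combining both is what yields a precise diff.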