David Grove
IBM · Programming Languages and Software Engineering

About

103
Publications
22,466
Reads
5,581
Citations

Publications (103)
Article
Cloud computing has made the resources needed to execute large-scale in-memory distributed computations widely available. Specialized programming models, e.g., MapReduce, have emerged to offer transparent fault tolerance and fault recovery for specific computational patterns, but they sacrifice generality. In contrast, the Resilient X10 programming...
Technical Report
Full-text available
Cloud computing has made the resources needed to execute large-scale in-memory distributed computations widely available. Specialized programming models, e.g., MapReduce, have emerged to offer transparent fault tolerance and fault recovery for specific computational patterns, but they sacrifice generality. In contrast, the Resilient X10 programming...
Conference Paper
Full-text available
Many PGAS languages and libraries rely on high performance transport layers such as GASNet and MPI to achieve low communication latency, portability and scalability. As systems increase in scale, failures are expected to become normal events rather than exceptions. Unfortunately, GASNet and standard MPI do not provide fault tolerance capabilities....
Conference Paper
Event-processing systems can support high-quality reactions to events by providing context to the event agents. When this context consists of a large amount of data, it helps to train an analytic model for it. In a continuously running solution, this model must be kept up-to-date, otherwise quality degrades. Unfortunately, ripple-through effects ma...
Conference Paper
Full-text available
The Asynchronous Partitioned Global Address Space (APGAS) programming model enables programmers to express the parallelism and locality necessary for high performance scientific applications on extreme-scale systems. We used the well-known LULESH hydrodynamics proxy application to explore the performance and programmability of the APGAS model as ex...
Conference Paper
X10 is a Java-like programming language that introduces new constructs to significantly simplify scale-out programming based on the Asynchronous Partitioned Global Address Space (APGAS) programming model. The fundamental goal of X10 is to enable scalable, high-performance, high-productivity programming of large scale computer systems for both conve...
Conference Paper
Full-text available
X10 is a high-performance, high-productivity programming language aimed at large-scale distributed and shared-memory parallel applications. It is based on the Asynchronous Partitioned Global Address Space (APGAS) programming model, supporting the same fine-grained concurrency mechanisms within and across shared-memory nodes. We demonstrate that X10...
Article
This paper addresses the problem of efficiently supporting parallelism within a managed runtime. A popular approach for exploiting software parallelism on parallel hardware is task parallelism, where the programmer explicitly identifies potential parallelism and the runtime then schedules the work. Work-stealing is a promising scheduling strategy t...
Conference Paper
Full-text available
Effective support for array-based programming has long been one of the central design concerns of the X10 programming language. After significant research and exploration, X10 has adopted an approach based on providing arrays via user definable and extensible class libraries. This paper surveys the range of array abstractions available to the progr...
Conference Paper
This paper addresses the problem of efficiently supporting parallelism within a managed runtime. A popular approach for exploiting software parallelism on parallel hardware is task parallelism, where the programmer explicitly identifies potential parallelism and the runtime then schedules the work. Work-stealing is a promising scheduling strategy t...
Conference Paper
Scale-out programs run on multiple processes in a cluster. In scale-out systems, processes can fail. Computations using traditional libraries such as MPI fail when any component process fails. The advent of Map Reduce, Resilient Data Sets and MillWheel has shown dramatic improvements in productivity are possible when a high-level programming framew...
Article
Full-text available
Scale-out programs run on multiple processes in a cluster. In scale-out systems, processes can fail. Computations using traditional libraries such as MPI fail when any component process fails. The advent of Map Reduce, Resilient Data Sets and MillWheel has shown dramatic improvements in productivity are possible when a high-level programming framew...
Article
Full-text available
We present GLB, a programming model and an associated implementation that can handle a wide range of irregular parallel programming problems running over large-scale distributed systems. GLB is applicable both to problems that are easily load-balanced via static scheduling and to problems that are hard to statically load balance. GLB hides the intr...
Conference Paper
The ability to smoothly interoperate with other programming languages is an essential feature to reduce the barriers to adoption for new languages such as X10. Compiler-supported interoperability between Managed X10 and Java was initially previewed in X10 version 2.2.2 and is now fully supported in X10 version 2.3. In this paper we describe and mot...
Conference Paper
This talk will present a high-level introduction to the X10 language and its implementation with the goal of ensuring a common base knowledge of X10 by all workshop attendees. The X10 language will be introduced primarily via code examples with a focus on X10's support for concurrency and distribution via the APGAS (Asynchronous Partitioned Global...
Patent
Full-text available
Techniques are disclosed for schedule management. By way of example, a method for managing performance of tasks of a thread associated with a processor comprises the following steps. A request to execute a task of a first task type within the thread is received. A determination is made whether the processor is currently executing a critical section...
Article
Programming for large-scale, multicore-based architectures requires adequate tools that offer ease of programming and do not hinder application performance. StarSs is a family of parallel programming models based on automatic function-level parallelism that targets productivity. StarSs deploys a data-flow model: it analyzes dependencies between tas...
Conference Paper
This paper proposes two novel techniques for partial inlining. Context-driven partial inlining uses information available to the compiler at a call site to prune the callee prior to assessing whether the (pruned) body of the callee should be inlined. Guarded partial inlining seeks to inline the frequently taken fast path through the callee along wi...
Conference Paper
Work-stealing is a promising approach for effectively exploiting software parallelism on parallel hardware. A programmer who uses work-stealing explicitly identifies potential parallelism and the runtime then schedules work, keeping otherwise idle hardware busy while relieving overloaded hardware of its burden. Prior work has demonstrated that work...
Conference Paper
We propose a framework for SAT researchers to conveniently try out new ideas in the context of parallel SAT solving without the burden of dealing with all the underlying system issues that arise when implementing a massively parallel algorithm. The framework is based on the parallel execution language X10, and allows the parallel solver to easily r...
Conference Paper
On shared-memory systems, Cilk-style work-stealing has been used to effectively parallelize irregular task-graph based applications such as Unbalanced Tree Search (UTS). There are two main difficulties in extending this approach to distributed memory. In the shared memory approach, thieves (nodes without work) constantly attempt to asynchronously s...
Conference Paper
Full-text available
X10 is an emerging Partitioned Global Address Space (PGAS) language intended to increase significantly the productivity of developing scalable HPC applications. The language has now matured to a point where it is meaningful to consider writing large scale scientific application codes in X10. This paper reports our experiences writing three codes fr...
Conference Paper
Full-text available
X10 is a new object-oriented PGAS (Partitioned Global Address Space) programming language with support for distributed asynchronous dynamic parallelism that goes beyond past SPMD message-passing models such as MPI and SPMD PGAS models such as UPC and Co-Array Fortran. The concurrency constructs in X10 make it possible to express complex computation...
Article
To reliably write high performance code in any programming language, an application programmer must have some understanding of the performance characteristics of the language's core constructs. We call this understanding a performance model for the language. Some aspects of a performance model are fundamental to the programming language and are exp...
Conference Paper
Full-text available
Programming for large-scale, multicore-based architectures requires adequate tools that offer ease of programming while not hindering application performance. StarSs is a family of parallel programming models based on automatic function level parallelism that targets productivity. StarSs deploys a data-flow model: it analyses dependencies between t...
Article
The MapReduce framework has become a popular and powerful tool to process large datasets in parallel over a cluster of computing nodes [1]. Currently, there are many flavors of implementations of MapReduce, among which the most popular is the Hadoop implementation in Java [5]. However, these implementations either rely on third-party file systems f...
Conference Paper
Full-text available
The power of high-level languages lies in their abstraction over hardware and software complexity, leading to greater security, better reliability, and lower development costs. However, opaque abstractions are often show-stoppers for systems programmers, forcing them to either break the abstraction, or more often, simply give up and use a dif...
Conference Paper
As computer hardware continues to dramatically improve in transistor density and raw capability, the importance of compilers to bridge the gap between high-level programming languages and these abundant hardware resources has never been greater. The workshop on Compiler-Driven Performance provided an important opportunity for academic faculty, stud...
Article
Full-text available
Programs encounter increasingly complex and fragile mappings to computing platforms, resulting in performance characteristics that are often mysterious to students, practitioners, and even researchers. We discuss some steps toward an experimental methodology that demands and provides a deep understanding of complete systems, the necessary instrumen...
Conference Paper
Full-text available
Real-time Garbage Collection (RTGC) has recently advanced to the point where it is being used in production for financial trading, military command-and-control, and telecommunications. However, among potential users of RTGC, there is enormous diversity in both application requirements and deployment environments. Previously described RTGCs tend t...
Conference Paper
While real-time garbage collection is now available in production virtual machines, the lack of generational capability means applications with high allocation rates are subject to reduced throughput and high space overheads. Since frequent allocation is often correlated with a high-level, object-oriented style of programming, this can force builde...
Conference Paper
While real-time garbage collection is now available in production virtual machines, the lack of generational capability means applications with high allocation rates are subject to reduced throughput and high space overheads. Since frequent allocation is often correlated with a high-level, object-oriented style of programming, this can force builder...
Conference Paper
Full-text available
If the operating system could be specialized for every application, many applications would run faster. For example, Java virtual machines (JVMs) provide their own threading model and memory protection, so general-purpose operating system implementations of these abstractions are redundant. However, traditional means of transforming existing syst...
Conference Paper
Full-text available
The emergence of standards for programming real-time systems in Java has encouraged many developers to consider its use for systems previously only built using C, Ada, or assembly language. However, the RTSJ standard in isolation leaves many important problems unaddressed, and suffers from some serious problems in usability and safety. As a resul...
Conference Paper
Debugging the timing behavior of real-time systems is notoriously difficult, and with a new generation of complex systems consisting of tens of millions of lines of code, the difficulty is increasing enormously. We have developed TuningFork, a tool especially designed for visualization and analysis of large-scale real-time systems. TuningFork is ca...
Article
While real-time garbage collection has achieved worst-case latencies on the order of a millisecond, this technology is approaching its practical limits. For tasks requiring extremely low latency, and especially periodic tasks with frequencies above 1 KHz, Java programmers must currently resort to the NoHeapRealtimeThread construct of the Real-Time...
Conference Paper
Full-text available
TuningFork is an online, scriptable data visualization and analysis tool that supports the development and continuous monitoring of real-time systems. While TuningFork was originally designed and tested for use with a particular real-time Java Virtual Machine, the architecture has been designed from the ground up for extensibility by leveraging t...
Conference Paper
Full-text available
While real-time garbage collection has achieved worst-case latencies on the order of a millisecond, this technology is approaching its practical limits. For tasks requiring extremely low latency, and especially periodic tasks with frequencies above 1 KHz, Java programmers must currently resort to the NoHeapRealtimeThread construct of the Real-T...
Conference Paper
Poor instruction cache locality can degrade performance on modern architectures. For example, our simulation results show that eliminating all instruction cache misses improves performance by as much as 16% for a modestly sized instruction cache. In this paper, we show how to take advantage of dynamic code generation in a Java Virtual Machine (...
Article
Full-text available
TuningFork is an online, scriptable data visualization and analysis tool that supports the development and continuous monitoring of real-time systems. While TuningFork was originally designed and tested for use with a particular real-time Java Virtual Machine, the architecture has been designed from the ground up for extensibility by leveraging...
Conference Paper
Full-text available
There are many algorithms for concurrent garbage collection, but they are complex to describe, verify, and implement. This has resulted in a poor understanding of the relationships between the algorithms, and has precluded systematic study and comparative evaluation. We present a single high-level, abstract concurrent garbage collection algorit...
Conference Paper
Full-text available
Real-time garbage collection has been shown to be feasible, but for programs with high allocation rates, the utilization achievable is not sufficient for some systems. Since a high allocation rate is often correlated with a more high-level, abstract programming style, the ability to provide good real-time performance for such programs will help cont...
Conference Paper
Due to the high dynamic frequency of virtual method calls in typical object-oriented programs, feedback-directed devirtualization and inlining is one of the most important optimizations performed by high-performance virtual machines. A critical input to effective feedback-directed inlining is an accurate dynamic call graph. In a virtual machine, th...
Article
Full-text available
Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimization overhead. Second, modular program representatio...
Article
Full-text available
This paper describes the evolution of the Jikes™ Research Virtual Machine project from an IBM internal research project, called Jalapeño, into an open-source project. After summarizing the original goals of the project, we discuss the motivation for releasing it as an open-source project and the activities performed to ensure the success of the pro...
Conference Paper
Full-text available
Real-time systems have reached a level of complexity beyond the scaling capability of the low-level or restricted languages traditionally used for real-time programming. While Metronome garbage collection has made it practical to use Java to implement real-time systems, many challenges remain for the construction of complex real-time systems, some s...
Conference Paper
Real-time garbage collection has been shown to be feasible, but for programs with high allocation rates, the utilization achievable is not sufficient for some systems. Since a high allocation rate is often correlated with a more high-level, abstract programming style, the ability to provide good real-time performance for such programs will help cont...
Article
Full-text available
Modern Java programs, such as middleware and application servers, include many complex software components. Improving the performance of these Java applications requires a better understanding of the interactions between the application, virtual machine, operating system, and architecture. Hardware performance monitors, which are available on most...
Conference Paper
Full-text available
Security concerns on embedded devices like cellular phones make Java an extremely attractive technology for providing third-party and user-downloadable functionality. However, garbage collectors have typically required several times the maximum live data set size (which is the minimum possible heap size) in order to run well. In addition, the size...
Article
While Java provides many software engineering benefits, it lacks a coherent module system and instead provides only packages (which are primarily a name space mechanism) and classloaders (which are very low-level). As a result, large Java applications suffer from unexpected interactions between independent components, require complex CLASSPATH defi...
Article
While Java provides many software engineering benefits, it lacks a coherent module system and instead provides only packages (which are primarily a name space mechanism) and classloaders (which are very low-level). As a result, large Java applications suffer from unexpected interactions between independent components, require complex CLASSPATH defi...
Conference Paper
While Java provides many software engineering benefits, it lacks a coherent module system and instead provides only packages (which are primarily a name space mechanism) and classloaders (which are very low-level). As a result, large Java applications suffer from unexpected interactions between independent components, require complex CLASSPATH defi...
Article
As current trends in software development move toward more complex object-oriented programming, inlining has become a vital optimization that provides substantial performance improvements to C++ and Java programs. Yet, the aggressiveness of the inlining algorithm must be carefully monitored to effectively balance performance and code size. The state-...
Article
Full-text available
Modern Java programs such as middleware and application servers include many complex software components. Improving the performance of these Java applications requires a better understanding of the interactions between the application, virtual machine, operating system, and architecture. Hardware performance monitors, which are available on most mo...
Conference Paper
As current trends in software development move toward more complex object-oriented programming, inlining has become a vital optimization that provides substantial performance improvements to C++ and Java programs. Yet, the aggressiveness of the inlining algorithm must be carefully monitored to effectively balance performance and code size. The stat...
Article
While many object-oriented languages impose space overhead of only one word per object to support features like virtual method dispatch, Java's richer functionality has led to implementations that require two or three header words per object. This space overhead increases memory usage and attendant garbage collection costs, reduces cache locality,...
Article
porting the Jikes Research Virtual Machine from its first platform, AIX/PowerPC, to its second, Linux/IA32. We discuss the main issues in realizing both an initial functional port, and then tuning efforts to achieve competitive performance. The paper presents software engineering issues in building a portable runtime system and compilers, as well...
Article
Full-text available
Dataflow analyses can have mutually beneficial interactions. Previous efforts to exploit these interactions have either (1) iteratively performed each individual analysis until no further improvements are discovered or (2) developed "superanalyses" that manually combine conceptually separate analyses. We have devised a new approach that allows anal...
Conference Paper
Full-text available
While many object-oriented languages impose space overhead of only one word per object to support features like virtual method dispatch, Java’s richer functionality has led to implementations that require two or three header words per object. This space overhead increases memory usage and attendant garbage collection costs, reduces cache locality,...
Article
Full-text available
A Java virtual machine (JVM) must sometimes check whether a value of one type can be treated as a value of another type. The overhead for such dynamic type checking can be a significant factor in the running time of some Java programs. This paper presents a variety of techniques for performing these checks, each tailored to a particular rest...
Article
Full-text available
A Java virtual machine (JVM) must sometimes check whether a value of one type can be treated as a value of another type. The overhead for such dynamic type checking can be a significant factor in the running time of some Java programs. This paper presents a variety of techniques for performing these checks, each tailored to a particular restr...
Article
A large number of call graph construction algorithms for object-oriented and functional languages have been proposed, each embodying different tradeoffs between analysis cost and call graph precision. In this article we present a unifying framework for understanding call graph construction algorithms and an empirical comparison of a representativ...
Article
Single superclass inheritance enables simple and efficient table-driven virtual method dispatch. However, virtual method table dispatch does not handle multiple inheritance and interfaces. This complication has led to a widespread misimpression that interface method dispatch is inherently inefficient. This paper argues that with proper implementati...
Article
Full-text available
The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applications on SMP server machines. This paper describes the design of the Jalapeño Optimizing Compiler, and the implementation results that we have obtained...
Article
Full-text available
Single superclass inheritance enables simple and efficient table-driven virtual method dispatch. However, virtual method table dispatch does not handle multiple inheritance and interfaces. This complication has led to a widespread misimpression that interface method dispatch is inherently inefficient. This paper argues that with proper implementation tec...
Article
This paper provides details of the component of the Jalapeño adaptive optimization system that determines what methods to optimize. This component, called the controller, can choose from one of several optimization levels. In the current implementation, the controller uses a simple cost/benefit analysis to drive adaptive compilation decisions. It h...
Article
The execution model for mobile, dynamically‐linked, object‐oriented programs has evolved from fast interpretation to a mix of interpreted and dynamically compiled execution. The primary motivation for dynamic compilation is that compiled code executes significantly faster than interpreted code. However, dynamic compilation, which is performed while...
Conference Paper
Virtual methods can be dispatched efficiently because the code for corresponding methods resides at the same entries in their respective virtual method tables (VMTs). To achieve efficient interface method dispatch, a fixed-sized interface method table (IMT) is associated with each class. Different implementations of the same interface method signatu...
Conference Paper
Full-text available
In this paper, we report on our experiences with guaranteeing GC-pointer safety when using unsafe low-level language extensions to implement a JVM in Java. We give an overview of the original unsafe language extensions that were defined for use by Jalapeño implementers, and introduce sanitized replacements that capture common idioms while also guar...
Conference Paper
Future high-performance virtual machines will improve performance through sophisticated online feedback-directed optimizations. This paper presents the architecture of the Jalapeño Adaptive Optimization System, a system to support leading-edge virtual machine technology and enable ongoing research on online feedback-directed optimizations. We descr...
Article
Full-text available
Future high-performance virtual machines will improve performance through sophisticated online feedback-directed optimizations. This paper presents the architecture of the Jalapeño Adaptive Optimization System, a system to support leading-edge virtual machine technology and enable ongoing research on online feedback-directed optimizations. We descr...
Article
Full-text available
A large number of call graph construction algorithms for object-oriented and functional languages have been proposed, each embodying different tradeoffs between analysis cost and call graph precision. In this paper, we present a unifying framework for understanding call graph construction algorithms and an empirical comparison of a represe...
Article
Full-text available
Jalapeño is a virtual machine for Java™ servers written in the Java language. To be able to address the requirements of servers (performance and scalability in particular), Jalapeño was designed “from scratch“ to be as self-sufficient as possible. Jalapeño's unique object model and memory layout allows a hardware null-pointer check as well as fast...
Conference Paper
Full-text available
The Educators' Symposium is a unique forum for educators from both academia and industry who have a vested interest in OO education and training. The Educators' Symposium facilitates the exchange of ideas in a number of ways, including featured talks ...
Article
The runtime performance of object-oriented languages often suffers due to the overhead of dynamic dispatching. In order to make these languages competitive with traditional languages, optimizing compilers attempt to eliminate as many of the dynamic dispatches as possible. A variety of local and intraprocedural techniques have been developed to do t...
Article
Full-text available
The Factored Control Flow Graph, FCFG, is a novel representation of a program's intraprocedural control flow, which is designed to efficiently support the analysis of programs written in languages, such as Java, that have frequently occurring operations whose execution may result in exceptional control flow. The FCFG is more compact than traditiona...
Article
Full-text available
Optimizing compilers for object-oriented languages apply static class analysis and other techniques to try to deduce precise information about the possible classes of the receivers of messages; if successful, dynamically dispatched messages can be replaced with direct procedure calls and potentially further optimized through inline-expansion. By e...
Article
Full-text available
Previously, techniques such as class hierarchy analysis and profile-guided receiver class prediction have been demonstrated to greatly improve the performance of applications written in pure object-oriented languages, but the degree to which these results are transferable to applications written in hybrid languages has been unclear. In part to answ...
Article
Full-text available
Previous algorithms for interprocedural control flow analysis of higher-order and/or object-oriented languages have been described that perform propagation or constraint satisfaction and take O(N^3) time (such as Shivers's 0-CFA and Heintze's set-based analysis), or unification and take O(Nα(N,N)) time (such as Steensgaard's pointer analysis), or...
Article
Full-text available
The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applications on SMP server machines. This paper describes the design of the Jalapeño Optimizing Compiler, and the implementation results that we have obtained...
Conference Paper
Full-text available
Interprocedural analyses enable optimizing compilers to more precisely model the effects of non-inlined procedure calls, potentially resulting in substantial increases in application performance. Applying interprocedural analysis to programs written in object-oriented or functional languages is complicated by the difficulty of constructing an accur...
Article
Full-text available
Because dataflow analyses are difficult to implement from scratch, reusable dataflow analysis frameworks have been developed which provide generic support facilities for managing propagation of dataflow information and iteration in loops. We have designed a framework that improves on previous work by making it easy to perform graph transformations...
Article
Previously, techniques such as class hierarchy analysis and profile-guided receiver class prediction have been demonstrated to greatly improve the performance of applications written in pure object-oriented languages, but the degree to which these results are transferable to applications written in hybrid languages has been unclear. In part to answ...
Article
Full-text available
We describe Vortex, an optimizing compiler intended to produce high-quality code for programs written in a heavily-object-oriented style. To achieve this end, Vortex includes a number of intra- and interprocedural static analyses that can exploit knowledge about the whole program being compiled, including intraprocedural class analysis, class hiera...