David F. Bacon's research while affiliated with Google Inc. and other places

Publications (125)

Article
An important question in a software economy is how to incentivize deep rather than shallow fixes. A deep fix corrects the root cause of a bug instead of suppressing the symptoms. This paper initiates the study of the problem of incentive design for open workflows in fixing code. We model the dynamics of the software ecosystem and introduce subsumpt...
Conference Paper
Spanner is a globally-distributed data management system that backs hundreds of mission-critical services at Google. Spanner is built on ideas from both the systems and database communities. The first Spanner paper published at OSDI'12 focused on the systems aspects such as scalability, automatic sharding, fault tolerance, consistent replication, e...
Patent
The present invention broadly contemplates braids and fibers, high-level programming constructs which facilitate the creation of programs that are partially ordered, to address the continuing trend of ever-increasing processor speeds and attendant increases in memory latencies. These partial orders can be used to respond adaptively to memory latenc...
Patent
Full-text available
A technique for compiling and running a high-level program on heterogeneous computers may include partitioning a program code into two or more logical units, and compiling each of the logical units into one or more executable entities. At least some of the logical units are compiled into two or more executable entities, the two or more executable ent...
Patent
Full-text available
A computing device is provided and includes a memory module, a sweep engine, a root snapshot module, and a trace engine. The memory module has a memory implemented as at least one hardware circuit. The memory module uses a dual-ported memory configuration. The sweep engine includes a stack pointer. The sweep engine is configured to send a garbage c...
Patent
Full-text available
Color-based caching allows each cache line to be distinguished by a specific color, and enables the manipulation of cache behavior based upon the colors of the cache lines. When multiple threads are able to share a cache, effective cache management is critical to overall performance. Color-based caching provides an effective method to better utiliz...
Patent
Full-text available
The present invention provides techniques that allow concurrent collection of cyclic garbage on reference counting systems. In general, candidate objects are found that may be part of cyclic garbage. Each candidate object has a reference count. Two tests are performed to determine if concurrent operations have affected the reference counts of the c...
Article
Despite rapid increases in memory capacity, reconfigurable hardware is still programmed in a very low-level manner, generally without any dynamic allocation at all. This limits productivity especially as the larger chips encourage more and more complex designs to be attempted. Prior work has shown that it is possible to implement a real-time collec...
Article
Programmers are turning to radical architectures such as reconfigurable hardware (FPGAs) to achieve performance. But such systems, programmed at a very low level in languages with impoverished abstractions, are orders of magnitude more complex to use than conventional CPUs. The continued exponential increase in transistors, combined with the desire...
Conference Paper
This paper describes the Liquid Metal entry in the 2013 ICFPT Design Competition. The Liquid Metal system provides a high-level language called Lime and a toolchain targeting FPGAs. Lime allowed us to use standard software development processes for programming, debugging, and performance tuning our FPGA design. We believe such iteration and refinem...
Article
Now that the use of garbage collection in languages like Java is becoming widely accepted due to the safety and software engineering benefits it provides, there is significant interest in applying garbage collection to hard real-time systems. Past approaches have generally suffered from one of two major flaws: either they were not provably real-tim...
Patent
Full-text available
Techniques are disclosed for schedule management. By way of example, a method for managing performance of tasks of a thread associated with a processor comprises the following steps. A request to execute a task of a first task type within the thread is received. A determination is made whether the processor is currently executing a critical section...
Article
The programmability of FPGAs must improve if they are to be part of mainstream computing.
Article
The programmability of FPGAs must improve if they are to be part of mainstream computing.
Article
Languages such as OpenCL and CUDA offer a standard interface for general-purpose programming of GPUs. However, with these languages, programmers must explicitly manage numerous low-level details involving communication and synchronization. This burden makes programming GPUs difficult and error-prone, rendering these powerful devices inaccessible to...
Article
Programmers are turning to radical architectures such as reconfigurable hardware (FPGAs) to achieve performance. But such systems, programmed at a very low level in languages with impoverished abstractions, are orders of magnitude more complex to use than conventional CPUs. The continued exponential increase in transistors, combined with the desire...
Conference Paper
Full-text available
Languages such as OpenCL and CUDA offer a standard interface for general-purpose programming of GPUs. However, with these languages, programmers must explicitly manage numerous low-level details involving communication and synchronization. This burden makes programming GPUs difficult and error-prone, rendering these powerful devices inaccessible to...
Conference Paper
We consider a setting in which a worker and a manager may each have information about the likely completion time of a task, and the worker also affects the completion time by choosing a level of effort. The task itself may further be composed of a set of subtasks, and the worker can also decide how many of these subtasks to split out into an explic...
Conference Paper
Heterogeneous systems show a lot of promise for extracting high performance by combining the benefits of conventional architectures with specialized accelerators in the form of graphics processors (GPUs) and reconfigurable hardware (FPGAs). Extracting this performance often entails programming in disparate languages and models, making it hard for a...
Article
Programmers are turning to radical architectures such as reconfigurable hardware (FPGAs) to achieve performance. But such systems, programmed at a very low level in languages with impoverished abstractions, are orders of magnitude more complex to use than conventional CPUs. The continued exponential increase in transistors, combined with the desire...
Conference Paper
Since their invention over 40 years ago, virtual machines have been used to virtualize one or more von Neumann processors and their associated peripherals. System virtual machines provide the illusion that the user has their own instance of a physical machine with a given instruction set architecture (ISA). Process virtual machines provide the illu...
Article
Since their invention over 40 years ago, virtual machines have been used to virtualize one or more von Neumann processors and their associated peripherals. System virtual machines provide the illusion that the user has their own instance of a physical machine with a given instruction set architecture (ISA). Process virtual machines provide the illu...
Conference Paper
Lime is a new Java-compatible and object-oriented language designed to make programming of reconfigurable hardware significantly more accessible to skilled software developers. Lime programs may run either in software (via Java bytecodes) or in hardware (via behavioral and logic synthesis). This paper illustrates the salient synthesis-oriented feat...
Conference Paper
Lime is a new Java-compatible and object-oriented language designed to make programming of reconfigurable hardware significantly more accessible to skilled software developers. Lime programs may run either in software (via Java bytecodes) or in hardware (via behavioral and logic synthesis). This paper illustrates the salient synthesis-oriented feat...
Conference Paper
Full-text available
The halt in clock frequency scaling has forced architects and language designers to look elsewhere for continued improvements in performance. We believe that extracting maximum performance will require compilation to highly heterogeneous architectures that include reconfigurable hardware. We present a new language, Lime, which is designed to be exe...
Article
The halt in clock frequency scaling has forced architects and language designers to look elsewhere for continued improvements in performance. We believe that extracting maximum performance will require compilation to highly heterogeneous architectures that include reconfigurable hardware. We present a new language, Lime, which is designed to be exe...
Conference Paper
Software construction has typically drawn on engineering metaphors like building bridges or cathedrals, which emphasize architecture, specification, central planning, and determinism. Approaches to correctness have drawn on metaphors from mathematics, like formal proofs. However, these approaches have failed to scale to modern software systems, and...
Conference Paper
Full-text available
Generic classes can be used to improve performance by allowing compile-time polymorphism. But the applicability of compile-time polymorphism is narrower than that of run-time polymorphism, and it might bloat the object code. We advocate a programming principle whereby a generic class should be implemented in a way that minimizes the dependenci...
Article
Generic classes can be used to improve performance by allowing compile-time polymorphism. But the applicability of compile-time polymorphism is narrower than that of runtime polymorphism, and it might bloat the object code. We advocate a programming principle whereby a generic class should be implemented in a way that minimizes the dependencies bet...
Article
Software correctness has bedeviled the field of computer science since its inception. Software complexity has increased far more quickly than our ability to control it, reaching sizes that are many orders of magnitude beyond the reach of formal or automated verification techniques. We propose a new paradigm for evaluating "correctness" based on a r...
Conference Paper
Stream processing represents an important class of applications that spans telecommunications, multimedia and the Internet. The implementation of streaming programs in FPGAs has attracted significant attention because of their inherent parallelism and high performance requirements. Languages, tools, and even custom hardware for streaming have been...
Conference Paper
Full-text available
The Flexotask system claims to enable implementation of both real-time applications and real-time schedulers in a Java Virtual Machine using an actors-like model. The PTIDES model is an actors-like model that claims to deliver precise control over end-to-end latencies in a complex real-time system. The present work jointly investig...
Article
The Flexotask system claims to enable implementation of both real-time applications and real-time schedulers in a Java Virtual Machine using an actors-like model. The PTIDES model is an actors-like model that claims to deliver precise control over end-to-end latencies in a complex real-time system. The present work jointly investigates both claims...
Article
Full-text available
Exotasks are a novel Java programming construct that achieve three important goals. They achieve low latency while allowing the fullest use of Java language features, compared to previous attempts to restrict the Java language for use in the sub-millisecond domain. They support pluggable schedulers, allowing easy implementation of new scheduling...
Conference Paper
Full-text available
Large real-time software systems such as real-time Java virtual machines often use barrier protocols, which work for a dynamically varying number of threads without using centralized locking. Such barrier protocols, however, still suffer from priority inversion similar to centralized locking. We introduce gang priority management as a generic s...
Article
Full-text available
Programs encounter increasingly complex and fragile mappings to computing platforms, resulting in performance characteristics that are often mysterious to students, practitioners, and even researchers. We discuss some steps toward an experimental methodology that demands and provides a deep understanding of complete systems, the necessary instrumen...
Conference Paper
Full-text available
The paradigm shift in processor design from monolithic processors to multicore has renewed interest in programming models that facilitate parallelism. While multicores are here today, the future is likely to witness architectures that use reconfigurable fabrics (FPGAs) as coprocessors. FPGAs provide an unmatched ability to tailor their circuitry pe...
Conference Paper
Full-text available
The disadvantages of unconstrained shared-memory multi-threading in Java, especially with regard to latency and determinism in realtime systems, have given rise to a variety of language extensions that place restrictions on how threads allocate, share, and communicate memory, leading to order-of-magnitude reductions in latency and jitter. However,...
Conference Paper
The disadvantages of unconstrained shared-memory multi-threading in Java, especially with regard to latency and determinism in realtime systems, have given rise to a variety of language extensions that place restrictions on how threads allocate, share, and communicate memory, leading to order-of-magnitude reductions in latency and jitter. However,...
Conference Paper
In this paper, we introduce Optimus: an optimizing synthesis compiler for streaming applications. Optimus compiles programs written in a high-level streaming language to either software or hardware implementations. The compiler uses a hierarchical compilation strategy that separates concerns between macro- and micro-functional requirements...
Conference Paper
Full-text available
Real-time Garbage Collection (RTGC) has recently advanced to the point where it is being used in production for financial trading, military command-and-control, and telecommunications. However, among potential users of RTGC, there is enormous diversity in both application requirements and deployment environments. Previously described RTGCs tend t...
Conference Paper
While real-time garbage collection is now available in production virtual machines, the lack of generational capability means applications with high allocation rates are subject to reduced throughput and high space overheads. Since frequent allocation is often correlated with a high-level, object-oriented style of programming, this can force builde...
Conference Paper
While real-time garbage collection is now available in production virtual machines, the lack of generational capability means applications with high allocation rates are subject to reduced throughput and high space overheads. Since frequent allocation is often correlated with a high-level, object-oriented style of programming, this can force builder...
Conference Paper
Existing programming methodologies for real-time systems suffer from a low level of abstraction and non-determinism in both the timing and the functional domains. As a result, real-time systems are difficult to test and must be re-certified every time changes are made to either the software or hardware environment. Exotasks are a novel Java program...
Conference Paper
Full-text available
Existing programming methodologies for real-time systems suffer from a low level of abstraction and non-determinism in both the timing and the functional domains. As a result, real-time systems are difficult to test and must be re-certified every time changes are made to either the software or hardware environment. Exotasks are a novel Java program...
Conference Paper
Full-text available
Developing embedded systems software poses unique challenges to Java application developers and virtual machine designers. Chief among these challenges is the memory footprint of both the virtual machine and the applications that run within it. With the rapidly increasing set of features provided by the Java language, virtual machine designers are...
Conference Paper
Concurrent garbage collectors are notoriously hard to design, implement, and verify. We present a framework for the automatic exploration of a space of concurrent mark-and-sweep collectors. In our framework, the designer specifies a set of "building blocks" from which algorithms can be constructed. These blocks reflect the designer's insights about...
Conference Paper
Full-text available
Concurrent garbage collectors are notoriously hard to design, implement, and verify. We present a framework for the automatic exploration of a space of concurrent mark-and-sweep collectors. In our framework, the designer specifies a set of "building blocks" from which algorithms can be constructed. These blocks reflect the designer's insights about...
Article
Traditional computer science deals with the computation of correct results. Realtime systems interact with the physical world, so they have a second correctness criterion: they have to compute the correct result within a bounded amount of time. Simply building functionally correct software is hard enough. When timing is added to the requirements, t...
Conference Paper
Full-text available
The emergence of standards for programming real-time systems in Java has encouraged many developers to consider its use for systems previously only built using C, Ada, or assembly language. However, the RTSJ standard in isolation leaves many important problems unaddressed, and suffers from some serious problems in usability and safety. As a resul...
Conference Paper
Debugging the timing behavior of real-time systems is notoriously difficult, and with a new generation of complex systems consisting of tens of millions of lines of code, the difficulty is increasing enormously. We have developed TuningFork, a tool especially designed for visualization and analysis of large-scale real-time systems. TuningFork is ca...
Article
While real-time garbage collection has achieved worst-case latencies on the order of a millisecond, this technology is approaching its practical limits. For tasks requiring extremely low latency, and especially periodic tasks with frequencies above 1 kHz, Java programmers must currently resort to the NoHeapRealtimeThread construct of the Real-Time...
Conference Paper
Full-text available
Constructing correct concurrent garbage collection algorithms is notoriously hard. Numerous such algorithms have been proposed, implemented, and deployed - and yet the relationship among them in terms of speed and precision is poorly understood, and the validation of one algorithm does not carry over to others. As programs with low latency requireme...
Article
As processor speeds continue to increase at a much higher rate than memory speeds, memory latencies may soon approach a thousand processor cycles. As a result, the flat memory model that was made practical by deeply pipelined superscalar processors with multilevel caches will no longer be tenable. The most common approach to this problem is multith...
Conference Paper
Full-text available
TuningFork is an online, scriptable data visualization and analysis tool that supports the development and continuous monitoring of real-time systems. While TuningFork was originally designed and tested for use with a particular real-time Java Virtual Machine, the architecture has been designed from the ground up for extensibility by leveraging t...
Article
Full-text available
TuningFork is an online, scriptable data visualization and analysis tool that supports the development and continuous monitoring of real-time systems. While TuningFork was originally designed and tested for use with a particular real-time Java Virtual Machine, the architecture has been designed from the ground up for extensibility by leveraging...
Conference Paper
Full-text available
While real-time garbage collection has achieved worst-case latencies on the order of a millisecond, this technology is approaching its practical limits. For tasks requiring extremely low latency, and especially periodic tasks with frequencies above 1 kHz, Java programmers must currently resort to the NoHeapRealtimeThread construct of the Real-T...
Conference Paper
Full-text available
There are many algorithms for concurrent garbage collection, but they are complex to describe, verify, and implement. This has resulted in a poor understanding of the relationships between the algorithms, and has precluded systematic study and comparative evaluation. We present a single high-level, abstract concurrent garbage collection algorit...
Conference Paper
Full-text available
Real-time garbage collection has been shown to be feasible, but for programs with high allocation rates, the utilization achievable is not sufficient for some systems. Since a high allocation rate is often correlated with a more high-level, abstract programming style, the ability to provide good real-time performance for such programs will help cont...
Conference Paper
Full-text available
A reference-counting garbage collector cannot reclaim unreachable cyclic structures of objects. Therefore, reference-counting collectors either use a backup tracing collector infrequently, or employ a cycle collector to reclaim cyclic structures. We propose a new concurrent cycle collector, i.e., one that runs concurrently with the program threads,...
Conference Paper
Real-time garbage collection has been shown to be feasible, but for programs with high allocation rates, the utilization achievable is not sufficient for some systems. Since a high allocation rate is often correlated with a more high-level, abstract programming style, the ability to provide good real-time performance for such programs will help cont...
Conference Paper
Full-text available
Real-time systems have reached a level of complexity beyond the scaling capability of the low-level or restricted languages traditionally used for real-time programming. While Metronome garbage collection has made it practical to use Java to implement real-time systems, many challenges remain for the construction of complex real-time systems, some s...
Conference Paper
Much prior work has shown that the performance enabled by garbage collection (GC) systems is highly dependent upon the behavior of the application as well as on the available resources. That is, no single GC enables the best performance for all programs and all heap sizes. To address this limitation, we present the design, implementation, and empir...
Conference Paper
Full-text available
Tracing and reference counting are uniformly viewed as being fundamentally different approaches to garbage collection that possess very distinct performance properties. We have implemented high-performance collectors of both types, and in the process observed that the more we optimized them, the more similarly they behaved - that they seem to share...
Article
Automatic storage reclamation via reference counting has important advantages, but has always suffered from a major weakness due to its inability to reclaim cyclic data structures.
Article
Now that the use of garbage collection in languages like Java is becoming widely accepted due to the safety and software engineering benefits it provides, there is significant interest in applying garbage collection to hard real-time systems. Past approaches have generally suffered from one of two major flaws: either they were not provably real-tim...
Article
While uniprocessor garbage collection is relatively well understood, experience with collectors for large multiprocessor servers is limited and it is unknown which techniques best scale with large memories and large numbers of processors. In order to explore these issues we designed a modular garbage collection framework in the IBM Jalapeno Java vi...
Article
Now that the use of garbage collection in languages like Java is becoming widely accepted due to the safety and software engineering benefits it provides, there is significant interest in applying garbage collection to hard real-time systems. Past approaches have generally suffered from one of two major flaws: either they were not provably real-tim...
Article
With the wide-spread adoption of Java, there is significant interest in using the language for programming real-time systems. The community has generally viewed a truly real-time garbage collector as being impossible to build, and has instead focused its efforts on adding manual memory management mechanisms to Java. Unfortunately, these mechanisms...
Article
Full-text available
A reference counting garbage collector cannot reclaim unreachable cyclic structures of objects. Therefore, reference counting collectors either use a backup tracing collector infrequently, or employ a cycle collector to reclaim cyclic structures. Recently, the first on-the-fly cycle collector, which may run concurrently with program threads, was presente...
Article
In this paper, we describe a novel execution environment that can dynamically switch between garbage collection (GC) systems. As such, it enables application-specific GC selection. In addition, the system can switch between different GC systems while the program is executing. Our system is novel in that it is able to switch between a wide range of...
Conference Paper
Concurrent garbage collectors require write barriers to preserve consistency, but these barriers impose significant direct and indirect costs. While there has been a lot of work on optimizing write barriers, we present the first study of their elision in a concurrent collector. We show conditions under which write barriers are redundant, and descri...
Conference Paper
Full-text available
Security concerns on embedded devices like cellular phones make Java an extremely attractive technology for providing third-party and user-downloadable functionality. However, garbage collectors have typically required several times the maximum live data set size (which is the minimum possible heap size) in order to run well. In addition, the size...
Article
While Java provides many software engineering benefits, it lacks a coherent module system and instead provides only packages (which are primarily a name space mechanism) and classloaders (which are very low-level). As a result, large Java applications suffer from unexpected interactions between independent components, require complex CLASSPATH defi...
Article
While Java provides many software engineering benefits, it lacks a coherent module system and instead provides only packages (which are primarily a name space mechanism) and classloaders (which are very low-level). As a result, large Java applications suffer from unexpected in