Conference Paper

VMKit: a Substrate for Managed Runtime Environments

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Managed Runtime Environments (MREs), such as the JVM and the CLI, form an attractive environment for program execution, by providing portability and safety, via the use of a bytecode language and automatic memory management, as well as good performance, via just-in-time (JIT) compilation. Nevertheless, developing a fully featured MRE, including e.g. a garbage collector and JIT compiler, is a herculean task. As a result, new languages cannot easily take advantage of the benefits of MREs, and it is difficult to experiment with extensions of existing MRE based languages. This paper describes and evaluates VMKit, a first attempt to build a common substrate that eases the development of high-level MREs. We have successfully used VMKit to build two MREs: a Java Virtual Machine and a Common Language Runtime. We provide an extensive study of the lessons learned in developing this infrastructure, and assess the ease of implementing new MREs or MRE extensions and the resulting performance. In particular, it took one of the authors only one month to develop a Common Language Runtime using VMKit. VMKit furthermore has performance comparableto the well established open source MREs Cacao, Apache Harmony and Mono, and is 1.2 to 3 times slower than JikesRVM on most of the Dacapo benchmarks.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... In this section, we expose details of the Java language [55] and the Java Virtual Machine (JVM) [76], in order to support the descriptions of our contributions in the Chapters 3 and 4. Even though we used a JVM implementation based on VMKit [51] to prototype our contributions, the descriptions included in this section also apply to common Java virtual machines. ...
... Based on Java byte code, the Java virtual machine offers binary portability across different architectures, and ensures safe code execution based on static and dynamic (i.e., runtime) verification. Many Java virtual machine implementations were developed, based on the public JVM specifications [76], including Hotspot JVM, Jikes Research Virtual Machine [1], Dalvik Virtual Machine [93,25], VMKit [51]. A JVM holds many subsystems that interact heavily in order to execute Java byte code. ...
... We prototyped Incinerator in J3: an experimental Java Virtual Machine based on VMKit [51] and the Low-Level Virtual Machine (LLVM) [74] and the Memory Management Toolkit (MMTk) [20]. We tested Incinerator on the Mark&Sweep garbage collector of MMTk, and the Knopflerfish [81] 3. 5.0 framework, one of the main OSGi implementations. ...
Article
Our homes become smart thanks to devices providing services (security, energy efficiency,…). Untrusted service providers want to take advantage of the smart home by developing services hosted by an embedded smart home gateway. The gateway should be robust enough to handle software problems. Sharing resources of the gateway between service providers allows providing richer services but raises risks of resource sharing conflicts. We addresses the problem of resource sharing conflicts in the smart home gateway, by prevention when possible, and by detection and resolution otherwise. Our first contribution "Jasmin" is a middleware to develop, deploy and isolate native embedded component-based and service-oriented applications. Jasmin uses Linux containers for lightweight isolation. Our second contribution "Incinerator" is a subsystem in the Java Virtual Machine (JVM) aiming to resolve the problem of Java stale references, which cause significant memory leaks in an OSGi-based smart home gateway, hence increasing the risks of memory sharing conflicts. Incinerator detects and eliminates stale references. In order to detect memory sharing conflicts, we propose our third contribution: memory monitoring subsystem in the JVM. The system accurately accounts for resources consumed during cross-application interactions, and provides snapshots of memory usage statistics for the different service providers sharing the gateway.
... We have prototyped Incinerator in J3, a Java virtual machine based on VMKit [3] . Our implementation of Incinerator modifies the MMTk " Mark-Sweep " garbage collector [4] included in VMKit and adds 150 lines of code to J3. ...
... The Incinerator prototype is based on the J3/VMKit [3] experimental JVM. Implementing Incinerator requires roughly 150 lines of C++ code. ...
... In the former case, the swap space is necessary, because J3 requires at least 1 gigabyte of address space to run the DaCapo benchmark suite. J3 has been measured to be between 1.2 and 3 times slower than JikesRVM [3]. We use J3 in our evaluation because it is easy to extend with new functionalities. ...
Article
Full-text available
Java class loaders are commonly used in application servers to load, unload and update a set of classes as a unit. However, unloading or updating a class loader can introduce stale references to the objects of the outdated class loader. A stale reference leads to a memory leak and, for an update, to an inconsistency between the outdated classes and their replacements. To detect and eliminate stale references, we propose Incinerator, a Java virtual machine extension that introduces the notion of an outdated class loader. Incinerator detects stale references and sets them to null during a garbage collection cycle. We evaluate Incinerator in the context of the OSGi framework and show that Incinerator correctly detects and eliminates stale references, including a bug in Knopflerfish. We also evaluate the performance of Incinerator with the DaCapo benchmark on VMKit and show that Incinerator has an overhead of at most 3.3%.
... A clear candidate for the compiler portion of this work is LLVM [28]. It is a widely supported compiler framework, and already provides a toolkit for building managed runtime systems based on it [17]. LLVM is also being adopted in the context of a commercial JVM, underpinning Azul's Falcon compiler [22]. ...
... It is therefore important to build on existing technology as much as possible. We believe that a combination of MMTk [7] and LLVM [28] could be a good foundation (it has been shown that a well-performing runtime system can be built around these two components within the scope of an academic project [17]). ...
Conference Paper
The public cloud is moving to a Platform-as-a-Service model where services such as data management, machine learning or image classification are provided by the cloud operator while applications are written in high-level languages and leverage these services. Managed languages such as Java, Python or Scala are widely used in this setting. However, while these languages can increase productivity, they are often associated with problems such as unpredictable garbage collection pauses or warm-up overheads. We argue that the reason for these problems is that current language runtime systems were not initially designed for the cloud setting. To address this, we propose seven tenets for designing future language runtime systems for cloud data centers. We then outline the design of a general substrate for building such runtime systems, based on these seven tenets.
... At runtime, the monitoring system applies those rules to correctly account for memory used by components, e.g., local variables, loaded classes, and created objects. In most of the cases, component developers do not need to write accounting rules because implicit rules han- We implemented the monitoring system inside J3, a Java Virtual Machine based on VMKit [5], LLVM [9] and MMTk [1]. Even though we slightly change the native object structure, the changes are invisible to the Java code, which helps preserve the component model of OSGi. ...
... The monitoring subsystem implementation is around 2000 lines of C++ code mostly inside the J3 JVM based on VMKit [5], LLVM [9] and MMTk [1]. The OSGi framework used is Knopflerfish 5.0. ...
Article
Full-text available
Smart Home market players aim to deploy component-based and service-oriented applications from untrusted third party providers on a single OSGi execution environment. This creates the risk of resource abuse by buggy and malicious applications, which raises the need for resource monitoring mechanisms. Existing resource monitoring solutions either are too intrusive or fail to identify the relevant resource consumer in numerous multi-tenant situations. This paper proposes a system to monitor the memory consumed by each tenant, while allowing them to continue communicating directly to render services. We propose a solution based on a list of configurable resource accounting rules between tenants, which is far less intrusive than existing OSGi monitoring systems. We modified an experimental Java Virtual Machine in order to provide the memory monitoring features for the multi-tenant OSGi environment. Our evaluation of the memory monitoring mechanism on the DaCapo benchmarks shows an overhead below 46%.
... 1 If a thread is blocked while waiting for a synchronization on the stale reference, Incinerator also unblocks the thread, in order to prevent leaking of the thread and its reachable objects. We have prototyped Incinerator in J3, a Java virtual machine based on VMKit [10]. Incinerator modifies the MMTk ''Mark-Sweep'' garbage collector [3] included in VMKit. ...
... The Incinerator prototype is based on the J3/VMKit [10] experimental JVM. Implementing Incinerator requires 650 lines of C++, and modifying approximately 20 lines in the JVM (the scan and the termination functions of the garbage collector, and the lock acquire function of the monitors). ...
Article
Full-text available
In the context of smart homes, the OSGi middleware is emerging as a standard to execute applications that collaborate together to render services. However, an application update in OSGi can introduce stale references, i.e., references to an outdated version of the application. A stale reference leads to a memory leak and to an inconsistency between the outdated version of the application and the new one. To avoid stale references, we propose Incinerator, a Java virtual machine extension that not only detects, but also eliminates stale references at runtime. Incinerator mainly runs when the garbage collector scans the object graph, so as to find stale references and set them to null. We have used Incinerator to detect a stale reference in the Knopflerfish OSGi framework implementation. Incinerator has a low overhead of at most 3.3% on average on the applications of the DaCapo benchmark suite. This shows that Incinerator is reasonable for use in production environments.
... VMKit provides basic components required to create a VM such as JIT (Just In Time) compiler, Garbage Collector (GC) and a Thread Manager. J3 and N3 are VMs that have been developed using VMKit which prove the significant reduction in development time of a completely new VM [2]. Furthermore, the core of VMKit depends on LLVM compiler infrastructure which provides the required key components such as JIT compiler and GC. ...
Research Proposal
Full-text available
This article is presented to propose an improvement for an existing problem regarding the performance of LLVM JIT (Just in Time) compiler, which is an inherent slowness during the start-up of virtual machines (VM), Ex: VMKit J3 [2]. Our proposed solution is to use a JIT compiler as an adaptive optimization for the current VMKit implementation. What's left is a system that can keep track of and dynamically look-up the hotness of methods and re-compile with more expensive optimizations as the methods are executed over and over. This should improve program start-up time and execution time and will bring great benefits to all ported languages that intend to use LLVM JIT as one of the execution methods. Moreover our implementation will have a mix mode execution including a non-optimized and dynamic optimization with LLVM JIT. In the end, we will benchmark our implementation so that we can compare the difference in performance of the current LLVM JIT and the improved LLVM JIT with VMKit framework. 1. Overview Implementing a virtual machine (VM) is a painful task which demands huge effort and knowledge regarding the hosting infrastructure and architecture. When moving from one architecture to the other it is again a time consuming task which demands reimplementation of the main modules of VM. To relax this inherent overhead of building a VM, VMKit which work as a substrate has been developed so that it provides about 95 percent of the code required in developing a new VM. VMKit provides basic components required to create a VM such as JIT (Just In Time) compiler, Garbage Collector (GC) and a Thread Manager. J3 and N3 are VMs that have been developed using VMKit which prove the significant reduction in development time of a completely new VM [2]. Furthermore, the core of VMKit depends on LLVM compiler infrastructure which provides the required key components such as JIT compiler and GC. In this article our point of interest is the LLVM JIT compiler because it is a known fact that the startup time consumption during the VM execution, developed using this core module is considerably higher. Therefore we propose an improvement for the behavior of LLVM JIT compiler, in order to make the startup and the run time to be faster than the current implementation. 2. Virtualization technology studied • Name: VMKit • URL: http://vmkit.llvm.org/ • VM-type: High Level Language VM • Common usage: VMKit eases the development of new managed runtime environments (MREs) and the process of experimenting with new mechanisms inside MREs. J3(JVM) and N3(CLI) are built on top of VMKit. • Motivation: While VMKit aims to provide a fully functional intermediate layer for VM, its still under heavy development and misses some vital components with some open projects proposed. One of the current drawbacks of the LLVM JIT is the lack of an adaptive compilation system which will be our contribution to this project. What is missing is a system that can keep track of and dynamically look-up the hotness of methods and re-compile with more expensive optimizations as the methods are executed over and over. This should improve program startup time and execution time and will bring great benefits to overall performance.
... VMADL thus combines aspect-and feature-oriented ap- proaches [9] in a more architecture-aware manner. VMKit [17] is called a " substrate " for implementing VMs. It provides a common foundation that implementations of different instruction sets and programming languages can build upon. ...
Article
Full-text available
CSOM/PL is a software product line (SPL) derived from applying multi-dimensional separation of concerns (MDSOC) techniques to the domain of high-level language virtual machine (VM) implementations. For CSOM/PL, we modularised CSOM, a Smalltalk VM implemented in C, using VMADL (virtual machine architecture description language). Several features of the original CSOM were encapsulated in VMADL modules and composed in various combinations. In an evaluation of our approach, we show that applying MDSOC and SPL principles to a domain as complex as that of VMs is not only feasible but beneficial, as it improves understandability, maintainability, and configurability of VM implementations without harming performance.
... To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. LLVM seems quite popular nowadays and is used as a common infrastructure to implement a broad variety of statically and runtime compiled languages, e.g., the family of languages supported by GCC, Java and .NET [4], Python [18,24], Ruby [13,16], Haskell [22], as well as countless lesser known languages. It has replaced a broad variety of special-purpose compilers and has also been used to create a broad variety of new products (e.g., the OpenCL GPU programming language and runtime). ...
Article
Full-text available
This paper describes ErLLVM, a new backend for the HiPE compiler, the native code compiler of Erlang/OTP, that targets the LLVM compiler infrastructure. Besides presenting the overall architecture of ErLLVM and its integration in Erlang/OTP, we describe the changes to LLVM that ErLLVM required and discuss technical challenges and decisions we took. Finally, we provide a detailed performance evaluation of ErLLVM compared to BEAM, the existing backends of the HiPE compiler, and Erjang.
... Another approach is to add support for dynamic languages to an existing high-performance static-language VM [14, 31]. A number of projects have attempted to use LLVM [38] as a compiler for high-level managed languages, such as Rubinius and MacRuby for Ruby [39, 52] , Unladen Swallow for Python [66], Shark and VMKit for Java [5, 23], and McVM for MATLAB [24]. These implementations have to provide a translator from the guest languages' high-level semantics to the low-level semantics of LLVM IR. ...
Conference Paper
Full-text available
Building high-performance virtual machines is a complex and expensive undertaking; many popular languages still have low-performance implementations. We describe a new approach to virtual machine (VM) construction that amortizes much of the effort in initial construction by allowing new languages to be implemented with modest additional effort. The approach relies on abstract syntax tree (AST) interpretation where a node can rewrite itself to a more specialized or more general node, together with an optimizing compiler that exploits the structure of the interpreter. The compiler uses speculative assumptions and deoptimization in order to produce efficient machine code. Our initial experience suggests that high performance is attainable while preserving a modular and layered architecture, and that new high-performance language implementations can be obtained by writing little more than a stylized interpreter.
... Conversely, languages willing to support different semantics in .NET need to choose between emulating them on top of what the .NET platform offers or creating a dialect of the guest language. An approach that is an alternative to extending existing runtimes for supporting multiple languages is represented by the VMKit project [7]. VMKit provides developers with reusable core VM components that can be glued together in order to obtain a managed language runtime by means of component reuse and composition. ...
Conference Paper
Full-text available
Truffle is a Java-based framework for developing high-performance language runtimes. Language implementers aiming at developing new runtimes have to design all the runtime mechanisms for managing dynamically typed objects from scratch. This not only leads to potential code duplication, but also impacts the actual time needed to develop a fully-fledged runtime. In this paper we address this issue by introducing a common object storage model (OSM) for Truffle that can be used by language implementers to develop new runtimes. The OSM is generic, language-agnostic, and portable, as it can be used to implement a great variety of dynamic languages. It is extensible, featuring built-in support for custom extension mechanisms. It is also high-performance, as it is designed to benefit from the optimizing compiler in the Truffle framework. Our initial evaluation indicates that the Truffle OSM can be used to implement high-performance language runtimes, with no performance overhead when compared to language-specific solutions.
... Some of these approaches are metacircular, e.g., implementing a Java Virtual Machine in Java, and thus benefit from the language's features. In addition to the use of highlevel languages, aspects such as modularity, observability and extensibility have been in the focus of the VM research community as well [13,35]. However, even the metacircular approaches produce VMs that do not enable significant observability and interactivity with the VM at run-time. ...
Conference Paper
Full-text available
Modern development environments promote live programming (LP) mechanisms because it enhances the development experience by providing instantaneous feedback and interaction with live objects. LP is typically supported with advanced reflective techniques within dynamic languages. These languages run on top of Virtual Machines (VMs) that are built in a static manner so that most of their components are bound at compile time. As a consequence, VM developers are forced to work using the traditional edit-compile-run cycle, even when they are designing LP-supporting environments. In this paper we explore the idea of bringing LP techniques to the VM domain for improving their observability, evolution and adaptability at run-time. We define the notion of fully reflective execution environments (EEs), systems that provide reflection not only at the application level but also at the level of the VM. We characterize such systems, propose a design, and present Mate v1, a prototypical implementation. Based on our prototype, we analyze the feasibility and applicability of incorporating reflective capabilities into different parts of EEs. Furthermore, the evaluation demonstrates the opportunities such reflective capabilities provide for unanticipated dynamic adaptation scenarios, benefiting thus, a wider range of users.
... In [150] the authors introduce I-JVM, a modied JVM that implements their concept of isolates. I-JVM is based on VMKit [151], a software framework to speed up the creation of VPs. ...
Article
Infrastructure as a service (IaaS) Cloud platforms are increasingly used in the IT industry. IaaS platforms are providers of virtual resources from a catalogue of predefined types. Improvements in virtualization technology make it possible to create and destroy virtual machines on the fly, with a low overhead. As a result, the great benefit of IaaS platforms is the ability to scale a virtual platform on the fly, while only paying for the used resources. From a research point of view, IaaS platforms raise new questions in terms of making efficient virtual platform scaling decisions and then efficiently scheduling applications on dynamic platforms. The current thesis is a step forward towards exploring and answering these questions. The first contribution of the current work is focused on resource management. We have worked on the topic of automatically scaling cloud client applications to meet changing platform usage. There have been various studies showing self-similarities in web platform traffic which implies the existence of usage patterns that may or may not be periodical. We have developed an automatic platform scaling strategy that predicted platform usage by identifying non-periodic usage patterns and extrapolating future platform usage based on them. Next we have focused on extending an existing grid platform with on-demand resources from an IaaS platform. We have developed an extension to the DIET (Distributed Interactive Engineering Toolkit) middleware, that uses a virtual market based approach to perform resource allocation. Each user is given a sum of virtual currency that he will use for running his tasks. This mechanism help in ensuring fair platform sharing between users. The third and final contribution targets application management for IaaS platforms. We have studied and developed an allocation strategy for budget-constrained workflow applications that target IaaS Cloud platforms. The workflow abstraction is very common amongst scientific applications. It is easy to find examples in any field from bioinformatics to geology. In this work we have considered a general model of workflow applications that comprise parallel tasks and permit non-deterministic transitions. We have elaborated two budget-constrained allocation strategies for this type of workflow. The problem is a bi-criteria optimization problem as we are optimizing both budget and workflow makespan. This work has been practically validated by implementing it on top of the Nimbus open source cloud platform and the DIET MADAG workflow engine. This is being tested with a cosmological simulation workflow application called RAMSES. This is a parallel MPI application that, as part of this work, has been ported for execution on dynamic virtual platforms. Both theoretical simulations and practical experiments have shown encouraging results and improvements.
... In this section, we first describe the implementation of synchronization in modern JVMs, focusing on Hotspot 7. The same implementation strategy is used in other modern JVMs, such as Jikes RVM [1] and VMKit [13] . Free Lunch leverages this implementation to perform profiling efficiently. ...
Article
Today, Java is regularly used to implement large multi-threaded server-class applications that use locks to protect access to shared data. However, understanding the impact of locks on the performance of a system is complex, and thus the use of locks can impede the progress of threads on configurations that were not anticipated by the developer, during specific phases of the execution. In this paper, we propose Free Lunch, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock. Free Lunch is designed around a new metric, critical section pressure (CSP), which directly correlates the progress of the threads to each of the locks. Using Free Lunch, we have identified phases of high CSP, which were hidden with other lock profilers, in the distributed Cassandra NoSQL database and in several applications from the DaCapo 9.12, the SPECjvm2008 and the SPECjbb2005 benchmark suites. Our evaluation of Free Lunch shows that its overhead is never greater than 6%, making it suitable for in-vivo use.
... Executable papers may well be one important motivation to move in this direction. New compilation tools such as LLVM [13] and VMKit [14] are an important technological innovation for such a move. They blur the frontier between native machine code and virtual machine bytecode, allow a better integration between the two worlds, and facilitate the design and implementation of virtual machines. ...
Article
Full-text available
This proposal describes how data, program code, and presentation can be stored together in a single file suitable for electronic publication and permitting the reproduction of computational results.. Universality, effciency, platformindependence, automated verifiability, and provenance tracking are the major design criteria. Existing and well-tested technology is used as much as possible, the two major building blocks being the Hierarchical Data Format for storage and the Java Virtual Machine for platform-independent code representation and secure execution.
... The Mu micro virtual machine [20,21] is inspired by the formal verification of the seL4 microkernel [13]. A micro vir- tual machine is a minimal, language-agnostic substrate that focuses only on the three major concerns that contribute to the difficulties of language implementation, namely dynamic 'just-in-time' (JIT) compilation, concurrency, and automatic memory management ('garbage collection' (GC)) [8,21]. This design aims to provide a reliable low-level virtual machine to facilitate language implementation. ...
Article
On-stack replacement (OSR) is a performance-critical technology for many languages, especially dynamic languages. Conventional wisdom, apparent in JavaScript engines such as V8 and SpiderMonkey, is that OSR must be implemented in a low-level (i.e., in assembly) and language-specific way. This paper presents an OSR abstraction based on Swapstack, materialized as the API for a low-level virtual machine, and shows how the abstraction of resumption protocols facilitates an elegant implementation of this API on real hardware. Using an experimental JavaScript implementation, we demonstrate that this API enables the language implementation to perform OSR without the need to deal with machine-level details. We also show that the API itself is implementable on concrete hardware. This work helps crystallize OSR abstractions and, by providing a reusable implementation, brings OSR within reach for more language implementers.
... The Mu micro virtual machine [20,21] is inspired by the formal verification of the seL4 microkernel [13]. A micro vir- tual machine is a minimal, language-agnostic substrate that focuses only on the three major concerns that contribute to the difficulties of language implementation, namely dynamic 'just-in-time' (JIT) compilation, concurrency, and automatic memory management ('garbage collection' (GC)) [8,21]. This design aims to provide a reliable low-level virtual machine to facilitate language implementation. ...
Conference Paper
On-stack replacement (OSR) is a performance-critical technology for many languages, especially dynamic languages. Conventional wisdom, apparent in JavaScript engines such as V8 and SpiderMonkey, is that OSR must be implemented in a low-level (i.e., in assembly) and language-specific way. This paper presents an OSR abstraction based on Swapstack, materialized as the API for a low-level virtual machine, and shows how the abstraction of resumption protocols facilitates an elegant implementation of this API on real hardware. Using an experimental JavaScript implementation, we demonstrate that this API enables the language implementation to perform OSR without the need to deal with machine-level details. We also show that the API itself is implementable on concrete hardware. This work helps crystallize OSR abstractions and, by providing a reusable implementation, brings OSR within reach for more language implementers.
... A number of projects have attempted to use LLVM [29] as a compiler for high-level managed languages, such as Rubinius and MacRuby for Ruby [30,41], Unladen Swallow for Python [55], Shark and VMKit for Java [5,16], and McVM for MATLAB [11]. These implementations have to provide a translator from the guest languages' high-level semantics to the low-level semantics of LLVM IR. ...
Conference Paper
Full-text available
Most high-performance dynamic language virtual machines duplicate language semantics in the interpreter, compiler, and runtime system. This violates the principle to not repeat yourself. In contrast, we define languages solely by writing an interpreter. The interpreter performs specializations, e.g., augments the interpreted program with type information and profiling information. Compiled code is derived automatically using partial evaluation while incorporating these specializations. This makes partial evaluation practical in the context of dynamic languages: It reduces the size of the compiled code while still compiling all parts of an operation that are relevant for a particular program. When a speculation fails, execution transfers back to the interpreter, the program re-specializes in the interpreter, and later partial evaluation again transforms the new state of the interpreter to compiled code. We evaluate our approach by comparing our implementations of JavaScript, Ruby, and R with best-in-class specialized production implementations. Our general-purpose compilation system is competitive with production systems even when they have been heavily optimized for the one language they support. For our set of benchmarks, our speedup relative to the V8 JavaScript VM is 0.83x, relative to JRuby is 3.8x, and relative to GNU R is 5x.
Article
This paper surveys the risks brought by multitenancy in software platforms, along with the most prominent solutions proposed to address them. A multitenant platform hosts and executes software from several users (tenants). The platform must ensure that no malicious or faulty code from any tenant can interfere with the normal execution of other users’ code or with the platform itself. This security requirement is specially relevant in Platform-as-a-Service (PaaS) clouds. PaaS clouds offer an execution environment based on some software platform. Unless PaaS systems are deemed as safe environments users will be reluctant to trust them to run any relevant application. This requires to take into account how multitenancy is handled by the software platform used as the basis of the PaaS offer. This survey focuses on two technologies that are or will be the platform-of-choice in many PaaS clouds: Java and .NET. We describe the security mechanisms they provide, study their limitations as multitenant platforms and analyze the research works that try to solve those limitations. We include in this analysis some standard container technologies (such as Enterprise Java Beans) that can be used to standardize the hosting environment of PaaS clouds. Also we include a brief discussion of Operating Systems (OSs) traditional security capacities and why OSs are unlikely to be chosen as the basis of PaaS offers. Finally, we describe some research initiatives that reinforce security by monitoring the execution of untrusted code, whose results can be of interest in multitenant systems.
Conference Paper
Cloud Computing infrastructures and Grid Computing platforms are representatives of a new breed of systems that leverage the modularity paradigm to assemble large-scale dynamic applications from modules contributed by different, possibly untrustworthy providers. Increased susceptibility to faults, diminished accountability, and complex system configuration are major challenges when assembling and operating such systems. In this paper, we describe how to solve these problems by retrofitting module management systems with the ability to deploy modules to execution environments with adjustable degree of isolation. We give a formal definition of the underlying hierarchical Module Isolation Problem and devise an online algorithm to solve it in an incremental fashion. We discuss how to apply our approach to a state-of-the-art module management system and demonstrate its effectiveness by an experimental evaluation.
Conference Paper
In this paper we describe the design and implementation of a compilation and code analysis toolchain for embedded systems software targeting the RISCO processor, using the LLVM project. Small systems embedded in a larger device are by far the most common kind of computational system in use today, deployed in various types of equipments. Because of their nature, an embedded system presents interesting size, efficiency and energy consumption restrictions, among others, that impose unique challenges on a project. In that scenario, the RISCO processor, a RISC architecture similar to MIPS, was created as a simple, efficient, processor that could prove to be a practical alternative to the available commercial options in its price range. The toolchain we developed permit the development, simulation and analysis of software in C and C++ for the RISCO platform, with open source tools. Besides compiling and executing high level code, the environment supports emitting control flow graphs for each module, enabling further analysis. As a case study on using CFGs and generated machine code information we developed a worst case execution time analysis tool for RISCO code. We discuss the scope of the tools, the design decisions involved in the development of the compilation and analysis system, and the results obtained through testing.
Article
The growing gap between the advanced capabilities of static compilers as reflected in benchmarking results and the actual performance that users experience in real-life scenarios makes client-side dynamic optimization technologies imperative to the domain of static languages. Dynamic optimization of software distributed in the form of a platform-agnostic Intermediate- Representation (IR) has been very successful in the domain of managed languages, greatly improving upon interpreted code, especially when online profiling is used. However, can such feedback-directed IR-based dynamic code generation be viable in the domain of statically compiled, rather than interpreted, languages? We show that fat binaries, which combine the IR together with the statically compiled executable, can provide a practical solution for software vendors, allowing their software to be dynamically optimized without the limitation of binary-level approaches, which lack the high-level IR of the program, and without the warm-up costs associated with the IR-only software distribution approach. We describe and evaluate the fat-binary-based runtime compilation approach using SPECint2006, demonstrating that the overheads it incurs are low enough to be successfully surmounted by dynamic optimization. Building on Java JIT technologies, our results already improve upon common real-world usage scenarios, including very small workloads.
Article
Since just-in-time (JIT) has considerable overhead to detect hot spots and compile them at runtime, using sophisticated optimization techniques for embedded devices means that any resulting performance improvements will be limited. In this paper, we introduce a novel static Dalvik bytecode optimization framework, as a complementary compilation of the Dalvik virtual machine, to improve the performance of Android applications. Our system generates optimized Dalvik bytecodes by using Low Level Virtual Machine (LLVM). A major obstacle in using LLVM for optimizing Dalvik bytecodes is determining how to handle the high-level language features of the Dalvik bytecode in LLVM IR and how to optimize LLVM IR conforming to the language information of the Dalvik bytecode. To this end, we annotate the high-level language features of Dalvik bytecode to LLVM IR and successfully optimize Dalvik bytecodes through instruction selection processes. Our experimental results show that our system with JIT improves the performance of Android applications by up to 6.08 times, and surpasses JIT by up to 4.34 times.
Article
Interoperability between languages has been a problem since the second programming language was invented. Solutions have ranged from language-independent object models such as COM (Component Object Model) and CORBA (Common Object Request Broker Architecture) to VMs (virtual machines) designed to integrate languages, such as JVM (Java Virtual Machine) and CLR (Common Language Runtime). With software becoming ever more complex and hardware less homogeneous, the likelihood of a single language being the correct tool for an entire program is lower than ever. As modern compilers become more modular, there is potential for a new generation of interesting solutions.
Thesis
Aujourd’hui, les systèmes logiciels sont omniprésents. Parfois, les applications doiventfonctionner sur des dispositifs à ressources limitées. Toutefois, les applications néces-sitent un support d’exécution de faire face à de telles limitations. Cette thèse abordele problème de la programmation pour créer des systèmes «conscient des ressources»supporté par des environnements d’exécution adaptés (MRTEs). En particulier, cettethèse vise à offrir un soutien efficace pour recueillir des données sur la consommationde ressources de calcul (par exemple, CPU, mémoire), ainsi que des mécanismes effi-caces pour réserver des ressources pour des applications spécifiques. Dans les solutionsexistantes, nous trouvons deux inconvénients importants. Les solutions imposent un im-pact important sur les performances à l’exécution des applications. La création d’outilspermettant de gérer finement les ressources pour ces abstractions est encore une tâchecomplexe. Les résultats de cette thèse forment trois contributions:• Un cadre de surveillance des ressources optimiste qui réduit le coût de la collectedes données de consommation de ressources.• Une méthodologie pour sélectionner les le support d’exécution des composants aumoment du déploiement afin d’effectuer la réservation de ressources.• Un langage pour construire des profileurs de mémoire personnalisées qui peuventêtre utilisés à la fois au cours du développement des applications, ainsi que dansun environnement de production.
Article
Interoperability between languages has been a problem from the time of the invention of the second programming language. Solutions have ranged from language-independent object models such as Component Object Model (COM) and Common Object Request Broker Architecture (CORBA) to virtual machines (VMs) designed to integrate languages such as Java Virtual Machine (JVM) and Common Language Runtime (CLR). There is potential for a new generation of interesting solutions, as modern compilers become more modular. A number of other mechanisms for exceptions have been proposed to address such problems and challenges. Efficient interoperability is important for heterogeneous multicore systems. Experts suggest that programmers need to start exploring better built-in support for common operations in other languages to address these challenges.
Article
A lot of work is spent on low-level optimization for regular computations; from instruction scheduling and cache-aware design to intensive use of SIMD instructions. Meanwhile, irregular applications, especially pointer intensive ones, are often only optimized at algorithm or compilation levels, since not so much hardware or dedicated instructions are available for this kind of code. In this paper, we investigate a low-level optimization of associative arrays intensively used in complex applications such as dynamic compilers, using self-modifying code. We propose to encode Red-Black trees, widely used to implement asssociative arrays, as specialized binary code rather than data, in order to accelerate the tree traversal by taking advantage of the underlying hardware: program cache, processor fetch and decode. We show a 45% gain on an ARM Cortex-A9 processor and that we transfer most of the data-cache pressure to the program-cache, motivating future work on dedicated hardware.
Conference Paper
The use of computer-based systems has increased significantly over the last years in several domains, mainly when we take into account the applications running on mobile platforms that have exploded in just a few short years, so that software verification and testing now play an important role in ensuring the overall product quality. In this paper, we describe the preliminary results of a work that presents a method to integrate formal verification techniques adopting ESC/Java2 and JCute tools with unit testing by TestNG framework to verify Java programs. This method aims to extract the safety properties generated by ESC/Java2 to automatically generate test cases using the rich set of assertions provided by the TestNG framework and JCute to validate those test cases. It is worth noting that is widely recognized that there is a growing need for automated testing techniques aimed at mobile applications, in platforms, such as: Android or Java Platform, Micro Edition (Java ME). Additionally, a critical challenge is the systematic generation of test cases. We show preliminary results of our proposed method over publicly available benchmarks, and compare the results to recognized tools, e.g., CBMC and JavaPathFinder. Experimental results show that our proposed method detects 86.04% of correct results (i.e., if a property satisfy its specification or is violated), while CBMC has found 79.06%, and JPF has found 93,02%.
Article
The growing gap between the advanced capabilities of static compilers as reflected in benchmarking results and the actual performance that users experience in real-life scenarios makes client-side dynamic optimization technologies imperative to the domain of static languages. Dynamic optimization of software distributed in the form of a platform-agnostic Intermediate-Representation (IR) has been very successful in the domain of managed languages, greatly improving upon interpreted code, especially when online profiling is used. However, can such feedback-directed IR-based dynamic code generation be viable in the domain of statically compiled, rather than interpreted, languages? We show that fat binaries, which combine the IR together with the statically compiled executable, can provide a practical solution for software vendors, allowing their software to be dynamically optimized without the limitation of binary-level approaches, which lack the high-level IR of the program, and without the warm-up costs associated with the IR-only software distribution approach. We describe and evaluate the fat-binary-based runtime compilation approach using SPECint2006, demonstrating that the overheads it incurs are low enough to be successfully surmounted by dynamic optimization. Building on Java JIT technologies, our results already improve upon common real-world usage scenarios, including very small workloads.
Conference Paper
Application requirements evolve over time and the underlying protocols need to adapt. Most transport protocols evolve by negotiating protocol extensions during the handshake. Experience with TCP shows that this leads to delays of several years or more to widely deploy standardized extensions. In this paper, we revisit the extensibility paradigm of transport protocols. We base our work on QUIC, a new transport protocol that encrypts most of the header and all the payload of packets, which makes it almost immune to middlebox interference. We propose Pluginized QUIC (PQUIC), a framework that enables QUIC clients and servers to dynamically exchange protocol plugins that extend the protocol on a per-connection basis. These plugins can be transparently reviewed by external verifiers and hosts can refuse non-certified plugins. Furthermore, the protocol plugins run inside an environment that monitors their execution and stops malicious plugins. We demonstrate the modularity of our proposal by implementing and evaluating very different plugins ranging from connection monitoring to multipath or Forward Erasure Correction. Our results show that plugins achieve expected behavior with acceptable overhead. We also show that these plugins can be combined to add their functionalities to a PQUIC connection.
Conference Paper
Distributed file storage services (DFSS) such as Dropbox, iCloud, SkyDrive, or Google Drive, offer a filesystem interface to a distributed data store. DFSS usually differ in the consistency level they provide for concurrent accesses: a client might access a cached version of a file, see the immediate results of all prior operations, or temporarily observe an inconsistent state. The selection of a consistency level has a strong impact on performance. It is the result of an inherent tradeoff between three properties: consistency, availability, and partition-tolerance. Isolating and identifying the exact impact on performance is a difficult task, because DFSS are complex designs with multiple components and dependencies. Furthermore, each system has a different range of features, its own design and implementation, and various optimizations that do not allow for a fair comparison. In this paper, we make a step towards a principled comparison of DFSS components, focusing on the evaluation of consistency mechanisms. We propose a novel modular DFSS testbed named FlexiFS, which implements a range of state-of-the-art techniques for the distribution, replication, routing, and indexing of data. Using FlexiFS, we survey six consistency levels: linearizability, sequential consistency, and eventual consistency, each operating with and without close-to-open semantics. Our evaluation shows that: (i) as expected, POSIX semantics (i.e., linearizability without close-to-open semantics) harm performance; and (ii) when close-to-open semantics is in use, linearizability delivers performance similar to sequential or eventual consistency.
Article
This paper focuses on the potential security risk embedded in the model of multi-tenant sharing virtual machine service. Also the characteristics of current Java virtual machine standard model and the typical problems that may be encountered under the multi-tenant circumstances will be further discussed. Additionally, major solutions to the above problems of Java application server, including MVM, I-JVM and improved OSGI model, will be deeply analyzed in this paper. Through introduction to the GAE of commercial multi-tenant PaaS platform, this paper presents a clear and precise discussion and clarification to the solutions of security problems on the platform of multi-tenant PaaS.
Conference Paper
Program portability is an important software engineering consideration. However, when high-level languages are extended to effectively implement system projects for software engineering gain and safety, portability is compromised--high-level code for low-level programming cannot execute on a stock runtime, and, conversely, a runtime with special support implemented will not be portable across different platforms. We explore the portability pitfall of high-level low-level programming in the context of virtual machine implementation tasks. Our approach is designing a restricted high-level language called RJava, with a flexible restriction model and effective low-level extensions, which is suitable for different scopes of virtual machine implementation, and also suitable for a low-level language bypass for improved portability. Apart from designing such a language, another major outcome from this work is clearing up and sharpening the philosophy around language restriction in virtual machine design. In combination, our approach to solving portability pitfalls with RJava favors virtual machine design and implementation in terms of portability and robustness.
Article
Full-text available
Generational garbage collectors are able to achieve very small pause times by concentrating on the youngest (most recently allocated) objects when collecting, since objects have been observed to die young in many systems. Generational collectors must keep track of all pointers from older to younger generations, by “monitoring” all stores into the heap. This write barrier has been implemented in a number of ways, varying essentially in the granularity of the information observed and stored. Here we examine a range of write barrier implementations and evaluate their relative performance within a generation scavenging garbage collector for Smalltalk.
Article
Full-text available
Implementing new operating systems is tedious, costly, and often impractical except for large projects. The Flux OSKit addresses this problem in a novel way by providing clean, well-documented OS components designed to be reused in a wide variety of other environments, rather than denning a new OS structure. The OSKit uses unconventional techniques to maximize its usefulness, such as intentionally exposing implementation details and platform-specific facilities. Further, the OSKit demonstrates a technique that allows unmodified code from existing mature operating systems to be incorporated quickly and updated regularly, by wrapping it with a small amount of carefully designed "glue" code to isolate its dependencies and export well-defined interfaces. The OSKit uses this technique to incorporate over 230,000 lines of stable code including device drivers, file systems, and network protocols. Our experience demonstrates that this approach to component software structure and reuse has a surprisingly large impact in the OS implementation domain. Four real-world examples show how the OSKit is catalyzing research and development in operating systems and programming languages.
Conference Paper
Full-text available
The OSGi framework is a Java-based, centralized, component oriented platform. It is being widely adopted as an execution environment for the development of extensible applications. However, current Java Virtual Machines are unable to isolate components from each other. For instance, a malicious component can freeze the complete platform by allocating too much memory or alter the behavior of other components by modifying shared variables. This paper presents I-JVM, a Java Virtual Machine that provides a lightweight approach to isolation while preserving compatibility with legacy OSGi applications. Our evaluation of I-JVM shows that it solves the 8 known OSGi vulnerabilities that are due to the Java Virtual Machine and that the overhead of I-JVM compared to the JVM on which it is based is below 20%.
Conference Paper
Full-text available
The Java language specification states that every access to an array needs to be within the bounds of that array, i.e. between 0 and array length - 1. Different techniques for different programming languages have been proposed to eliminate explicit bounds checks. Some of these techniques are implemented in off-the-shelf Java Virtual Machines (JVMs). The underlying principle of these techniques is that bounds checks can be removed when a JVM/compiler has enough information to guarantee that a sequence of accesses (e.g. inside a for-loop) is safe (within the bounds). Most of the techniques for the elimination of array bounds checks have been developed for programming languages that do not support multi-threading and/or enable dynamic class loading. These two characteristics make most of these techniques unsuitable for Java. Techniques developed specifically for Java have not addressed the elimination of array bounds checks in the presence of indirection; that is, when the index is stored in another array (indirection array). With the objective of optimizing applications with array indirection, this paper proposes and evaluates three implementation strategies, each implemented as a Java class. The classes provide the functionality of Java arrays of type int so that objects of the classes can be used instead of indirection arrays. Each strategy enables JVMs, when examining only one of these classes at a time, to obtain enough information to remove array bounds checks. Copyright © 2005 John Wiley & Sons, Ltd.
Conference Paper
Full-text available
According to conventional wisdom, interfaces provide flexibility at the cost of performance. Most high-performance Java virtual machines today tightly integrate their core virtual machines with their just-in-time compilers and garbage collectors to get the best performance. The Open Runtime Platform (ORP) is unusual in that it reconciles high performance with the extensive use of well-defined interfaces between its components. ORP was developed to support experiments in dynamic compilation, garbage collection, synchronization, and other technologies. To achieve this, two key interfaces were designed: one for garbage collection and another for just-in-time compilation. This paper describes some interesting features of these interfaces and discusses lessons learned in their use. One lesson we learned was to selectively expose small but frequently accessed data structures in our interfaces; this improves performance while minimizing the number of interface crossings.
Conference Paper
Full-text available
We present the fast subtype checking implemented in Sun's HotSpot JVM. Subtype checks occur when a program wishes to know if class S implements class T, where S and T are not both known at compile-time. Large Java programs will make millions or even billions of such checks, hence a fast check is essential. In actual benchmark runs our technique performs complete subtype checks in 3 instructions (and only 1 memory reference) essentially all the time. In rare instances it reverts to a slower array scan. Memory usage is moderate (11 words per class) and can be traded off for time. Class loading does not require recomputing any data structures associated with subtype checking.
Conference Paper
Full-text available
It is now well established that the device scaling predicted by Moore's Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the rate that had been sustained during the last two decades. As a result, future systems are rapidly moving from uniprocessor to multiprocessor configurations, so as to use parallelism instead of frequency scaling as the foundation for increased compute capacity. The dominant emerging multiprocessor structure for the future is a Non-Uniform Cluster Computing (NUCC) system with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in horizontally scalable cluster configurations such as blade servers. Unlike previous generations of hardware evolution, this shift will have a major impact on existing software. Current OO language facilities for concurrent and distributed programming are inadequate for addressing the needs of NUCC systems because they do not support the notions of non-uniform data access within a node, or of tight coupling of distributed nodes.We have designed a modern object-oriented programming language, X10, for high performance, high productivity programming of NUCC systems. A member of the partitioned global address space family of languages, X10 highlights the explicit reification of locality in the form of places}; lightweight activities embodied in async, future, foreach, and ateach constructs; a construct for termination detection (finish); the use of lock-free synchronization (atomic blocks); and the manipulation of cluster-wide global data structures. We present an overview of the X10 programming model and language, experience with our reference implementation, and results from some initial productivity comparisons between the X10 and Java™ languages.
Conference Paper
Full-text available
Today's web applications are pushing the limits of modern web browsers. The emergence of the browser as the platform of choice for rich client-side applications has shifted the use of in-browser JavaScript from small scripting programs to large computationally intensive application logic. For many web applications, JavaScript performance has become one of the bottlenecks preventing the development of even more interactive client side applications. While traditional just-in-time compilation is successful for statically typed virtual machine based languages like Java, compiling JavaScript turns out to be a challenging task. Many JavaScript programs and scripts are short-lived, and users expect a responsive browser during page loading. This leaves little time for compilation of JavaScript to generate machine code. We present a trace-based just-in-time compiler for JavaScript that uses run-time profiling to identify frequently executed code paths, which are compiled to executable machine code. Our approach increases execution performance by up to 116% by decomposing complex JavaScript instructions into a simple Forth-based representation, and then recording the actually executed code path through this low-level IR. Giving developers more computational horsepower enables a new generation of innovative web applications.
Conference Paper
Full-text available
The power of high-level languages lies in their abstraction over hardware and software complexity, leading to greater security, bet- ter reliability, and lower development costs. However, opaque ab- stractions are often show-stoppers for systems programmers, forc- ing them to either break the abstraction, or more often, simply give up and use a different language. This paper addresses the challenge of opening up a high-level language to allow practical low-level programming without forsaking integrity or performance. The contribution of this paper is three-fold: 1) we draw together common threads in a diverse literature, 2) we identify a frame- work for extending high-level languages for low-level program- ming, and 3) we show the power of this approach through con- crete case studies. Our framework leverages just three core ideas: extending semantics via intrinsic methods, extending types via un- boxing and architectural-width primitives, and controlling seman- tics via scoped semantic regimes. We develop these ideas through the context of a rich literature and substantial practical experience. We show that they provide the power necessary to implement sub- stantial artifacts such as a high-performance virtual machine, while preserving the software engineering benefits of the host language. The time has come for high-level low-level programming to be taken more seriously: 1) more projects now use high-level lan- guages for systems programming, 2) increasing architectural het- erogeneity and parallelism heighten the need for abstraction, and 3) a new generation of high-level languages are under development and ripe to be influenced.
Conference Paper
Full-text available
Two major efficiency parameters for garbage collectors are the throughput overheads and the pause times that they introduce. Highly responsive systems need to use collectors with as short as possible pause times. Pause lengths have decreased significantly during the years, especially through the use of concurrent garbage collectors. For modern concurrent collectors, the longest pause is typically created by the need to atomically scan the runtime stack. All practical concurrent collectors that we are aware of must ob- tain a snapshot of the pointers on each thread's runtime stack, in order to reclaim objects correctly. To further reduce the length of the collector pauses, incremental stack scans were proposed. However, previous such methods employ locks to stop the mutator from accessing a stack frame while it is being scanned. Thus, these methods introduce a potential long and unpredictable pauses for a mutator thread. In this work we propose the first concurrent, in- cremental, and lock-free stack scanning for garbage collectors, al- lowing high responsiveness and support for programs that employ fine-synchronization to avoid locks. Our solution can be employed by all concurrent collectors that we are aware of, it is lock-free, it imposes a negligible overhead on the program execution, and it supports the special in-stack references existing in languages like C#.
Conference Paper
Full-text available
Modern garbage collectors rely on read and write barriers imposed on heap accesses by the mutator, to keep track of references between different regions of the garbage collected heap, and to synchronize actions of the mutator with those of the collector. It has been a long-standing untested assumption that barriers impose significant overhead to garbage-collected applications. As a result, researchers have devoted effort to development of optimization approaches for elimination of unnecessary barriers, or proposed new algorithms for garbage collection that avoid the need for barriers while retaining the capability for independent collection of heap partitions. On the basis of the results presented here, we dispel the assumption that barrier overhead should be a primary motivator for such efforts. We present a methodology for precise measurement of mutator overheads for barriers associated with mutator heap accesses. We provide a taxonomy of different styles of barrier and measure the cost of a range of popular barriers used for different garbage collectors within Jikes RVM. Our results demonstrate that barriers impose surprisingly low cost on the mutator, though results vary by architecture. We found that the average overhead for a reasonable generational write barrier was less than 2% on average, and less than 6% in the worst case. Furthermore, we found that the average overhead of a read barrier consisting of just an unconditional mask of the low order bits read on the PowerPC was only 0.85%, while on the AMD it was 8.05%. With both read and write barriers, we found that second order locality effects were sometimes more important than the overhead of the barriers themselves, leading to counter-intuitive speedups in a number of situations.
Conference Paper
Full-text available
Tools supporting dynamic code generation tend too be low-level (leaving much work to the client ap- plication) or too intimately related with the lan- guage/system in which they are used (making them unsuitable for casual reuse). Applications or vir- tual machines wanting to benefit from runtime code generation are therefore forced to implement much of the compilation chain for themselves even when they make use of the available tools. The VPU is an fast, high-level code generation utility that performs most of the complex tasks related to code genera- tion, including register allocation, and which pro- duces good-quality C ABI-compliant native code. In the simplest cases, adding VPU-based runtime code generation to an application requires just a few lines of additional code—and for a typical virtual ma- chine, VPU-based just-in-time compilation requires only a few lines of code per virtual instruction.
Conference Paper
Full-text available
This paper describes optimization techniques recently applied to the Just-In-Time compilers that are part of the IBM® Developer Kit for JavaTM and the J9 Java virtual machine specification. It focusses primarily on those optimizations that improved server and middleware performance. Large server and middleware applications written in the Java programming language present a variety of performance challenges to virtual machines (VMs) and justin-time (JIT) compilers; we must address not only steady-state performance but also start-up time. In this paper, we describe 12 optimizations that have been implemented in IBM products because they improve the performance and scalability of these types of applications. These optimizations reduce, for example, the overhead of synchronization, object allocation, and some commonly used Java class library calls. We also describe techniques to address server start-up time, such as recompilation strategies. The experimental results show that the optimizations we discuss in this paper improve the performance of applications such as SPECjbb2000 and SPECjAppServer2002 by as much as 10-15%.
Conference Paper
Full-text available
A high-performance implementation of a Java Virtual Machine (JVM) consists of efficient implementation of Just-In-Time (JIT) compilation, exception handling, synchronization mechanism, and garbage collection (GC). These components are tightly coupled to achieve high performance. In this paper, we present some static anddynamic techniques implemented in the JIT compilation and exception handling of the Microprocessor Research Lab Virtual Machine (MRL VM), i.e., lazy exceptions, lazy GC mapping, dynamic patching, and bounds checking elimination. Our experiments used IA-32 as the hardware platform, but the optimizations can be generalized to other architectures.
Article
Full-text available
The Java language specification states that every access to an array needs to be within the bounds of that array; i.e. between 0 and length - 1. Different techniques for different programming languages have been proposed to eliminate explicit bounds checks. Some of these techniques are implemented in off-the-self Java Virtual Machines (JVMs). The underlying principle of these techniques is that bounds checks can be removed when a JVM/compiler has enough information to guarantee that a sequence of accesses (e.g. inside a for-loop) is safe (within the bounds).
Conference Paper
Full-text available
Transactional memory (TM) has recently emerged as an effective tool for extracting fine-grain parallelism from declarative critical sections. In order to make STM systems practical, significant effort has been made to integrate transactions intoexisting programming languages. Unfortunately, existing approaches fail to provide a simple implementation that permits lock-basedand transaction-based abstractions to coexist seamlessly. Because of the fundamental semantic differences between locks andtransactions, legacy applications or libraries written using locks can not be transparently used within atomic regions. Toaddress these shortcomings, we implement a uniform transactional execution environment for Java programs in which transactionscan be integrated with more traditional concurrency control constructs. Programmers can run arbitrary programs that utilizetraditional mutual-exclusion-based programming techniques, execute new programs written with explicit transactional constructs,and freely combine abstractions that use both coding styles.
Conference Paper
Full-text available
We describe LLVM (low level virtual machine), a compiler framework designed to support transparent, lifelong program analysis and transformation for arbitrary programs, by providing high-level information to compiler transformations at compile-time, link-time, run-time, and in idle time between runs. LLVM defines a common, low-level code representation in static single assignment (SSA) form, with several novel features: a simple, language-independent type-system that exposes the primitives commonly used to implement high-level language features; an instruction for typed address arithmetic; and a simple mechanism that can be used to implement the exception handling features of high-level languages (and setjmp/longjmp in C) uniformly and efficiently. The LLVM compiler framework and code representation together provide a combination of key capabilities that are important for practical, lifelong analysis and transformation of programs. To our knowledge, no existing compilation approach provides all these capabilities. We describe the design of the LLVM representation and compiler framework, and evaluate the design in three ways: (a) the size and effectiveness of the representation, including the type information it provides; (b) compiler performance for several interprocedural problems; and (c) illustrative examples of the benefits LLVM provides for several challenging compiler problems.
Article
Full-text available
Class loaders are a powerful mechanism for dynamically loading software components on the Java platform. They are unusual in supporting all of the following features: laziness, type-safe linkage, user-defined extensibility, and multiple communicating namespaces. We present the notion of class loaders and demonstrate some of their interesting uses. In addition, we discuss how to maintain type safety in the presence of user-defined dynamic class loading. 1 Introduction In this paper, we investigate an important feature of the Java virtual machine: dynamic class loading. This is the underlying mechanism that provides much of the power of the Java platform: the ability to install software components at runtime. An example of a component is an applet that is downloaded into a web browser. While many other systems [16] [13] also support some form of dynamic loading and linking, the Java platform is the only system we know of that incorporates all of the following features: 1. Lazy loadi...
Article
Full-text available
Single superclass inheritance enables simple and ecient table-driven virtual method dispatch. However, virtual method table dispatch does not handle multiple inheritance and interfaces. This complication has led to a widespread misimpression that interface method dispatch is inherently inecient. This paper argues that with proper implementation techniques, Java interfaces need not be a source of significant performance degradation. We present an efficient interface method dispatch mechanism, associating a fixed-sized interface method table (IMT) with each class that implements an interface. Interface method signatures hash to an IMT slot, with any hashing collisions handled by custom-generated conflict resolution stubs. The dispatch mechanism is efficient in both time and space. Furthermore, with static analysis and online profile data, an optimizing compiler can inline the dominant target(s) of any frequently executed interface call. Micro-benchmark results demonstrate that the expected cost of an interface method call dispatched via an IMT is comparable to the cost of a virtual method call. Experimental evaluation of a number of interface dispatch mechanisms on a suite of larger applications demonstrates that, even for applications that make only moderate use of interface methods, the choice of interface dispatching mechanism can significantly impact overall performance. Fortunately, several mechanisms provide good performance at a modest space cost.
Conference Paper
Programmers are increasingly choosing managed languages for modern applications, which tend to allocate many short-to-medium lived small objects. The garbage collector therefore directly determines program performance by making a classic space-time tradeoff that seeks to provide space efficiency, fast reclamation, and mutator performance. The three canonical tracing garbage collectors: semi-space, mark-sweep, and mark-compact each sacrifice one objective. This paper describes a collector family, called mark-region , and introduces opportunistic defragmentation, which mixes copying and marking in a single pass. Combining both, we implement immix , a novel high performance garbage collector that achieves all three performance objectives. The key insight is to allocate and reclaim memory in contiguous regions, at a coarse block grain when possible and otherwise in groups of finer grain lines . We show that immix outperforms existing canonical algorithms, improving total application performance by 7 to 25% on average across 20 benchmarks. As the mature space in a generational collector, immix matches or beats a highly tuned generational collector, e.g. it improves jbb2000 by 5%. These innovations and the identification of a new family of collectors open new opportunities for garbage collector design.
Article
To guarantee typesafe execution, Java and other strongly typed languages require bounds checking of array accesses. Because array-bounds checks may raise exceptions, they block code motion of instructions with side effects, thus preventing many useful code optimizations, such as partial redundancy elimination or instruction scheduling of memory operations. Furthermore, because it is not expressible at bytecode level, the elimination of bounds checks can only be performed at run time , after the bytecode program is loaded. Using existing powerful bounds-check optimizers at run time is not feasible, however, because they are too heavyweight for the dynamic compilation setting. ABCD is a light-weight algorithm for elimination of Array Bounds Checks on Demand. Its design emphasizes simplicity and efficiency. In essence, ABCD works by adding a few edges to the SSA value graph and performing a simple traversal of the graph. Despite its simplicity, ABCD is surprisingly powerful. On our benchmarks, ABCD removes on average 45% of dynamic bound check instructions, sometimes achieving near-ideal optimization. The efficiency of ABCD stems from two factors. First, ABCD works on a sparse representation. As a result, it requires on average fewer than 10 simple analysis steps per bounds check. Second, ABCD is demand-driven . It can be applied to a set of frequently executed (hot) bounds checks, which makes it suitable for the dynamic-compilation setting, in which compile-time cost is constrained but hot statements are known.
Conference Paper
A high-performance implementation of a Java Virtual Machine (JVM) consists of efficient implementation of Just-In-Time (JIT) compilation, exception handling, synchronization mechanism, and garbage collection (GC). These components are tightly coupled to achieve high performance. In this paper, we present some static anddynamic techniques implemented in the JIT compilation and exception handling of the Microprocessor Research Lab Virtual Machine (MRL VM), i.e., lazy exceptions, lazy GC mapping, dynamic patching, and bounds checking elimination. Our experiments used IA-32 as the hardware platform, but the optimizations can be generalized to other architectures.
Conference Paper
Software evolves to fix bugs and add features. Stopping and restarting programs to apply changes is inconvenient and often costly. Dynamic software updating (DSU) addresses this problem by updating programs while they execute, but existing DSU systems for managed languages do not support many updates that occur in practice and are inefficient. This paper presents the design and implementation of Jvolve, a DSU-enhanced Java VM. Updated programs may add, delete, and replace fields and methods anywhere within the class hierarchy. Jvolve implements these updates by adding to and coordinating VM classloading, just-in-time compilation, scheduling, return barriers, on-stack replacement, and garbage collection. Jvolve, is safe: its use of bytecode verification and VM thread synchronization ensures that an update will always produce type-correct executions. Jvolve is flexible: it can support 20 of 22 updates to three open-source programs--Jetty web server, JavaEmailServer, and CrossFTP server--based on actual releases occurring over 1 to 2 years. Jvolve is efficient: performance experiments show that incurs no overhead during steady-state execution. These results demonstrate that this work is a significant step towards practical support for dynamic updates in virtual machines for managed languages.
Conference Paper
The Multitasking Virtual Machine (called from now on simply MVM) is a modification of the Java™ virtual machine. It enables safe, secure, and scalable multitasking. Safety is achieved by strict isolation of applications from one another. Resource control mechanisms augment security by preventing some denial-ofservice attacks. Improved scalability results from an aggressive application of the main design principle of MVM: share as much of the runtime as possible among applications and replicate everything else. The system can be described as a 'no compromise' approach -- all the known APIs and mechanisms of the Java programming language are available to applications. MVM is implemented as a series of carefully tuned modifications to the Java HotSpot™ virtual machine, including the dynamic compiler. This paper presents the design of MVM, focusing on several novel and general techniques: an in-runtime design of lightweight isolation, an extension of a copying, generational garbage collector to provide best-effort management of a portion of the heap space, and a transparent and automated mechanism for safe execution of user-level native code. MVM demonstrates that multitasking in a safe language can be accomplished with a high degree of protection, without constraining the language, and with competitive performance characteristics.
Conference Paper
Since benchmarks drive computer science research and industry product development, which ones we use and how we evaluate them are key questions for the community. Despite complex runtime tradeoffs due to dynamic compilation and garbage collection required for Java programs, many evaluations still use methodologies developed for C, C++, and Fortran. SPEC, the dominant purveyor of benchmarks, compounded this problem by institutionalizing these methodologies for their Java benchmark suite. This paper recommends benchmarking selection and evaluation methodologies, and introduces the DaCapo benchmarks, a set of open source, client-side Java benchmarks. We demonstrate that the complex interactions of (1) architecture, (2) compiler, (3) virtual machine, (4) memory management, and (5) application require more extensive evaluation than C, C++, and Fortran which stress (4) much less, and do not require (3). We use and introduce new value, time-series, and statistical metrics for static and dynamic properties such as code complexity, code size, heap composition, and pointer mutations. No benchmark suite is definitive, but these metrics show that DaCapo improves over SPEC Java in a variety of ways, including more complex code, richer object behaviors, and more demanding memory system requirements. This paper takes a step towards improving methodologies for choosing and evaluating benchmarks to foster innovation in system design and implementation for Java and other managed languages.
Article
The multitasking virtual machine (called from now on simply MVM) is a modification of the Java virtual machine. It enables safe, secure, and scalable multitasking. Safety is achieved by strict isolation of application from one another. Resource control augment security by preventing some denial-of-service attacks. Improved scalability results from an aggressive application of the main design principle of MVM: share as much of the runtime as possible among applications and replicate everything else. The system can be described as a 'no compromise'approach --- all the known APIs and mechanisms of the Java programming language are available to applications. MVM is implemented as a series of carefully tuned modifications to the Java HotSpot virtual machine, including the dynamic compiler. this paper presents the design of MVM, focusing on several novel and general techniques: an in-runtime design of lightweight isolation, an extension of a copying, generational garbage collector to provide best-effort management of a portion of the heap space, and a transparent and automated mechanism for safe execution of user-level native code. MVM demonstrates that multitasking in a safe language can be accomplished with a high degree of protection, without constraining the language, and and with competitive performance characteristics
Conference Paper
One of the most distinctive features of the JavaTM programming language is the ability to specify class loading policies. Despite the popularity of class loaders, little has been done to reduce the cost associated with defining the same class by multiple loaders. In particular, implementations of the Java virtual machine (JVMTM ) create a complete runtime representation of each class regardless of how many class loaders already define the same class. This lack of sharing leads to poor memory utilization and to replicated run-time work. Recent efforts achieve some degree of sharing only when dynamic binding behaves predictably across loaders. This limits sharing to class loaders whose behavior is fully controlled by the JVM. As a result applications that implement their own class loading policies cannot enjoy the benefit of sharing. We present a novel technique for sharing the runtime representation of classes (including bytecodes and, under some conditions, compiled code) across arbitrary user-defined class loaders. We describe how our approach is applied to the multi-tasking virtual machine (MVM). The new multi-tasking virtual machine retains the fast start-up time of the original MVM while extending the scope of footprint savings to applications that exploit user-defined class loaders.
Conference Paper
.We present the notion of class loaders and demonstrate some of their interesting uses. In addition, we discuss how to maintain type safety in the presence of user-defined dynamic class loading.
Conference Paper
Software evolves to fix bugs and add features. Stopping and restarting programs to apply changes is inconvenient and often costly. Dynamic software updating (DSU) addresses this problem by updating programs while they execute, but existing DSU systems for managed languages do not support many updates that occur in practice and are inefficient. This paper presents the design and implementation of Jvolve, a DSU-enhanced Java VM. Updated programs may add, delete, and replace fields and methods anywhere within the class hierarchy. Jvolve implements these updates by adding to and coordinating VM classloading, just-in-time compilation, scheduling, return barriers, on-stack replacement, and garbage collection. Jvolve, is safe: its use of bytecode verification and VM thread synchronization ensures that an update will always produce type-correct executions. Jvolve is flexible: it can support 20 of 22 updates to three open-source programs--Jetty web server, JavaEmailServer, and CrossFTP server--based on actual releases occurring over 1 to 2 years. Jvolve is efficient: performance experiments show that incurs no overhead during steady-state execution. These results demonstrate that this work is a significant step towards practical support for dynamic updates in virtual machines for managed languages.
Conference Paper
To guarantee typesafe execution, Java and other strongly typed languages require bounds checking of array accesses. Because array-bounds checks may raise exceptions, they block code motion of instructions with side effects, thus preventing many useful code optimizations, such as partial redundancy elimination or instruction scheduling of memory operations. Furthermore, because it is not expressible at bytecode level, the elimination of bounds checks can only be performed at run time, after the bytecode program is loaded. Using existing powerful bounds-check optimizers at run time is not feasible, however, because they are too heavyweight for the dynamic compilation setting.ABCD is a light-weight algorithm for elimination of Array Bounds Checks on Demand. Its design emphasizes simplicity and efficiency. In essence, ABCD works by adding a few edges to the SSA value graph and performing a simple traversal of the graph. Despite its simplicity, ABCD is surprisingly powerful. On our benchmarks, ABCD removes on average 45% of dynamic bound check instructions, sometimes achieving near-ideal optimization. The efficiency of ABCD stems from two factors. First, ABCD works ona sparse representation. As a result, it requires on average fewer than 10 simple analysis steps per bounds check. Second, ABCD is demand-driven. It can be applied to a set of frequently executed (hot) bounds checks, which makes it suitable for the dynamic-compilation setting, in which compile-time cost is constrained but hot statements are known.
Conference Paper
Programmers are increasingly choosing managed languages for modern applications, which tend to allocate many short-to-medium lived small objects. The garbage collector therefore directly determines program performance by making a classic space-time tradeoff that seeks to provide space efficiency, fast reclamation, and mutator performance. The three canonical tracing garbage collectors: semi-space, mark-sweep, and mark-compact each sacrifice one objective. This paper describes a collector family, called mark-region, and introduces opportunistic defragmentation, which mixes copying and marking in a single pass. Combining both, we implement immix, a novel high performance garbage collector that achieves all three performance objectives. The key insight is to allocate and reclaim memory in contiguous regions, at a coarse block grain when possible and otherwise in groups of finer grain lines. We show that immix outperforms existing canonical algorithms, improving total application performance by 7 to 25% on average across 20 benchmarks. As the mature space in a generational collector, immix matches or beats a highly tuned generational collector, e.g. it improves jbb2000 by 5%. These innovations and the identification of a new family of collectors open new opportunities for garbage collector design.
Article
SUMMARY Dynamic flexibility is a major challenge in modern system design to react to context or applicative requirements evolutions. Adapting behaviors may impose substantial code modification across the whole system, in the field, without service interruption, and without state loss. This paper presents the JnJVM, a full Java virtual machine (JVM) that satisfies these needs by using dynamic aspect weaving techniques and a component architecture. It supports adding or replacing its own code, while it is running, with no overhead on unmodified code execution. Our measurements reveal similar performance when compared to the monolithic JVM Kaffe. Three illustrative examples show different extension scenarios: (i) modifying the JVMs behavior; (ii) adding capabilities to the JVM; and (iii) modifying applications behavior.
Conference Paper
The development of a complete Java Virtual Machine (JVM) implementation is a tedious process which involves knowledge in different areas: garbage collection, just in time compilation, interpretation, file parsing, data structures, etc. The result is that developing its own virtual machine requires a considerable amount of man/year. In this paper we show that one can implement a JVM with third party software and with performance comparable to industrial and top open-source JVMs on scientific applications. Our proof-of-concept implementation uses existing versions of a garbage collector, a just in time compiler, and the base library, and is robust enough to execute complex Java applications such as the OSGi Felix implementation and the Tomcat servlet container.
Article
Because software systems are imperfect, developers are forced to fix bugs and add new features. The common way of applying changes to a running system is to stop the application or machine and restart with the new version. Stopping and restarting causes a disruption in service that is at best inconvenient and at worst causes revenue loss and compromises safety. Dynamic software updating (DSU) addresses these problems by updating programs while they execute. Prior DSU systems for managed languages like Java and C# lack necessary functionality: they are inefficient and do not support updates that occur commonly in practice. This dissertation presents the design and implementation of Jvolve, a DSU system for Java. Jvolve's combination of flexibility, safety, and efficiency is a significant advance over prior approaches. Our key contribution is the extension and integration of existing Virtual Machine services with safe, flexible, and efficient dynamic updating functionality. Our approach is flexible enough to support a large class of updates, guarantees type-safety, and imposes no space or time overheads on steady-state execution. Jvolve supports many common updates. Users can add, delete, and change existing classes. Changes may add or remove fields and methods, replace existing ones, and change type signatures. Changes may occur at any level of the class hierarchy. To initialize new fields and update existing ones, Jvolve applies class and object transformer functions, the former for static fields and the latter for object instance fields. These features cover many updates seen in practice. Jvolve supports 20 of 22 updates to three open-source programs---Jetty web server, JavaEmailServer, and CrossFTP server---based on actual releases occurring over a one to two year period. This support is substantially more flexible than prior systems. Jvolve is safe. It relies on bytecode verification to statically type-check updated classes. To avoid dynamic type errors due to the timing of an update, Jvolve stops the executing threads at a DSU safe point and then applies the update. DSU safe points are a subset of VM safe points, where it is safe to perform garbage collection and thread scheduling. DSU safe points further restrict the methods that may be on each thread's stack, depending on the update. Restricted methods include updated methods for code consistency and safety, and user-specified methods for semantic safety. Jvolve installs return barriers and uses on-stack replacement to speed up reaching a safe point when necessary. While Jvolve does not guarantee that it will reach a DSU safe point, in our multithreaded benchmarks it almost always does. Jvolve includes a tool that automatically generates default object transformers which initialize new and changed fields to default values and retain values of unchanged fields in heap objects. If needed, programmers may customize the default transformers. Jvolve is the first dynamic updating system to extend the garbage collector to identify and transform all object instances of updated types. This dissertation introduces the concept of object-specific state transformers to repair application heap state for certain classes of bugs that corrupt part of the heap, and a novel methodology that employes dynamic analysis to automatically generate these transformers. Jvolve's eager object transformation design and implementation supports the widest class of updates to date. Finally, Jvolve is efficient. It imposes no overhead during steady-state execution. During an update, it imposes overheads to classloading and garbage collection. After an update, the adaptive compilation system will incrementally optimize the updated code in its usual fashion. Jvolve is the first full-featured dynamic updating system that imposes no steady-state overhead. In summary, Jvolve is the most-featured, most flexible, safest, and best-performing dynamic updating system for Java and marks a significant step towards practical support for dynamic updates in managed language virtual machines.
Conference Paper
Increasingly popular languages such as Java and C# require efficient garbage collection. This paper presents the design, implementation, and evaluation of MMTk, a Memory Management Toolkit for and in Java. MMTk is an efficient, composable, extensible, and portable framework for building garbage collectors. MMTk uses design patterns and compiler cooperation to combine modularity and efficiency. The resulting system is more robust, easier to maintain, and has fewer defects than monolithic collectors. Experimental comparisons with monolithic Java and C implementations reveal MMTk has significant performance advantages as well. Performance critical system software typically uses monolithic C at the expense of flexibility. Our results refute common wisdom that only this approach attains efficiency, and suggest that performance critical software can embrace modular design and high-level languages.
Article
Language-supported synchronization is a source of serious performance problems in many Java programs. Even singlethreaded applications may spend up to half their time performing useless synchronization due to the thread-safe nature of the Java libraries. We solve this performance problem with a new algorithm that allows lock and unlock operations to be performed with only a few machine instructions in the most common cases. Our locks only require a partial word per object, and were implemented without increasing object size. We present measurements from our implementation in the JDK 1.1.2 for AIX, demonstrating speedups of up to a factor of 5 in micro-benchmarks and up to a factor of 1.7 in real programs. 1 Introduction Monitors [5] are a language-level construct for providing mutually exclusive access to shared data structures in a multithreaded environment. However, the overhead required by the necessary locking has generally restricted their use to relatively "heavy-weight" object...
Article
We present a method for adapting garbage collectors designed to run sequentially with the client, so that they may run concurrently with it. We rely on virtual memory hardware to provide information about pages that have been updated or "dirtied" during a given period of time. This method has been used to construct a mostly parallel trace-and-sweep collector that exhibits very short pause times. Performance measurements are given. 1. Introduction Garbage collection is an important feature of many modern computing environments. There are basically two styles of garbage collection algorithms: reference-counting collectors and tracing collectors. In this paper we consider only tracing collectors. A straightforward implementation of tracing collection prevents any client action from occurring while the tracing operation is performed. When applied to a system with a large heap, such stop-the-world implementations cause long pauses. One of the primary arguments against wide adoption of gar...
Article
We describe a new algorithm for fast global register allocation called linear scan. This algorithm is not based on graph coloring, but allocates registers to variables in a single linear-time scan of the variables' live ranges. The linear scan algorithm is considerably faster than algorithms based on graph coloring, is simple to implement, and results in code that is almost as efficient as that obtained using more complex and time-consuming register allocators based on graph coloring. The algorithm is of interest in applications where compile time is a concern, such as dynamic compilation systems, "just-in-time" compilers, and interactive development environments.
Building a multi-language interpreter engine
  • D Sugalski
D. Sugalski. Building a multi-language interpreter engine. In International Python Conference, Feb. 2002.
The Microsoft shared source CLI implementation
  • D Stutz
D. Stutz. The Microsoft shared source CLI implementation. Technical report, Microsoft, Mar. 2002. http://msdn.microsoft.com/en- us/library/ms973879.aspx.
Parley: Federated virtual machines
  • P Cheng
  • D Grove
  • M Hirzel
  • R O Callahan
  • N Swamy
P. Cheng, D. Grove, M. Hirzel, R. O'Callahan, and N. Swamy. Parley: Federated virtual machines. In Workshop on the Future of Virtual Execution Environments, Sept. 2004.
Common Language Infrastructure (CLI) 4th Edition. ECMA International. Common Language Infrastructure (CLI)
  • Ecma International