ACM SIGPLAN Notices

Published by Association for Computing Machinery
Online ISSN: 0362-1340
Conference Paper
A certified binary is a value together with a proof that the value satisfies a given specification. Existing compilers that generate certified code have focused on simple memory and control-flow safety rather than more advanced properties. In this paper, we present a general framework for explicitly representing complex propositions and proofs in typed intermediate and assembly languages. The new framework allows us to reason about certified programs that involve effects while still maintaining decidable typechecking. We show how to integrate an entire proof system (the calculus of inductive constructions) into a compiler intermediate language and how the intermediate language can undergo complex transformations (CPS and closure conversion) while preserving proofs represented in the type system. Our work provides a foundation for the process of automatically generating certified binaries in a type-theoretic framework.
 
Conference Paper
The results presented show a novel route for the preparation of thin ultra-low-k polymer films based on commercial and "non-exotic" (non-expensive) polyimide by a foaming technique. Depending on the glass transition temperature of the polyimide, mechanically and thermally stable (> 300 °C) films with porosities of ca. 40% and k-values below 2.0 are formed. A further reduction into the ultra-low-k region may be accomplished by tailoring the shape of the pores from spherical into disc-like voids.
 
Article
The technique of abstracting abstract machines (AAM) provides a systematic approach for deriving computable approximations of evaluators that are easily proved sound. This article contributes a complementary step-by-step process for subsequently going from a naive analyzer derived under the AAM approach, to an efficient and correct implementation. The end result of the process is a two to three order-of-magnitude improvement over the systematically derived analyzer, making it competitive with hand-optimized implementations that compute fundamentally less precise results.
 
Article
The design and implementation of static analyzers has become increasingly systematic. In fact, for large classes of analyzers, design and implementation have remained seemingly (and now stubbornly) on the verge of full mechanization for several years. A stumbling block to full mechanization has been the ad hoc nature of the soundness proofs accompanying each analyzer. While design and implementation are largely systematic, soundness proofs can change significantly with seemingly minor changes to the semantics or the analyzer. An achievement of this work is to systematize, parameterize and modularize the proofs of soundness, so as to make them composable across analytic properties. We solve the problem of systematically constructing static analyzers by introducing Galois transformers: monad transformers that transport Galois connection properties. In concert with a monadic interpreter, we define a library of monad transformers that implement building blocks for classic analysis parameters like context-, path-, and heap-(in)sensitivity. Moreover, these can be composed together independently of the language being analyzed. Significantly, a Galois transformer can be proved sound once and for all, making it a reusable analysis component. As new analysis features and abstractions are developed and mixed in, soundness proofs need not be reconstructed, as the composition of a monad transformer stack is sound by virtue of its constituents. Galois transformers provide a viable foundation for reusable and composable metatheory for program analysis. Finally, these Galois transformers shift the level of abstraction in analysis design and implementation to a level where non-specialists can synthesize sound analyzers over a number of parameters.
 
[Figures: upper bound on the probability that the computed probability exceeds the real value by more than t, for t = 0.01; number of iterations necessary to achieve a probability of false report on the same order of magnitude as the error margin; the Gaussian normal distribution centered on 0, with standard deviation 1.]
Article
We introduce a new method, a combination of random testing and abstract interpretation, for the analysis of programs featuring both probabilistic and non-probabilistic nondeterminism. After introducing "ordinary" testing, we show how to combine testing and abstract interpretation and give formulas linking the precision of the results to the number of iterations. We then discuss complexity and optimization issues and end with some experimental results.
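As a rough, self-contained illustration of the kind of precision/iteration trade-off such formulas express (using the standard one-sided Hoeffding bound, not necessarily the paper's exact formula), consider the following Python sketch:

import math

def iterations_needed(t, delta):
    """Number of random tests n such that the empirical probability exceeds
    the true probability by more than t with probability at most delta
    (one-sided Hoeffding bound: exp(-2 n t^2) <= delta)."""
    return math.ceil(math.log(1.0 / delta) / (2.0 * t * t))

def false_report_bound(n, t):
    """Upper bound on the probability that the estimate is off by more than
    t after n independent tests."""
    return math.exp(-2.0 * n * t * t)

t = 0.01                        # error margin, as in the figure above
print(iterations_needed(t, t))  # iterations for a false-report probability ~ t (23026)
print(false_report_bound(23026, t))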
 
Article
We introduce a novel variant of logical relations that maps types not merely to partial equivalence relations on values, as is commonly done, but rather to a proof-relevant generalisation thereof, namely setoids. The objects of a setoid establish that values inhabit semantic types, whilst its morphisms are understood as proofs of semantic equivalence. The transition to proof-relevance solves two well-known problems caused by the use of existential quantification over future worlds in traditional Kripke logical relations: failure of admissibility, and spurious functional dependencies. We illustrate the novel format with two applications: a direct-style validation of Pitts and Stark's equivalences for "new" and a denotational semantics for a region-based effect system that supports type abstraction in the sense that only externally visible effects need to be tracked; non-observable internal modifications, such as the reorganisation of a search tree or lazy initialisation, can count as 'pure' or 'read only'. This 'fictional purity' allows clients of a module soundly to validate more effect-based program equivalences than would be possible with traditional effect systems.
 
Article
We describe a derivational approach to abstract interpretation that yields novel and transparently sound static analyses when applied to well-established abstract machines. To demonstrate the technique and support our claim, we transform the CEK machine of Felleisen and Friedman, a lazy variant of Krivine's machine, and the stack-inspecting CM machine of Clements and Felleisen into abstract interpretations of themselves. The resulting analyses bound temporal ordering of program events; predict return-flow and stack-inspection behavior; and approximate the flow and evaluation of by-need parameters. For all of these machines, we find that a series of well-known concrete machine refactorings, plus a technique we call store-allocated continuations, leads to machines that abstract into static analyses simply by bounding their stores. We demonstrate that the technique scales up uniformly to allow static analysis of realistic language features, including tail calls, conditionals, side effects, exceptions, first-class continuations, and even garbage collection.
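For readers unfamiliar with the starting point of this derivation, the following Python sketch (ours, not the paper's artifact) shows a minimal CEK machine for the untyped lambda calculus; the paper's contribution is to store-allocate its continuations and bound the store so that machines of this kind abstract into static analyses.

def step(state):
    control, env, kont = state
    tag = control[0]
    if tag == "var":                      # look the variable up
        return (("val", env[control[1]]), {}, kont)
    if tag == "app":                      # evaluate the operator first
        _, f, a = control
        return (f, env, ("arg", a, env, kont))
    if tag == "lam":                      # a lambda is already a value
        return (("val", ("clo", control, env)), {}, kont)
    if tag == "val":
        _, val = control
        if kont[0] == "arg":              # operator done: evaluate operand
            _, a, env2, k = kont
            return (a, env2, ("fun", val, k))
        if kont[0] == "fun":              # operand done: apply the closure
            _, clo, k = kont
            _, lam_term, cenv = clo
            _, x, body = lam_term
            new_env = dict(cenv); new_env[x] = val
            return (body, new_env, k)
    return None                           # final state

def run(term):
    state = (term, {}, ("mt",))
    while True:
        nxt = step(state)
        if nxt is None:
            return state[0]
        state = nxt

# (\x. x) (\y. y)  evaluates to the closure for \y. y
identity = ("lam", "y", ("var", "y"))
print(run(("app", ("lam", "x", ("var", "x")), identity)))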
 
Conference Paper
Abstract separation logics are a family of extensions of Hoare logic for reasoning about programs that mutate memory. These logics are "abstract" because they are independent of any particular concrete memory model. Their assertion languages, called propositional abstract separation logics, extend the logic of (Boolean) Bunched Implications (BBI) in various ways. We develop a modular proof theory for various propositional abstract separation logics using cut-free labelled sequent calculi. We first extend the cut-free labelled sequent calculus for BBI of Hou et al. to handle Calcagno et al.'s original logic of separation algebras by adding sound rules for partial-determinism and cancellativity, while preserving cut-elimination. We prove the completeness of our calculus via a sound intermediate calculus that enables us to construct counter-models from the failure to find a proof. We then capture other propositional abstract separation logics by adding sound rules for indivisible unit and disjointness, while maintaining completeness and cut-elimination. We present a theorem prover based on our labelled calculus for these logics.
 
[Figures: the original loop and its approximation in terms of the invariant at the accepting (double-lined) location, illustrating Prop. 2; the initial set of states X (dark gray) and the loop invariants Z (light gray) and Z′ (medium gray) obtained in Ex. 14.]
Article
We present abstract acceleration techniques for computing loop invariants for numerical programs with linear assignments and conditionals. Whereas abstract interpretation techniques typically over-approximate the set of reachable states iteratively, abstract acceleration captures the effect of the loop with a single, non-iterative transfer function applied to the initial states at the loop head. In contrast to previous acceleration techniques, our approach applies to any linear loop without restrictions. Its novelty lies in the use of the Jordan normal form decomposition of the loop body to derive symbolic expressions for the entries of the matrix modeling the effect of n ≥ 0 iterations of the loop. The entries of such a matrix depend on n through complex polynomial, exponential and trigonometric functions. Therefore, we introduce an abstract domain for matrices that captures the linear inequality relations between these complex expressions. This results in an abstract matrix for describing the fixpoint semantics of the loop. Our approach integrates smoothly into standard abstract interpreters and can handle programs with nested loops and loops containing conditional branches. We evaluate it over small but complex loops that are commonly found in control software, comparing it with other tools for computing linear loop invariants. The loops in our benchmarks typically exhibit polynomial, exponential and oscillatory behaviors that present challenges to existing approaches. Our approach finds non-trivial invariants to prove useful bounds on the values of variables for such loops, clearly outperforming the existing approaches in terms of precision while exhibiting good performance.
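A minimal sketch of the closed-form idea behind acceleration (our own toy loop and matrix, using sympy and plain diagonalization rather than the general Jordan normal form machinery the paper needs):

# The loop body is x, y := 2*x, x + 3*y, i.e. multiplication by a
# diagonalizable matrix A; instead of iterating, we derive a symbolic
# expression for the state after n iterations.  Requires sympy.
import sympy as sp

n = sp.Symbol("n", integer=True, nonnegative=True)
A = sp.Matrix([[2, 0],
               [1, 3]])

P, D = A.diagonalize()                                   # A = P * D * P^-1
Dn = sp.diag(*[D[i, i]**n for i in range(D.rows)])       # D^n, entrywise
An = (P * Dn * P.inv()).applyfunc(sp.simplify)           # symbolic matrix for n iterations

print(An)                             # entries are expressions in 2**n and 3**n
x0 = sp.Matrix([1, 0])                # initial state at the loop head
print((An * x0).applyfunc(sp.simplify))   # state after n iterations, in closed form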
 
Article
JSConTest introduced the notions of effect monitoring and dynamic effect inference for JavaScript. It enables the description of effects with path specifications resembling regular expressions. It is implemented by an offline source code transformation. To overcome the limitations of the JSConTest implementation, we redesigned and reimplemented effect monitoring by taking advantage of JavaScript proxies. Our new design avoids all drawbacks of the prior implementation. It guarantees full interposition; it is not restricted to a subset of JavaScript; it is self-maintaining; and its scalability to large programs is significantly better than with JSConTest. The improved scalability has two sources. First, the reimplementation is significantly faster than the original, transformation-based implementation. Second, the reimplementation relies on the fly-weight pattern and on trace reduction to conserve memory. Only the combination of these techniques enables monitoring and inference for large programs.
 
Article
Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Therefore, computational complexity will no longer be a sufficient metric for comparing algorithms, and a fundamental characterization of data access complexity will be increasingly important. The problem of developing lower bounds for data access complexity has been modeled using the formalism of Hong & Kung's red/blue pebble game for computational directed acyclic graphs (CDAGs). However, previously developed approaches to lower bounds analysis for the red/blue pebble game are very limited in effectiveness when applied to CDAGs of real programs, with computations comprised of multiple sub-computations with differing DAG structure. We address this problem by developing an approach for effectively composing lower bounds based on graph decomposition. We also develop a static analysis algorithm to derive the asymptotic data-access lower bounds of programs, as a function of the problem size and cache size.
 
Article
Programming with dependent types is a blessing and a curse. It is a blessing to be able to bake invariants into the definition of data-types: we can finally write correct-by-construction software. However, this extreme accuracy is also a curse: a data-type is the combination of a structuring medium together with a special purpose logic. These domain-specific logics hamper any effort of code reuse among similarly structured data. In this paper, we exorcise our data-types by adapting the notion of ornament to our universe of inductive families. We then show how code reuse can be achieved by ornamenting functions. Using these functional ornaments, we capture the relationship between functions such as the addition of natural numbers and the concatenation of lists. With this knowledge, we demonstrate how the implementation of the former informs the implementation of the latter: the user can ask for the definition of addition to be lifted to lists and she will only be asked the details necessary to carry on adding lists rather than numbers. Our presentation is formalised in a type theory with a universe of data-types and all our constructions have been implemented as generic programs, requiring no extension to the type theory.
 
Article
Identifying the hottest paths in the control flow graph of a routine can direct optimizations to portions of the code where most resources are consumed. This powerful methodology, called path profiling, was introduced by Ball and Larus in the mid 90s and has received considerable attention in the last 15 years for its practical relevance. A shortcoming of Ball-Larus path profiling was the inability to profile cyclic paths, making it difficult to mine interesting execution patterns that span multiple loop iterations. Previous results, based on rather complex algorithms, have attempted to circumvent this limitation at the price of significant performance losses already for a small number of iterations. In this paper, we present a new approach to multiple-iteration path profiling, based on data structures built on top of the original Ball-Larus numbering technique. Our approach makes it possible to profile all executed paths obtained as a concatenation of up to k Ball-Larus acyclic paths, where k is a user-defined parameter. An extensive experimental investigation on a large variety of Java benchmarks on the Jikes RVM shows that, surprisingly, our approach can be even faster than Ball-Larus due to fewer operations on smaller hash tables, producing compact representations of cyclic paths even for large values of k.
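The following Python sketch (ours, for illustration) shows the core Ball-Larus numbering that such data structures build on: edges of an acyclic control flow graph get integer increments so that summing them along any entry-to-exit path yields a unique path id.

def ball_larus_numbering(succ, exit_node):
    """succ maps each node to the ordered list of its successors in a DAG."""
    num_paths = {}          # number of acyclic paths from a node to the exit
    edge_value = {}         # (node, successor) -> increment

    def count(v):
        if v in num_paths:
            return num_paths[v]
        if v == exit_node:
            num_paths[v] = 1
            return 1
        total = 0
        for w in succ.get(v, []):
            edge_value[(v, w)] = total   # paths already numbered via earlier successors
            total += count(w)
        num_paths[v] = total
        return total

    for v in succ:
        count(v)
    return num_paths, edge_value

# Diamond-shaped CFG: A branches to B or C, both rejoin at the exit D.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
num_paths, edge_value = ball_larus_numbering(succ, "D")

def path_id(path):
    return sum(edge_value[(u, v)] for u, v in zip(path, path[1:]))

print(num_paths["A"])                                      # 2 acyclic paths
print(path_id(["A", "B", "D"]), path_id(["A", "C", "D"]))  # ids 0 and 1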
 
Article
Though Ada and Modula-2 are not object-oriented languages, an object-oriented viewpoint is crucial for effective use of their module facilities. It is therefore instructive to compare the capabilities of a modular language such as Ada with an archetypal object-oriented language such as Smalltalk. The comparison is in terms of the basic properties of encapsulation, inheritance and binding, with examples given in both languages. This comparison highlights the strengths and weaknesses of both types of languages from an object-oriented perspective. It also provides a basis for the application of experience from Smalltalk and other object-oriented languages to increasingly widely used modular languages such as Ada and Modula-2.
 
Article
Sparse Matrix Vector multiplication (SpMV) is an important kernel in both traditional high performance computing and emerging data-intensive applications. So far, SpMV libraries have been optimized by either application-specific or architecture-specific approaches, making the libraries too complicated to be used extensively in real applications. In this work we develop a Sparse Matrix-vector multiplication Auto-Tuning system (SMAT) to bridge the gap between specific optimizations and general-purpose usage. SMAT provides users with a unified programming interface in compressed sparse row (CSR) format and automatically determines the optimal format and implementation for any input sparse matrix at runtime. For this purpose, SMAT leverages a learning model, which is generated in an off-line stage by a machine learning method with a training set of more than 2000 matrices from the UF sparse matrix collection, to quickly predict the best combination of the matrix feature parameters. Our experiments show that SMAT achieves impressive performance of up to 51 GFLOPS in single precision and 37 GFLOPS in double precision on mainstream x86 multi-core processors, which are both more than 3 times faster than the Intel MKL library. We also demonstrate its adaptability in an algebraic multigrid solver from the Hypre library, with above 20% performance improvement reported.
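For reference, a minimal (unoptimized) CSR sparse matrix-vector product in Python, illustrating the unified input format SMAT exposes to users rather than any of its tuned kernels:

def spmv_csr(row_ptr, col_idx, values, x):
    """y = A @ x for A stored as CSR (row_ptr, col_idx, values)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# A = [[10, 0, 2],
#      [ 0, 3, 0],
#      [ 1, 0, 4]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values  = [10.0, 2.0, 3.0, 1.0, 4.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))   # [12.0, 3.0, 5.0]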
 
Article
Self-adjusting computation offers a language-based approach to writing programs that automatically respond to dynamically changing data. Recent work made significant progress in developing sound semantics and associated implementations of self-adjusting computation for high-level, functional languages. These techniques, however, do not address issues that arise for low-level languages, i.e., stack-based imperative languages that lack strong type systems and automatic memory management. In this paper, we describe techniques for self-adjusting computation which are suitable for low-level languages. Necessarily, we take a different approach than previous work: instead of starting with a high-level language with additional primitives to support self-adjusting computation, we start with a low-level intermediate language, whose semantics is given by a stack-based abstract machine. We prove that this semantics is sound: it always updates computations in a way that is consistent with full reevaluation. We give a compiler and runtime system for the intermediate language used by our abstract machine. We present an empirical evaluation that shows that our approach is efficient in practice, and performs favorably compared to prior proposals.
 
Article
We present an efficient algorithm to reduce the size of nondeterministic Büchi word automata, while retaining their language. Additionally, we describe methods to solve PSPACE-complete automata problems like universality, equivalence and inclusion for much larger instances (1-3 orders of magnitude) than before. This can be used to scale up applications of automata in formal verification tools and decision procedures for logical theories. The algorithm is based on new transition pruning techniques. These use criteria based on combinations of backward and forward trace inclusions. Since these relations are themselves PSPACE-complete, we describe methods to compute good approximations of them in polynomial time. Extensive experiments show that the average-case complexity of our algorithm scales quadratically. The size reduction of the automata depends very much on the class of instances, but our algorithm consistently outperforms all previous techniques by a wide margin. We tested our algorithm on Büchi automata derived from LTL-formulae, many classes of random automata and automata derived from mutual exclusion protocols, and compared its performance to the well-known automata tool GOAL.
 
Article
We propose algorithms for checking language equivalence of finite automata over a large alphabet. We use symbolic automata, where the transition function is compactly represented using (multi-terminal) binary decision diagrams (BDDs). The key idea consists in computing a bisimulation by exploring reachable pairs symbolically, so as to avoid redundancies. This idea can be combined with already existing optimisations, and we show in particular a nice integration with the disjoint-set forest data structure from Hopcroft and Karp's standard algorithm. Then we consider Kleene algebra with tests (KAT), an algebraic theory that can be used for verification in various domains ranging from compiler optimisation to network programming analysis. This theory is decidable by reduction to language equivalence of automata on guarded strings, a particular kind of automata that have exponentially large alphabets. We propose several methods for constructing symbolic automata out of KAT expressions, based either on Brzozowski's derivatives or on standard automata constructions. All in all, this results in efficient algorithms for deciding equivalence of KAT expressions.
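A sketch of the non-symbolic core in Python: Hopcroft and Karp's equivalence check with a disjoint-set forest over explicit states (the paper lifts this to BDD-labelled transitions); the example automata here are ours.

def equivalent(start1, start2, delta, accepting, alphabet):
    parent = {}

    def find(s):
        parent.setdefault(s, s)
        while parent[s] != s:
            parent[s] = parent[parent[s]]   # path halving
            s = parent[s]
        return s

    def union(a, b):
        parent[find(a)] = find(b)

    todo = [(start1, start2)]
    while todo:
        p, q = todo.pop()
        if find(p) == find(q):
            continue                        # pair already known equivalent
        if accepting(p) != accepting(q):
            return False                    # counterexample found
        union(p, q)
        for a in alphabet:
            todo.append((delta(p, a), delta(q, a)))
    return True

# Two DFAs for "even number of 1s" over {0, 1}, with different state tags.
def delta(state, a):
    kind, parity = state
    return (kind, parity ^ a)

accepting = lambda s: s[1] == 0
print(equivalent(("A", 0), ("B", 0), delta, accepting, [0, 1]))   # True
print(equivalent(("A", 0), ("B", 1), delta, accepting, [0, 1]))   # False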
 
Article
Homotopy Type Theory is a new field of mathematics based on the surprising and elegant correspondence between Martin-Löf's constructive type theory and abstract homotopy theory. We have a powerful interplay between these disciplines - we can use geometric intuition to formulate new concepts in type theory and, conversely, use type-theoretic machinery to verify and often simplify existing mathematical proofs. A crucial ingredient in this new system is higher inductive types, which allow us to represent objects such as spheres, tori, pushouts, and quotients. We investigate a variant of higher inductive types whose computational behavior is determined up to a higher path. We show that in this setting, higher inductive types are characterized by the universal property of being a homotopy-initial algebra.
 
Article
Language extensions of FORTRAN are being developed which permit the user to map data structures to the individual processors of distributed memory machines. These languages allow a programming style in which global data references are used. Current efforts are focussed on designing a common basis for such languages, the result of which is known as High Performance Fortran (HPF). One of the central debates in the HPF effort revolves around the concept of templates, introduced as an abstract index space to which data could be aligned. A model for the mapping of data which provides the functionality of High Performance Fortran distributions without the use of templates is presented.
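As a toy illustration of the kind of data mapping such distributions describe (our own helper function, not HPF syntax), a block-cyclic mapping from a global array index to an owning processor and local index:

def owner_and_local(i, block, nprocs):
    """Block-cyclic distribution: global index -> (processor, local index)."""
    block_id = i // block
    proc = block_id % nprocs
    local = (block_id // nprocs) * block + i % block
    return proc, local

# 12-element array, block size 2, distributed over 3 processors.
for i in range(12):
    print(i, owner_and_local(i, 2, 3))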
 
[Figure: life cycle of a span.]
Article
The problem of concurrent memory allocation is to find the right balance between temporal and spatial performance and scalability across a large range of workloads. Our contributions to address this problem are: uniform treatment of small and big objects through the idea of virtual spans, efficiently and effectively reclaiming unused memory through fast and scalable global data structures, and constant-time (modulo synchronization) allocation and deallocation operations that trade off memory reuse and spatial locality without being subject to false sharing. We have implemented an allocator, scalloc, based on these ideas; in our experiments it generally performs and scales better than other allocators while using less memory, and remains competitive otherwise.
 
Article
Studies of Aspect-Oriented Programming (AOP) usually focus on a language in which a specific aspect extension is integrated with a base language. Languages specified in this manner have a fixed, non-extensible AOP functionality. In this paper we consider the more general case of integrating a base language with a set of domain specific third-party aspect extensions for that language. We present a general mixin-based method for implementing aspect extensions in such a way that multiple, independently developed, dynamic aspect extensions can be subject to third-party composition and work collaboratively.
 
Article
Finding a denotational semantics for higher order quantum computation is a long-standing problem in the semantics of quantum programming languages. Most past approaches to this problem fell short in one way or another, either limiting the language to an unusably small finitary fragment, or giving up important features of quantum physics such as entanglement. In this paper, we propose a denotational semantics for a quantum lambda calculus with recursion and an infinite data type, using constructions from quantitative semantics of linear logic.
 
Article
Mechanism design is the study of algorithm design where the inputs to the algorithm are controlled by strategic agents, who must be incentivized to faithfully report them. Unlike typical programmatic properties, it is not sufficient for algorithms to merely satisfy the property: incentive properties are only useful if the strategic agents also believe this fact. Verification is an attractive way to convince agents that the incentive properties actually hold, but mechanism design poses several unique challenges: interesting properties can be sophisticated relational properties of probabilistic computations involving expected values, and mechanisms may rely on other probabilistic properties, like differential privacy, to achieve their goals. We introduce a relational refinement type system, called HOARe2, for verifying mechanism design and differential privacy. We show that HOARe2 is sound w.r.t. a denotational semantics, and correctly models (epsilon,delta)-differential privacy; moreover, we show that it subsumes DFuzz, an existing linear dependent type system for differential privacy. Finally, we develop an SMT-based implementation of HOARe2 and use it to verify challenging examples of mechanism design, including auctions and aggregative games, and new proposed examples from differential privacy.
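As a concrete example of the probabilistic building blocks such verification targets (standard differential-privacy machinery, written by us; not HOARe2 code), the Laplace mechanism in Python:

import numpy as np

def laplace_mechanism(true_count, epsilon, sensitivity=1.0):
    """Release a counting query with Laplace(sensitivity/epsilon) noise;
    this standard mechanism satisfies (epsilon, 0)-differential privacy."""
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

print(laplace_mechanism(42, epsilon=0.5))   # noisy answer to a query whose true value is 42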
 
Article
We propose a memory abstraction able to lift existing numerical static analyses to C programs containing union types, pointer casts, and arbitrary pointer arithmetic. Our framework is that of a combined points-to and data-value analysis. We abstract the contents of compound variables in a field-sensitive way, whether these fields contain numeric or pointer values, and use stock numerical abstract domains to find an overapproximation of all possible memory states, with the ability to discover relationships between variables. A main novelty of our approach is the dynamic mapping scheme we use to associate a flat collection of abstract cells of scalar type to the set of accessed memory locations, while taking care of byte-level aliases, i.e., C variables with incompatible types allocated in overlapping memory locations. We do not rely on static type information, which can be misleading in C programs as it does not account for all the uses a memory zone may be put to. Our work was incorporated within the Astrée static analyzer that checks for the absence of run-time errors in embedded, safety-critical, numerically intensive software. It replaces the former memory domain limited to well-typed, union-free, pointer-cast-free data structures. Early results demonstrate that this abstraction allows analyzing a larger class of C programs without much cost overhead.
 
Article
We present a new method for automatically providing feedback for introductory programming problems. In order to use this method, we need a reference implementation of the assignment, and an error model consisting of potential corrections to errors that students might make. Using this information, the system automatically derives minimal corrections to students' incorrect solutions, providing them with a measure of exactly how incorrect a given solution was, as well as feedback about what they did wrong. We introduce a simple language for describing error models in terms of correction rules, and formally define a rule-directed translation strategy that reduces the problem of finding minimal corrections in an incorrect program to the problem of synthesizing a correct program from a sketch. We have evaluated our system on thousands of real student attempts obtained from the Introduction to Programming course at MIT (6.00) and MITx (6.00x). Our results show that relatively simple error models can correct on average 64% of all incorrect submissions in our benchmark set.
 
Conference Paper
This paper presents a formalized framework for defining corecursive functions safely in a total setting, based on corecursion up-to and relational parametricity. The end product is a general corecursor that allows corecursive (and even recursive) calls under well-behaved operations, including constructors. Corecursive functions that are well behaved can be registered as such, thereby increasing the corecursor's expressiveness. The metatheory is formalized in the Isabelle proof assistant and forms the core of a prototype tool. The corecursor is derived from first principles, without requiring new axioms or extensions of the logic.
 
Article
Hardware assistance has long been used for logic level and functional unit level hardware debugging, as well as for machine language level software debugging. Such hardware assistance includes probes to detect signals, comparators to identify matches with expected patterns, buffers to record selected events, and independent logic and software to analyze and interpret the observed events. It can also include the ability to generate selected signals to stimulate the object being debugged and the ability to isolate it from normal changes so its state can be examined. Through knowledge of the data structures and algorithms used by the operating systems, and the runtime representation, register usage, and code bursts produced by compilers, it is possible to take advantage of such hardware assistance in high-level debugging. High-level debugging here refers to debugging in terms of abstractions supported by the operating system and programming languages, as well as user defined abstractions built on top of these. This paper discusses design considerations behind a project to build such a hardware assisted high-level debugger.
 
Article
We consider the problem of specifying and verifying cryptographic security protocols for XML web services. The security specification WS-Security describes a range of XML security elements, such as username tokens, public-key certificates, and digital signatures, amounting to a flexible vocabulary for expressing protocols. To describe the syntax of these elements, we extend the usual XML data model with symbolic representations of cryptographic values. We use predicates on this data model to describe the semantics of security elements and of sample protocols distributed with the Microsoft WSE implementation of WS-Security. By embedding our data model within Abadi and Fournet's applied pi calculus, we formulate and prove security properties with respect to the standard Dolev–Yao threat model. Moreover, we informally discuss issues not addressed by the formal model. To the best of our knowledge, this is the first approach to the specification and verification of security protocols based on a faithful account of the XML wire format.
 
Article
Most programming languages use monitors with explicit signals for synchronization in shared-memory programs. Requiring programmers to signal threads explicitly results in many concurrency bugs due to missed notifications, or notifications on wrong condition variables. In this paper, we describe an implementation of an automatic signaling monitor in Java called AutoSynch that eliminates such concurrency bugs by removing the burden of signaling from the programmer. We show that the belief that automatic signaling monitors are prohibitively expensive is wrong. For most problems, programs based on AutoSynch are almost as fast as those based on explicit signaling. For some, AutoSynch is even faster than explicit signaling because it never uses signalAll, whereas the programmers end up using signalAll with the explicit signal mechanism. AutoSynch achieves efficiency in synchronization based on three novel ideas. First, we introduce an operation called closure that enables the predicate evaluation in every thread, thereby reducing context switches during the execution of the program. Secondly, AutoSynch avoids signalAll by using a property called relay invariance that guarantees that, whenever possible, there is at least one signaled thread whose condition is true. Finally, AutoSynch uses a technique called predicate tagging to efficiently determine a thread that should be signaled. To evaluate the efficiency of AutoSynch, we have implemented many different well-known synchronization problems such as the producers/consumers problem, the readers/writers problems, and the dining philosophers problem. The results show that AutoSynch is almost as efficient as the explicit-signal monitor and even more efficient in some cases.
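A toy Python sketch of the programming model (our own code and names; AutoSynch itself is a Java system with far more efficient machinery): threads wait on arbitrary predicates, and each state-changing exit re-evaluates waiters' predicates and signals one whose predicate now holds. It ignores fairness, handoff, and timeouts, and the demo relies on the two threads strictly alternating.

import threading

class AutoMonitor:
    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = []                 # list of (predicate, Event)

    def _wake_one(self):
        for i, (pred, ev) in enumerate(self._waiters):
            if pred():                     # predicate evaluated by the signaling thread
                del self._waiters[i]
                ev.set()
                return

    def await_(self, predicate):
        with self._lock:
            if predicate():
                return
            ev = threading.Event()
            self._waiters.append((predicate, ev))
        ev.wait()

    def exit(self, mutator):
        """Run a state change, then signal automatically."""
        with self._lock:
            mutator()
            self._wake_one()

# Single-slot buffer built on the monitor.
buf, mon = [], AutoMonitor()

def producer():
    for i in range(3):
        mon.await_(lambda: len(buf) == 0)
        mon.exit(lambda i=i: buf.append(i))

def consumer():
    for _ in range(3):
        mon.await_(lambda: len(buf) > 0)
        mon.exit(lambda: print(buf.pop()))

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start(); t1.join(); t2.join()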
 
[Figures: different levels of approximation for the Sobel benchmark; execution time, energy and quality of results for DCT under different runtime policies and degrees of approximation (accurate execution and approximate execution using perforation shown as lines); different levels of perforation for the Sobel benchmark: accurate execution, and perforation of 20%, 70% and 100% of loop iterations in the upper left, upper right, lower left and lower right quadrants respectively.]
Article
Reducing energy consumption is one of the key challenges in computing technology. One factor that contributes to high energy consumption is that all parts of the program are considered equally significant for the accuracy of the end-result. However, in many cases, parts of computations can be performed in an approximate way, or even dropped, without affecting the quality of the final output to a significant degree. In this paper, we introduce a task-based programming model and runtime system that exploit this observation to trade off the quality of program outputs for increased energy-efficiency. This is done in a structured and flexible way, allowing for easy exploitation of different execution points in the quality/energy space, without code modifications and without adversely affecting application performance. The programmer specifies the significance of tasks, and optionally provides approximations for them. Moreover, she provides hints to the runtime on the percentage of tasks that should be executed accurately in order to reach the target quality of results. The runtime system can apply a number of different policies to decide whether it will execute each individual less-significant task in its accurate form, or in its approximate version. Policies differ in terms of their runtime overhead but also the degree to which they manage to execute tasks according to the programmer's specification. The results from experiments performed on top of an Intel-based multicore/multiprocessor platform show that, depending on the runtime policy used, our system can achieve an energy reduction of up to 83% compared with a fully accurate execution and up to 35% compared with an approximate version employing loop perforation. At the same time, our approach always results in graceful quality degradation.
 
Article
Bidirectional typechecking, in which terms either synthesize a type or are checked against a known type, has become popular for its scalability (unlike Damas-Milner type inference, bidirectional typing remains decidable even for very expressive type systems), its error reporting, and its relative ease of implementation. Following design principles from proof theory, bidirectional typing can be applied to many type constructs. The principles underlying a bidirectional approach to polymorphism, however, are less obvious. We give a declarative, bidirectional account of higher-rank polymorphism, grounded in proof theory; this calculus enjoys many properties such as eta-reduction and predictability of annotations. We give an algorithm for implementing the declarative system; our algorithm is remarkably simple and well-behaved, despite being both sound and complete.
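A minimal bidirectional typechecker for the simply-typed lambda calculus, written by us in Python to illustrate the synthesize/check split; the paper's system extends this recipe to higher-rank polymorphism.

def synth(ctx, term):
    tag = term[0]
    if tag == "var":
        return ctx[term[1]]
    if tag == "ann":                         # ("ann", e, ty): check e against ty
        _, e, ty = term
        check(ctx, e, ty)
        return ty
    if tag == "app":                         # synthesize the function, check the argument
        _, f, a = term
        fty = synth(ctx, f)
        assert fty[0] == "->", "applying a non-function"
        _, dom, cod = fty
        check(ctx, a, dom)
        return cod
    raise TypeError("cannot synthesize a type for %r" % (term,))

def check(ctx, term, ty):
    if term[0] == "lam":                     # lambdas are checked, not synthesized
        _, x, body = term
        assert ty[0] == "->", "lambda checked against a non-function type"
        _, dom, cod = ty
        ctx2 = dict(ctx); ctx2[x] = dom
        check(ctx2, body, cod)
        return
    assert synth(ctx, term) == ty, "type mismatch"

BOOL = ("bool",)
ID_BOOL = ("ann", ("lam", "x", ("var", "x")), ("->", BOOL, BOOL))
print(synth({}, ID_BOOL))                                    # ('->', ('bool',), ('bool',))
print(synth({"b": BOOL}, ("app", ID_BOOL, ("var", "b"))))    # ('bool',)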
 
Conference Paper
Recent research suggests that the goal of fully automatic and reliable program generation for a broad range of applications is coming nearer to feasibility. However, several interesting and challenging problems remain to be solved before it becomes a reality. Solving them is also necessary, if we hope ever to elevate software engineering from its current state (a highly-developed handiwork) into a successful branch of engineering, capable of solving a wide range of new problems by systematic, well-automated and well-founded methods. We first discuss the relations between problem specifications and their solutions in program form, and then narrow the discussion to an important special case: program transformation. Although the goal of fully automatic program generation is still far from fully achieved, there has been some success in a special case: partial evaluation, also known as program specialization. A key problem in all program generation is termination of the generation process. This paper describes recent progress towards automatically solving the termination problem, first for individual programs, and then for specializers and “generating extensions,” the program generators that most offline partial evaluators produce. The paper ends with a list of challenging problems whose solution would bring the community closer to the goal of broad-spectrum, fully automatic and reliable program generation.
 
Article
Quantum cryptographic systems have been commercially available, with a striking advantage over classical systems that their security and ability to detect the presence of eavesdropping are provable based on the principles of quantum mechanics. On the other hand, quantum protocol designers may commit more faults than classical protocol designers since human intuition is poorly adapted to the quantum world. To offer formal techniques for modeling and verification of quantum protocols, several quantum extensions of process algebra have been proposed. An important issue in quantum process algebra is to discover a quantum generalization of bisimulation preserved by various process constructs, in particular, parallel composition, where one of the major differences between classical and quantum systems, namely quantum entanglement, is present. Quite a few versions of bisimulation have been defined for quantum processes in the literature, but in the best case they are only proved to be preserved by parallel composition of purely quantum processes where no classical communication is involved. Many quantum cryptographic protocols, however, employ the LOCC (Local Operations and Classical Communication) scheme, where classical communication must be explicitly specified. So, a notion of bisimulation preserved by parallel composition in the circumstance of both classical and quantum communication is crucial for process algebra approach to verification of quantum cryptographic protocols. In this article we introduce novel notions of strong bisimulation and weak bisimulation for quantum processes, and prove that they are congruent with respect to various process algebra combinators including parallel composition even when both classical and quantum communication are present. We also establish some basic algebraic laws for these bisimulations. In particular, we show the uniqueness of the solutions to recursive equations of quantum processes, which proves useful in verifying complex quantum protocols. To capture the idea that a quantum process approximately implements its specification, and provide techniques and tools for approximate reasoning, a quantified version of strong bisimulation, which defines for each pair of quantum processes a bisimulation-based distance characterizing the extent to which they are strongly bisimilar, is also introduced.
 
[Figures: node and thread terminology; a DAG that can incur Ω(CT∞) additional cache misses due to a single touch (if the processor at a fork always chooses the parent thread first), used as a building block so that the worst-case computation of [22] incurs Ω(CtT∞) additional cache misses from t touches; a DAG on which work stealing can incur Ω(PT²∞) deviations and Ω(PT²∞) additional cache misses, built from the previous two DAGs.]
Article
In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur Ω(P T∞ + t T∞) deviations, which implies Ω(C P T∞ + C t T∞) additional cache misses, where C is the number of cache lines, P is the number of processors, t is the number of touches, and T∞ is the computation span. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this paper, however, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it, or by a later descendant of the thread that created it, then parallel executions with work stealing can incur at most O(C P T²∞) additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications.
 
Article
If the result of an expensive computation is invalidated by a small change to the input, the old result should be updated incrementally instead of reexecuting the whole computation. We incrementalize programs through their derivative. A derivative maps changes in the program's input directly to changes in the program's output, without reexecuting the original program. We present a program transformation taking programs to their derivatives, which is fully static and automatic, supports first-class functions, and produces derivatives amenable to standard optimization. We prove the program transformation correct in Agda for a family of simply-typed λ-calculi, parameterized by base types and primitives. A precise interface specifies what is required to incrementalize the chosen primitives. We investigate performance with a case study: we implement the program transformation in Scala as a plugin and improve the performance of a nontrivial program by orders of magnitude.
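A toy runtime illustration of the idea (our own example; the paper performs this as a static transformation on simply-typed lambda terms): the derivative of a summation maps an input change directly to an output change, without re-running the original program.

def total(xs):
    return sum(xs)

def d_total(xs, dxs):
    """Derivative of `total`: maps an input change (a list of (index, delta)
    updates) to the corresponding output change."""
    return sum(delta for _, delta in dxs)

def apply_change(xs, dxs):
    ys = list(xs)
    for i, delta in dxs:
        ys[i] += delta
    return ys

xs  = list(range(1_000_000))
dxs = [(10, +5), (999_999, -3)]

old = total(xs)
incremental = old + d_total(xs, dxs)        # O(|change|)
recomputed  = total(apply_change(xs, dxs))  # O(|input|)
print(incremental == recomputed)            # True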
 
Chapter
So-called "guarded commands" are introduced as a building block for alternative and repetitive constructs that allow non-deterministic program components for which at least the activity evoked, but possibly even the final state, is not necessarily uniquely determined by the initial state. For the formal derivation of programs expressed in terms of these constructs, a calculus will be shown.
 
Article
Increasing sharing in programs is desirable to compactify the code, and to avoid duplication of reduction work at run-time, thereby speeding up execution. We show how a maximal degree of sharing can be obtained for programs expressed as terms in the lambda calculus with letrec. We introduce a notion of 'maximal compactness' for λletrec-terms among all terms with the same infinite unfolding. Rather than being defined purely syntactically, this notion is based on a graph semantics. λletrec-terms are interpreted as first-order term graphs so that unfolding equivalence between terms is preserved and reflected through bisimilarity of the term graph interpretations. Compactness of the term graphs can then be compared via functional bisimulation. We describe practical and efficient methods for the following two problems: transforming a λletrec-term into a maximally compact form; and deciding whether two λletrec-terms are unfolding-equivalent. The transformation of a λletrec-term L into maximally compact form L0 proceeds in three steps: (i) translate L into its term graph G = [[L]]; (ii) compute the maximally shared form of G as its bisimulation collapse G0; (iii) read back a λletrec-term L0 from the term graph G0 with the property [[L0]] = G0. Then L0 represents a maximally shared term graph, and it has the same unfolding as L. The procedure for deciding whether two given λletrec-terms L1 and L2 are unfolding-equivalent computes their term graph interpretations [[L1]] and [[L2]], and checks whether these are bisimilar. For illustration, we also provide a readily usable implementation.
 
Article
Much effort is spent every day by programmers in trying to reduce long, failing execution traces to the cause of the error. We present a new algorithm for error cause localization based on a reduction to the maximal satisfiability problem (MAX-SAT), which asks what is the maximum number of clauses of a Boolean formula that can be simultaneously satisfied by an assignment. At an intuitive level, our algorithm takes as input a program and a failing test, and comprises the following three steps. First, using symbolic execution, we encode a trace of a program as a Boolean trace formula which is satisfiable iff the trace is feasible. Second, for a failing program execution (e.g., one that violates an assertion or a post-condition), we construct an unsatisfiable formula by taking the trace formula and additionally asserting that the input is the failing test and that the assertion condition does hold at the end. Third, using MAX-SAT, we find a maximal set of clauses in this formula that can be satisfied together, and output the complement set as a potential cause of the error. We have implemented our algorithm in a tool called bug-assist for C programs. We demonstrate the surprising effectiveness of the tool on a set of benchmark examples with injected faults, and show that in most cases, bug-assist can quickly and precisely isolate the exact few lines of code whose change eliminates the error. We also demonstrate how our algorithm can be modified to automatically suggest fixes for common classes of errors such as off-by-one.
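A brute-force Python sketch of the MAX-SAT idea on a tiny, hand-encoded trace (our own example; bug-assist uses symbolic execution and a real partial MAX-SAT solver rather than enumeration):

from itertools import combinations, product

# Buggy trace for: y = x + 1; z = y * 3; with input x == 1 and assert z == 4.
# The intended second statement was z = y * 2.
soft = [
    ("line 1: y = x + 1", lambda m: m["y"] == m["x"] + 1),
    ("line 2: z = y * 3", lambda m: m["z"] == m["y"] * 3),
]
hard = [
    ("input x == 1",  lambda m: m["x"] == 1),
    ("assert z == 4", lambda m: m["z"] == 4),
]
DOMAIN = range(0, 10)      # tiny finite domain in place of SMT reasoning

def satisfiable(constraints):
    for x, y, z in product(DOMAIN, repeat=3):
        m = {"x": x, "y": y, "z": z}
        if all(pred(m) for _, pred in constraints):
            return True
    return False

def localize():
    for keep in range(len(soft), -1, -1):                 # try largest sets first
        for subset in combinations(range(len(soft)), keep):
            kept = [soft[i] for i in subset] + hard
            if satisfiable(kept):
                return [soft[i][0] for i in range(len(soft)) if i not in subset]
    return None

print(localize())     # ['line 2: z = y * 3'] -- the candidate error location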
 
[Figure: the same program in functional form (implicit closures), with the lambda expressions drawn outside their lexical environment to illustrate the analogy with the OO code; the number of environments in the abstract interpretation is now O(NM) because variables x and y in the rightmost lambda were not closed together and have different contexts.]
Article
Low-level program analysis is a fundamental problem, taking the shape of "flow analysis" in functional languages and "points-to" analysis in imperative and object-oriented languages. Despite the similarities, the vocabulary and results in the two communities remain largely distinct, with limited cross-understanding. One of the few links is Shivers's k-CFA work, which has advanced the concept of "context-sensitive analysis" and is widely known in both communities. Recent results indicate that the relationship between the functional and object-oriented incarnations of k-CFA is not as well understood as thought. Van Horn and Mairson proved k-CFA for k ≥ 1 to be EXPTIME-complete; hence, no polynomial-time algorithm can exist. Yet, there are several polynomial-time formulations of context-sensitive points-to analyses in object-oriented languages. Thus, it seems that functional k-CFA may actually be a profoundly different analysis from object-oriented k-CFA. We resolve this paradox by showing that the exact same specification of k-CFA is polynomial-time for object-oriented languages yet exponential-time for functional ones: objects and closures are subtly different, in a way that interacts crucially with context-sensitivity and complexity. This illumination leads to an immediate payoff: by projecting the object-oriented treatment of objects onto closures, we derive a polynomial-time hierarchy of context-sensitive CFAs for functional programs.
 
[Figures: two portions P1 and P2 of a program, each obtained as the range between a node with several incoming edges and its immediate dominator; the general workflow for deriving timings using OTAWA.]
Article
In systems with hard real-time constraints, it is necessary to compute upper bounds on the worst-case execution time (WCET) of programs; the closer the bound to the real WCET, the better. This is especially the case of synchronous reactive control loops with a fixed clock; the WCET of the loop body must not exceed the clock period. We compute the WCET (or at least a close upper bound thereof) as the solution of an optimization modulo theory problem that takes into account the semantics of the program, in contrast to other methods that compute the longest path whether or not it is feasible according to these semantics. Optimization modulo theory extends satisfiability modulo theory (SMT) to maximization problems. Immediate encodings of WCET problems into SMT yield formulas intractable for all current production-grade solvers --- this is inherent to the DPLL(T) approach to SMT implemented in these solvers. By conjoining some appropriate "cuts" to these formulas, we considerably reduce the computation time of the SMT solver. We evaluated our approach on a variety of control programs, using the OTAWA analyzer both as a baseline and as the underlying microarchitectural analysis, and show notable improvements in the WCET bound on a variety of benchmarks and control programs.
 
Article
We study bisimulation and context equivalence in a probabilistic λ-calculus. The contributions of this paper are threefold. Firstly we show a technique for proving congruence of probabilistic applicative bisimilarity. While the technique follows Howe's method, some of the technicalities are quite different, relying on non-trivial "disentangling" properties for sets of real numbers. Secondly we show that, while bisimilarity is in general strictly finer than context equivalence, coincidence between the two relations is attained on pure λ-terms. The resulting equality is that induced by Lévy-Longo trees, generally accepted as the finest extensional equivalence on pure λ-terms under a lazy regime. Finally, we derive a coinductive characterisation of context equivalence on the whole probabilistic language, via an extension in which terms akin to distributions may appear in redex position. Another motivation for the extension is that its operational semantics allows us to experiment with a different congruence technique, namely that of logical bisimilarity.
 
Article
We define a language CQP (Communicating Quantum Processes) for modelling systems which combine quantum and classical communication and computation. CQP combines the communication primitives of the pi-calculus with primitives for measurement and transformation of quantum state; in particular, quantum bits (qubits) can be transmitted from process to process along communication channels. CQP has a static type system which classifies channels, distinguishes between quantum and classical data, and controls the use of quantum state. We formally define the syntax, operational semantics and type system of CQP, prove that the semantics preserves typing, and prove that typing guarantees that each qubit is owned by a unique process within a system. We illustrate CQP by defining models of several quantum communication systems, and outline our plans for using CQP as the foundation for formal analysis and verification of combined quantum and classical systems.
 
Article
This paper presents algorithms for reducing the communication overhead for parallel C programs that use dynamically allocated data structures. The framework consists of an analysis phase called possible-placement analysis, and a transformation phase called communication selection. The fundamental idea of possible-placement analysis is to find all possible points for insertion of remote memory operations. Remote reads are propagated upwards, whereas remote writes are propagated downwards. Based on the results of the possible-placement analysis, the communication selection transformation selects the “best” place for inserting the communication and determines if pipelining or blocking of communication should be performed. The framework has been implemented in the EARTH-McCAT optimizing C compiler, and experimental results are presented for five pointer-intensive benchmarks running on the EARTH-MANNA distributed-memory parallel processor. These experiments show that the communication optimization can provide performance improvements of up to 16% over the unoptimized benchmarks.
 
Article
Writing accurate numerical software is hard because of many sources of unavoidable uncertainties, including finite numerical precision of implementations. We present a programming model where the user writes a program in a real-valued implementation and specification language that explicitly includes different types of uncertainties. We then present a compilation algorithm that generates a conventional implementation that is guaranteed to meet the desired precision with respect to real numbers. Our verification step generates verification conditions that treat different uncertainties in a unified way and encode reasoning about floating-point roundoff errors into reasoning about real numbers. Such verification conditions can be used as a standardized format for verifying the precision and the correctness of numerical programs. Due to their often non-linear nature, precise reasoning about such verification conditions remains difficult. We show that current state-of-the-art SMT solvers do not scale well to solving such verification conditions. We propose a new procedure that combines exact SMT solving over reals with approximate and sound affine and interval arithmetic. We show that this approach overcomes scalability limitations of SMT solvers while providing improved precision over affine and interval arithmetic. Using our initial implementation we show the usefulness and effectiveness of our approach on several examples, including those containing non-linear computation.
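A minimal interval evaluator with a crudely accumulated roundoff term, written by us only to convey the flavour of the interval/affine reasoning the paper combines with SMT solving; the real tool uses much tighter error models.

EPS = 2.0 ** -53            # unit roundoff for IEEE double precision

class Interval:
    def __init__(self, lo, hi, err=0.0):
        self.lo, self.hi, self.err = lo, hi, err

    def _new(self, lo, hi, err):
        # every operation adds one rounding of the result's magnitude
        return Interval(lo, hi, err + EPS * max(abs(lo), abs(hi)))

    def __add__(self, other):
        return self._new(self.lo + other.lo, self.hi + other.hi,
                         self.err + other.err)

    def __mul__(self, other):
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        # first-order propagation of the incoming errors, ignoring their product
        prop = (self.err * max(abs(other.lo), abs(other.hi)) +
                other.err * max(abs(self.lo), abs(self.hi)))
        return self._new(min(ps), max(ps), prop)

    def __repr__(self):
        return "[%g, %g] +/- %.2e" % (self.lo, self.hi, self.err)

x = Interval(1.0, 2.0)      # input known only within a range
y = Interval(3.0, 4.0)
print(x * y + x)            # range and a crude roundoff bound for x*y + x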
 
Article
In this paper, we study the problem of generating inputs to a higher-order program causing it to error. We first study the problem in the setting of PCF, a typed, core functional language and contribute the first relatively complete method for constructing counterexamples for PCF programs. The method is relatively complete in the sense of Hoare logic; completeness is reduced to the completeness of a first-order solver over the base types of PCF. In practice, this means an SMT solver can be used for the effective, automated generation of higher-order counterexamples for a large class of programs. We achieve this result by employing a novel form of symbolic execution for higher-order programs. The remarkable aspect of this symbolic execution is that even though symbolic higher-order inputs and values are considered, the path condition remains a first-order formula. Our handling of symbolic function application enables the reconstruction of higher-order counterexamples from this first-order formula. After establishing our main theoretical results, we sketch how to apply the approach to untyped, higher-order, stateful languages with first-class contracts and show how counterexample generation can be used to detect contract violations in this setting. To validate our approach, we implement a tool generating counterexamples for erroneous modules written in Racket.
 
Article
We study the complexity of deciding the equality of infinite objects specified by systems of equations, and of infinite objects specified by λ-terms. For equational specifications there are several natural notions of equality: equality in all models, equality of the sets of solutions, and equality of normal forms for productive specifications. For λ-terms we investigate Böhm-tree equality and various notions of observational equality. We pinpoint the complexity of each of these notions in the arithmetical or analytical hierarchy. We show that the complexity of deciding equality in all models subsumes the entire analytical hierarchy. This holds already for the most simple infinite objects, viz. streams over {0,1}, and stands in sharp contrast to the low arithmetical Π⁰₂-completeness of equality of equationally specified streams derived in [17] employing a different notion of equality.
 
Article
Compilation for embedded processors can be either aggressive (time consuming cross-compilation) or just in time (embedded and usually dynamic). The heuristics used in dynamic compilation are highly constrained by limited resources, time and memory in particular. Recent results on the SSA form open promising directions for the design of new register allocation heuristics for embedded systems and especially for embedded compilation. In particular, heuristics based on tree scan with two separated phases -- one for spilling, then one for coloring/coalescing -- seem good candidates for designing memory-friendly, fast, and competitive register allocators. Still, also because of the side effect on power consumption, the minimization of loads and stores overhead (spilling problem) is an important issue. This paper provides an exhaustive study of the complexity of the "spill everywhere" problem in the context of the SSA form. Unfortunately, conversely to our initial hopes, many of the questions we raised lead to NP-completeness results. We identify some polynomial cases, but they are impractical in a JIT context. Nevertheless, they can give hints to simplify formulations for the design of aggressive allocators.
 
Article
Lock-free data objects offer several advantages over their blocking counterparts, such as being immune to deadlocks and convoying and, more importantly, being highly concurrent. But they share a common disadvantage in that the operations they provide are difficult to compose into larger atomic operations while still guaranteeing lock-freedom. We present a lock-free methodology for composing highly concurrent linearizable objects together by unifying their linearization points. This makes it possible to relatively easily introduce atomic lock-free move operations to a wide range of concurrent objects. Experimental evaluation has shown that the operations originally supported by the data objects keep their performance behavior under our methodology.
 
[Figures and tables: incremental computation of map in ADAPTON; incremental computation of map with NOMINAL ADAPTON; results of the list experiments, one randomly generated input per program. Hammer et al. (2014) report that for interactive, lazy usage patterns, ADAPTON substantially outperforms self-adjusting computation (SAC), another state-of-the-art incremental technique, which can sometimes incur significant slowdowns; SAC is not compared against directly here.]
Article
Over the past thirty years, there has been significant progress in developing general-purpose, language-based approaches to incremental computation, which aims to efficiently update the result of a computation when an input is changed. A key design challenge in such approaches is how to provide efficient incremental support for a broad range of programs. In this paper, we argue that first-class names are a critical linguistic feature for efficient incremental computation. Names identify computations to be reused across differing runs of a program, and making them first class gives programmers a high level of control over reuse. We demonstrate the benefits of names by presenting Nominal Adapton, an ML-like language for incremental computation with names. We describe how to use Nominal Adapton to efficiently incrementalize several standard programming patterns---including maps, folds, and unfolds---and show how to build efficient, incremental probabilistic trees and tries. Since Nominal Adapton's implementation is subtle, we formalize it as a core calculus and prove it is from-scratch consistent, meaning it always produces the same answer as simply re-running the computation. Finally, we demonstrate that Nominal Adapton can provide large speedups over both from-scratch computation and Adapton, a previous state-of-the-art incremental system.
 
Top-cited authors
Cormac Flanagan
  • University of California, Santa Cruz
Simon Loftus Peyton Jones
Saman P. Amarasinghe
  • Massachusetts Institute of Technology
David E. Culler
  • University of California, Berkeley
Rastislav Bodik
  • University of Washington Seattle