Conference Paper

Optimal Lambda Lifting in Quadratic Time

Authors: Marco T. Morazán and Ulrik P. Schultz

Abstract

The process of lambda lifting flattens a program by lifting all local function definitions to the global level. Optimal lambda lifting computes the minimal set of extraneous parameters needed by each function, as is done by the O(n³) equation-based algorithm proposed by Johnsson. In contrast, modern lambda lifting algorithms have used a graph-based approach to compute the set of extraneous parameters needed by each function. Danvy and Schultz proposed an algorithm that reduced the complexity of lambda lifting from O(n³) to O(n²). Their algorithm, however, is an approximation of optimal lambda lifting. Morazán and Mucha proposed an optimal graph-based algorithm at the expense of raising the complexity to O(n³). Their algorithm, however, suggested that dominator trees might be used to develop an O(n²) algorithm. This article explores the relationship between the call graph of a program, its dominator tree, and lambda lifting by developing algorithms for successively richer sets of programs. The result of this exploration is an O(n²) optimal lambda lifting algorithm.
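To make the transformation concrete, here is a minimal Haskell sketch (function names invented for the example): a local function with a free variable is moved to the top level, and the free variable becomes an explicit, extraneous parameter supplied at every call site.

```haskell
-- Before lifting: g is local to f and uses f's parameter x freely.
f :: Int -> [Int] -> [Int]
f x ys = map g ys
  where
    g y = x + y          -- x is a free (required) variable of g

-- After lifting: g is global; its required variable x is now an
-- explicit parameter, passed wherever g is called.
gLifted :: Int -> Int -> Int
gLifted x y = x + y

fLifted :: Int -> [Int] -> [Int]
fLifted x ys = map (gLifted x) ys
```

Optimal lambda lifting computes, for each function, the smallest such set of added parameters.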


... Although this runs in O(n³) time, there were several attempts to achieve its optimality w.r.t. the minimal size of the required sets with better asymptotics. As such, Morazán and Schultz (2008) were the first to present an algorithm that simultaneously has optimal runtime in O(n²) and computes minimal required sets. ...
... That begs the question whether the somewhat careless transformation in section 5 has one or both of the desirable optimality properties of the algorithm by Morazán and Schultz (2008). ...
... the minimal size of the required sets) with better asymptotics. As such, Morazán and Schultz (2008) were the first to present an algorithm that simultaneously has optimal runtime in O(n²) and computes minimal required sets. In section 5.4 we compare to their approach. ...
Preprint
Lambda lifting is a well-known transformation, traditionally employed for compiling functional programs to supercombinators. However, more recent abstract machines for functional languages like OCaml and Haskell tend to do closure conversion instead for direct access to the environment, so lambda lifting is no longer necessary to generate machine code. We propose to revisit selective lambda lifting in this context as an optimising code generation strategy and conceive heuristics to identify beneficial lifting opportunities. We give a static analysis for estimating impact on heap allocations of a lifting decision. Performance measurements of our implementation within the Glasgow Haskell Compiler on a large corpus of Haskell benchmarks suggest modest speedups.
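As a hedged illustration of the trade-off such heuristics weigh (the names below are invented for the example, not taken from the paper): lifting removes the closure that captures a free variable, but when the lifted function is used first-class, a partial application gets allocated instead, so lifting pays off only selectively.

```haskell
-- `step` captures n, so evaluating sumSteps allocates a closure for it.
sumSteps :: Int -> Int -> Int
sumSteps n x = sum (take 5 (iterate step x))
  where
    step k = k + n

-- Lifted variant: n becomes a parameter.  Because step' escapes into
-- `iterate`, the call site still allocates a partial application, so
-- this particular lift may not pay off -- exactly what a static
-- estimate of heap allocations is meant to predict.
step' :: Int -> Int -> Int
step' n k = k + n

sumSteps' :: Int -> Int -> Int
sumSteps' n x = sum (take 5 (iterate (step' n) x))
```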
... Lambda Abstraction: We also have to remove lambda abstractions, both those present in the original CIC terms (https://coq.inria.fr/refman/language/cic.html) and those introduced by η-expansion. To this end, we perform β-reduction wherever possible and remove the remaining lambda abstractions by lambda-lifting [29], [32]. Polymorphism and Higher-Order: Thanks to the previous steps, all functions are now defined at top level and totally applied. ...
Preprint
The complexity of browsers has steadily increased over the years, driven by the continuous introduction and update of Web platform components, such as novel Web APIs and security mechanisms. Their specifications are manually reviewed by experts to identify potential security issues. However, this process has proved to be error-prone due to the extensiveness of modern browser specifications and the interplay between new and existing Web platform components. To tackle this problem, we developed WebSpec, the first formal security framework for the analysis of browser security mechanisms, which enables both the automatic discovery of logical flaws and the development of machine-checked security proofs. WebSpec, in particular, includes a comprehensive semantic model of the browser in the Coq proof assistant, a formalization in this model of ten Web security invariants, and a compiler turning the Coq model and the Web invariants into SMT-lib formulas. We showcase the effectiveness of WebSpec by discovering two new logical flaws caused by the interaction of different browser mechanisms and by identifying three previously discovered logical flaws in the current Web platform, as well as five in old versions. Finally, we show how WebSpec can aid the verification of our proposed changes to amend the reported inconsistencies affecting the current Web platform.
... The let-free program is then δ-reduced transforming expressions to normal form whenever possible. After δ-reduction, the program is transformed using λ-lifting [21,22]. After this step, all functions are defined at the top level. ...
Chapter
Full-text available
Compilers for functional languages are judged, in part, on how well they handle λ-expressions. The evaluation of λ-expressions traditionally requires closure allocations, which can be intensive and can interact poorly with a garbage collector. Work on closure representation and garbage collection has successfully improved this interaction. This work, however, does not address the actual allocation of closures in the first place. This is important, because the only closures that do not have to be garbage collected are the closures that are never allocated. This article explores a novel mechanism to reduce flat-closure allocations based on memoization. To test this new mechanism, a compiler has been developed that uses continuation-passing style as an intermediate representation, which makes closure allocation ubiquitous. Empirical results strongly suggest that flat-closure memoization is an important optimization that significantly reduces running time as well as memory and closure allocation.
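The following is a small, hypothetical sketch (not the paper's implementation) of the idea behind memoizing flat closures: cache the closure keyed by the values of its free variables, so that repeated dynamic requests for the same environment reuse one allocation.

```haskell
import Data.IORef
import qualified Data.Map as Map

-- Cache closures over a single free variable x, keyed by x's value.
mkAdderMemo :: IORef (Map.Map Int (Int -> Int)) -> Int -> IO (Int -> Int)
mkAdderMemo cache x = do
  seen <- readIORef cache
  case Map.lookup x seen of
    Just f  -> pure f                    -- hit: no new closure allocated
    Nothing -> do
      let f y = x + y                    -- flat closure capturing x
      modifyIORef' cache (Map.insert x f)
      pure f
```

A cache created with `newIORef Map.empty` then hands back the same closure for equal captured values instead of allocating a fresh one.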
... For this we use an analysis obtaining the required variables for positions in a λletrec-term as employed by algorithms for λ-lifting [32,14]. The term 'required variables' was coined by Morazán and Schultz [42]. A λ-variable x is called required at a position p in a λletrec-term L if x is bound by an abstraction above p, and has a free occurrence in the complete unfolding of L below p (also function variables from above p are unfolded). ...
Article
Full-text available
We investigate the relationship between finite terms in λ-letrec, the λ-calculus with letrec, and the infinite λ-terms they express. We say that a λ-letrec term expresses a λ-term if the latter can be obtained as an infinite unfolding of the former. Unfolding is the process of substituting occurrences of function variables by the right-hand side of their definition. We consider the following questions: (i) How can we characterise those infinite λ-terms that are λ-letrec-expressible? (ii) Given two λ-letrec terms, how can we determine whether they have the same unfolding? (iii) Given a λ-letrec term, can we find a more compact version of the term with the same unfolding? To tackle these questions we introduce and study the following formalisms: (i) a rewriting system for unfolding λ-letrec terms into λ-terms; (ii) a rewriting system for 'observing' λ-terms by dissecting their term structure; (iii) higher-order and first-order graph formalisms together with translations between them, as well as translations from and to λ-letrec. We identify a first-order term graph formalism on which bisimulation preserves and reflects the unfolding semantics of λ-letrec and which is closed under functional bisimulation. From this we derive efficient methods to determine whether two terms are equivalent under infinite unfolding and to compute the maximally shared form of a given λ-letrec term.
... To be able to choose the prefixes correctly, such a translation must know for each function binding which lambda variables are 'required' on the right-hand side of its definition. An analysis for determining the set of 'required variables' is also used in a number of algorithms for lambda lifting [8,20,25], the last of which coined the term. ...
Article
Increasing sharing in programs is desirable to compactify the code, and to avoid duplication of reduction work at run-time, thereby speeding up execution. We show how a maximal degree of sharing can be obtained for programs expressed as terms in the lambda calculus with letrec. We introduce a notion of 'maximal compactness' for λletrec-terms among all terms with the same infinite unfolding. Instead of being defined purely syntactically, this notion is based on a graph semantics. λletrec-terms are interpreted as first-order term graphs so that unfolding equivalence between terms is preserved and reflected through bisimilarity of the term graph interpretations. Compactness of the term graphs can then be compared via functional bisimulation. We describe practical and efficient methods for the following two problems: transforming a λletrec-term into a maximally compact form; and deciding whether two λletrec-terms are unfolding-equivalent. The transformation of a λletrec-term L into maximally compact form L₀ proceeds in three steps: (i) translate L into its term graph G = ⟦L⟧; (ii) compute the maximally shared form of G as its bisimulation collapse G₀; (iii) read back a λletrec-term L₀ from the term graph G₀ with the property ⟦L₀⟧ = G₀. Then L₀ represents a maximally shared term graph, and it has the same unfolding as L. The procedure for deciding whether two given λletrec-terms L₁ and L₂ are unfolding-equivalent computes their term graph interpretations ⟦L₁⟧ and ⟦L₂⟧, and checks whether these are bisimilar. For illustration, we also provide a readily usable implementation.
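A loose Haskell analogy (illustrative only; the paper works with λletrec terms and term graphs): two recursive definitions can have the same infinite unfolding while differing in compactness, and the less compact one collapses onto the more compact one under bisimulation of their graphs.

```haskell
-- Both definitions unfold to the same infinite list 1 : 1 : 1 : ...
ones :: [Int]
ones = 1 : ones          -- maximally compact form

ones2 :: [Int]
ones2 = 1 : 1 : ones2    -- unfolding-equivalent, but less compact
```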
Article
Graph-based intermediate representations (IRs) are widely used for powerful compiler optimizations, either interprocedurally in pure functional languages, or intraprocedurally in imperative languages. Yet so far, no suitable graph IR exists for aggressive global optimizations in languages with both effects and higher-order functions: aliasing and indirect control transfers make it difficult to maintain sufficiently granular dependency information for optimizations to be effective. To close this long-standing gap, we propose a novel typed graph IR combining a notion of reachability types with an expressive effect system to compute precise and granular effect dependencies at an affordable cost while supporting local reasoning and separate compilation. Our high-level graph IR imposes lexical structure to represent structured control flow and nesting, enabling aggressive and yet inexpensive code motion and other optimizations for impure higher-order programs. We formalize the new graph IR based on a λ-calculus with a reachability type-and-effect system along with a specification of various optimizations. We present performance case studies for tensor loop fusion, CUDA kernel fusion, symbolic execution of LLVM IR, and SQL query compilation in the Scala LMS compiler framework using the new graph IR. We observe significant speedups of up to 21×.
Conference Paper
This article describes a new project to study the memory performance of different closure-implementation strategies in terms of memory allocation and runtime performance. At the heart of the project are four new implementation strategies for closures: three bytecode closures and memoized flat closures. The project proposes to compare the new implementation strategies to the classical strategy that dynamically allocates flat closures as heap data structures. The new bytecode closure representations are based on dynamically creating specialized bytecode instead of allocating a data structure. The first new strategy creates specialized functions by inlining the bindings of free variables. The second uses memoization to reduce the number of dynamically created functions. The third dynamically creates memoized specialized functions that treat free variables as parameters at runtime. The fourth memoizes flat closures. Empirical results from a preliminary bytecode-closure case study using three small benchmarks are presented as a proof-of-concept. The data suggests that dynamically created bytecode closures in conjunction with memoization can allocate significantly less memory, as much as three orders of magnitude less memory in the presented benchmarks, than a flat closure implementation. In addition to studying the memory footprint of the different closure representations, the project will also compare runtime efficiency of these new strategies with traditional flat closures and flat closures that are unpacked onto the stack.
Thesis
Full-text available
Most computer programs are concurrent ones: they need to perform several tasks at the same time. Threads and events are two common techniques to implement concurrency. Events are generally more lightweight and efficient than threads, but also more difficult to use. Additionally, they are often not powerful enough; it is then necessary to write hybrid code, that uses both preemptively-scheduled threads and cooperatively-scheduled event handlers, which is even more complex. In this dissertation, we show that concurrent programs written in threaded style can be translated automatically into efficient, equivalent event-driven programs through a series of proven source-to-source transformations. We first propose Continuation-Passing C, an extension of the C programming language for writing concurrent systems that provides very lightweight, unified (cooperative and preemptive) threads. CPC programs are processed by the CPC translator to produce efficient sequentialized event-loop code, using native threads for the preemptive parts. We then define and prove the correctness of these transformations, in particular lambda lifting and CPS conversion, for an imperative language. Finally, we validate the design and implementation of CPC by comparing it to other thread libraries, and by exhibiting our Hekate BitTorrent seeder. We also justify the choice of lambda lifting by implementing eCPC, a variant of CPC using environments, and comparing its performances to CPC.
Conference Paper
Full-text available
We present a formal and general specification of lambda lifting and prove its correctness with respect to an operational semantics. Lambda lifting is a program transformation which eliminates free variables from functions by introducing additional formal parameters to function definitions and additional actual parameters to function calls. This operation supports the transformation from a lexically-structured functional program into a set of recursive equations. Existing results provide specific algorithms with no flexibility, no general specification, and only limited correctness results. Our work provides a general specification of lambda lifting (and related operations) which supports flexible translation strategies which may result in new implementation techniques. Our work also supports a simple framework in which the interaction of lambda lifting and other optimizations can be studied and from which new algorithms might be obtained.
Conference Paper
Full-text available
Priced timed (game) automata extend timed (game) automata with costs on both locations and transitions. In this paper we focus on reachability games for priced timed game automata and prove that the optimal cost for winning such a game is computable under conditions concerning the non-zenoness of cost. Under stronger conditions (strictness of constraints) we prove in addition that it is decidable whether there is an optimal strategy, in which case an optimal strategy can be computed. Our results extend a previous decidability result, which requires the underlying game automata to be acyclic. Finally, our results are encoded in a first prototype in HyTech which is applied to a small case study.
Conference Paper
Full-text available
Lambda-lifting is a program transformation used in compilers and in partial evaluators and that operates in cubic time. In this article, we show how to reduce this complexity to quadratic time. Lambda-lifting transforms a block-structured program into a set of recursive equations, one for each local function in the source program. Each equation carries extra parameters to account for the free variables of the corresponding local function and of all its callees. It is the search for these extra parameters that yields the cubic factor in the traditional formulation of lambda-lifting, which is due to Johnsson. This search is carried out by a transitive closure. Instead, we partition the call graph of the source program into strongly connected components, based on the simple observation that all functions in each component need the same extra parameters and thus a transitive closure is not needed. We therefore simplify the search for extra parameters by treating each strongly connected component instead of each function as a unit, thereby reducing the time complexity of lambda-lifting from \(\mathcal{O}(n^3 \log n)\) to \(\mathcal{O}(n^2 \log n)\), where n is the size of the program. Since a lambda-lifter can output programs of size \(\mathcal{O}(n^2)\), we believe that our algorithm is close to optimal.
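A compact Haskell sketch of the strongly-connected-component idea (types and names are hypothetical; free-variable and call-graph information is assumed to be computed beforehand): Data.Graph returns components in reverse topological order, so each component's required set is the union of its members' free variables and the already-computed sets of their callees, with no transitive closure.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)
import qualified Data.Map as Map
import qualified Data.Set as Set

type Fun = String
type Var = String

-- One required set per strongly connected component: the union of the
-- members' own free variables and the (already computed) required sets
-- of their callees.
requiredSets ::
  Map.Map Fun (Set.Set Var) ->   -- free variables of each function body
  Map.Map Fun [Fun] ->           -- direct callees (call graph edges)
  Map.Map Fun (Set.Set Var)
requiredSets fvs calls = foldl visit Map.empty sccs
  where
    -- Components come out in reverse topological order: callees first.
    sccs = stronglyConnComp
             [ (f, f, Map.findWithDefault [] f calls) | f <- Map.keys fvs ]
    visit acc comp =
      let fs  = case comp of { AcyclicSCC f -> [f]; CyclicSCC gs -> gs }
          own = Set.unions [ Map.findWithDefault Set.empty f fvs | f <- fs ]
          sub = Set.unions [ Map.findWithDefault Set.empty g acc
                           | f <- fs, g <- Map.findWithDefault [] f calls ]
          req = own `Set.union` sub
      in foldr (\f -> Map.insert f req) acc fs
```

Assigning one set per component is what makes the pass quadratic; it is also the source of the approximation that the optimal algorithms described above repair.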
Conference Paper
Full-text available
Lambda lifting is a technique for transforming a program with local function definitions into a program consisting only of global function definitions. The best known lambda lifting algorithm computes the minimal set of extraneous parameters needed by each function in O(n³) steps by solving a system of set equations which are recursive if the functions in the program are mutually recursive. Mutually recursive functions give rise to strongly connected components in the call graph of a program. Danvy and Schultz observed that all functions in a strongly connected component can be given the same set of free variables as extraneous parameters. Based on this observation, they developed an O(n²) graph-based lambda lifting algorithm. This article illustrates how Danvy's and Schultz's algorithm is an approximation of Johnsson's algorithm for a certain class of programs and describes an O(n³) graph-based lambda lifting algorithm that yields the same results as Johnsson's algorithm.
Conference Paper
Full-text available
Priced timed (game) automata extend timed (game) automata with costs on both locations and transitions. In this paper we focus on reachability priced timed game automata and prove that the optimal cost for winning such a game is computable under conditions concerning the non-zenoness of cost. Under stronger conditions (strictness of constraints) we prove that in case an optimal strategy exists, we can compute a state-based winning optimal strategy.
Conference Paper
Full-text available
For a system of polynomial equations over Q_p we present an efficient construction of a single polynomial of quite small degree whose zero set over Q_p coincides with the zero set over Q_p of the original system. We also show that the polynomial has some other attractive features such as low additive and straight-line complexity. The proof is based on a link established here between the above problem and some recent number theoretic result about zeros of p-adic forms.
Conference Paper
Full-text available
The concept of zero-knowledge (ZK) has become of fundamental importance in cryptography. However, in a setting where entities are modeled by quantum computers, classical arguments for proving ZK fail to hold since, in the quantum setting, the concept of rewinding is not generally applicable. Moreover, known classical techniques that avoid rewinding have various shortcomings in the quantum setting. We propose new techniques for building quantum zero-knowledge (QZK) protocols, which remain secure even under (active) quantum attacks. We obtain computational QZK proofs and perfect QZK arguments for any NP language in the common reference string model. This is based on a general method converting an important class of classical honest-verifier ZK (HVZK) proofs into QZK proofs. This leads to quite practical protocols if the underlying HVZK proof is efficient. These are the first proof protocols enjoying these properties, in particular the first to achieve perfect QZK. As part of our construction, we propose a general framework for building unconditionally hiding (trapdoor) string commitment schemes, secure against quantum attacks, as well as concrete instantiations based on specific (believed to be) hard problems. This is of independent interest, as these are the first unconditionally hiding string commitment schemes withstanding quantum attacks. Finally, we give a partial answer to the question whether QZK is possible in the plain model. We propose a new notion of QZK, non-oblivious verifier QZK, which is strictly stronger than honest-verifier QZK but weaker than full QZK, and we show that this notion can be achieved by means of efficient (quantum) protocols.
Conference Paper
Full-text available
The Standard ML of New Jersey compiler has been under development for five years now. We have developed a robust and complete environment for Standard ML that supports the implementation of large software systems and generates efficient code. The compiler has also served as a laboratory for developing novel implementation techniques for a sophisticated type and module system, continuation-based code generation, efficient pattern matching, and concurrent programming features.
Conference Paper
Full-text available
Optimizing compilers for higher-order languages need not be terribly complex. The problems created by non-local, non-global variables can be eliminated by allocating all such variables in the heap. Lambda lifting makes this practical by eliminating all non-local variables except for those that would have to be allocated in the heap anyway. The eliminated non-local variables become local variables that can be allocated in registers. Since calls to known procedures are just gotos that pass arguments, lifted lambda expressions are just assembly language labels that have been augmented by a list of symbolic names for the registers that are live at that label.
Conference Paper
Full-text available
The construction of interactive server-side Web applications differs substantially from the construction of traditional interactive programs. In contrast, existing Web programming paradigms force programmers to save and restore control state between user interactions. We present an automated transformation that converts traditional interactive programs into standard CGI programs. This enables reuse of existing software development methodologies. Furthermore, an adaptation of existing programming environments supports the development of Web programs.
Article
Full-text available
One of the most attractive features of functional programming languages is their suitability for programming parallel computers. This paper is devoted to discussion of such a claim. Firstly, parallel functional programming is discussed from the programmer's point of view. Secondly, since most parallel functional language implementations are based on the concept of graph reduction, the issues raised by graph reduction are discussed. Finally, the paper concludes with a case study of a particular parallel graph reduction machine and a survey of other parallel architectures.
Article
Full-text available
We consider the problem of lightweight closure conversion, in which multiple procedure call protocols may coexist in the same code. A lightweight closure omits bindings for some of the free variables of the procedure that it represents. Flow analysis is used to match the protocol expected by each procedure and the protocol used at its possible call sites. We formulate the flow analysis as a deductive system that generates a labeled transition system and a set of constraints. We show that any solution to the constraints justifies the resulting transformation. Some of the techniques used are similar to those of abstract interpretation, but others appear to be novel.
Article
Full-text available
Lambda-lifting is a program transformation that is used in compilers, partial evaluators, and program transformers. In this article, we show how to reduce its complexity from cubic time to quadratic time, and we present a flow-sensitive lambda-lifter that also works in quadratic time. Lambda-lifting transforms a block-structured program into a set of recursive equations, one for each local function in the source program. Each equation carries extra parameters to account for the free variables of the corresponding local function and of all its callees. It is the search for these extra parameters that yields the cubic factor in the traditional formulation of lambda-lifting, which is due to Johnsson. This search is carried out by computing a transitive closure. To reduce the complexity of lambda-lifting, we partition the call graph of the source program into strongly connected components, based on the simple observation that all functions in each component need the same extra parameters and thus a transitive closure is not needed. We therefore simplify the search for extra parameters by treating each strongly connected component instead of each function as a unit.
Article
We present a formal and general specification of lambda lifting and prove its correctness with respect to a call-by-name operational semantics. We use this specification to prove the correctness of a lambda lifting algorithm similar to the one proposed by Johnsson. Lambda lifting is a program transformation that eliminates free variables from functions by introducing additional formal parameters to function definitions and additional actual parameters to function calls. This operation supports the transformation from a lexically-structured functional program into a set of recursive equations. Existing results provide specific algorithms and only limited correctness results. Our work provides a more general specification of lambda lifting (and related operations) that supports flexible translation strategies, which may result in new implementation techniques. Our work also supports a simple framework in which the interaction of lambda lifting and other optimizations can be studied and from which new algorithms might be obtained.
Book
This new, expanded textbook describes all phases of a modern compiler: lexical analysis, parsing, abstract syntax, semantic actions, intermediate representations, instruction selection via tree matching, dataflow analysis, graph-coloring register allocation, and runtime systems. It includes good coverage of current techniques in code generation and register allocation, as well as functional and object-oriented languages, that are missing from most books. In addition, more advanced chapters are now included so that it can be used as the basis for a two-semester or graduate course. The most accepted and successful techniques are described in a concise way, rather than as an exhaustive catalog of every possible variant. Detailed descriptions of the interfaces between modules of a compiler are illustrated with actual C header files. The first part of the book, Fundamentals of Compilation, is suitable for a one-semester first course in compiler design. The second part, Advanced Topics, which includes the advanced chapters, covers the compilation of object-oriented and functional languages, garbage collection, loop optimizations, SSA form, loop scheduling, and optimization for cache-memory hierarchies.
Article
Self-applicable partial evaluation has been implemented for half a decade now, but many problems remain open. This paper addresses and solves the problems of automating call unfolding, having an open-ended set of operators, and processing global variables updated by side effects. The problems of computation duplication and termination of residual programs are addressed and solved: residual programs never duplicate computations of the source program; residual programs do not terminate more often than source programs. This paper describes the automatic autoprojector (self-applicable partial evaluator) Similix; it handles programs with user-defined primitive abstract data type operators which may process global variables. Abstract data types make it possible to hide actual representations of data and prevent specializing operators over these representations. The formally sound treatment of global variables makes Similix fit well in an applicative order programming environment. We present a new method for automatic call unfolding which is simpler, faster, and sometimes more effective than existing methods: it requires neither recursion analysis of the source program, nor call graph analysis of the residual program. To avoid duplicating computations and preserve termination properties, we introduce an abstract interpretation of the source program, abstract occurrence counting analysis, which is performed during preprocessing. We express it formally and simplify it.
Article
In this thesis we present and analyse a set of automatic source-to-source program transformations that are suitable for incorporation in optimising compilers for lazy functional languages. These transformations improve the quality of code in many different respects, such as execution time and memory usage. The transformations presented are divided in two sets: global transformations, which are performed once (or sometimes twice) during the compilation process; and a set of local...
Chapter
The presentations of type theory based on a comprehension scheme, a skolemized comprehension scheme and λ-calculus are equivalent, both in the sense that each one is a conservative extension of the previous and that each one can be coded in the previous preserving provability. A similar result holds for set theory.
Article
Lambda-lifting a block-structured program transforms it into a set of recursive equations. We present the symmetric transformation: lambda-dropping. Lambda-dropping a set of recursive equations restores block structure and lexical scope. For lack of block structure and lexical scope, recursive equations must carry around all the parameters that any of their callees might possibly need. Both lambda-lifting and lambda-dropping thus require one to compute Def/Use paths: for lambda-lifting, each of the functions occurring in the path of a free variable is passed this variable as a parameter; for lambda-dropping, parameters which are used in the same scope as their definition do not need to be passed along in their path. A program whose blocks have no free variables is scope-insensitive. Its blocks are then free to float (for lambda-lifting) or to sink (for lambda-dropping) along the vertices of the scope tree. Our primary application is partial evaluation. Indeed, many partial evaluators for procedural programs operate on recursive equations. To this end, they lambda-lift source programs in a pre-processing phase. But often, partial evaluators [automatically] produce residual recursive equations with dozens of parameters, which most compilers do not handle efficiently. We solve this critical problem by lambda-dropping residual programs in a post-processing phase, which significantly improves both their compile time and their run time. Lambda-lifting has been presented as an intermediate transformation in compilers for functional languages. We study lambda-lifting and lambda-dropping per se, though lambda-dropping also has a use as an intermediate transformation in a compiler: we noticed that lambda-dropping a program corresponds to transforming it into the functional representation of its optimal SSA form. This observation actually led us to substantially improve our PEPM'97 presentation of lambda-dropping.
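A minimal before/after sketch of lambda-dropping in Haskell (names invented for the example): a parameter that is invariant along a Def/Use path can be dropped once the equation sinks back under the binder that defines it.

```haskell
-- Recursive equations: loopEq carries x and y only to keep them in scope.
loopEq :: Int -> Int -> Int -> Int
loopEq x y 0 = x + y
loopEq x y n = loopEq x y (n - 1)

mainEq :: Int -> Int -> Int
mainEq x y = loopEq x y 10

-- Lambda-dropped: loop sinks under the binder of x and y, which are then
-- captured lexically instead of being passed along the recursion.
mainDropped :: Int -> Int -> Int
mainDropped x y = loop 10
  where
    loop 0 = x + y
    loop n = loop (n - 1)
```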
Conference Paper
The process of λ-lifting (or bracket abstraction) translates expressions in a typed λ-calculus into expressions in a typed combinator language. This is of interest because it shows that the λ-calculus and the combinator language are equally expressive (as the translation from combinators to λ-expressions is rather trivial). This paper studies the similar problems for 2-level λ-calculi and 2-level combinator languages. The 2-level nature of the type system enforces a formal distinction between binding times, e.g. between computations at compile-time and computations at run-time. In this setting the natural formulations of 2-level λ-calculi and 2-level combinator languages turn out not to be equally expressive. The translation into 2-level λ-calculus is straightforward but the 2-level λ-calculus is too powerful for λ-lifting to succeed. We then develop a restriction of the 2-level λ-calculus for which λ-lifting succeeds and that is as expressive as the 2-level combinator language.
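For reference, here is a standard bracket-abstraction sketch in Haskell (the classic S/K/I rules, not the paper's 2-level variant): lambdas are compiled bottom-up, and abstracting x from a lambda-free term yields a combinator term that no longer mentions x.

```haskell
data Term = Var String | App Term Term | Lam String Term | S | K | I
  deriving Show

occurs :: String -> Term -> Bool
occurs x (Var y)   = x == y
occurs x (App a b) = occurs x a || occurs x b
occurs x (Lam y b) = x /= y && occurs x b
occurs _ _         = False

-- Abstract x from a lambda-free term.
abstract :: String -> Term -> Term
abstract x (Var y) | x == y     = I
abstract x t | not (occurs x t) = App K t
abstract x (App a b)            = App (App S (abstract x a)) (abstract x b)
abstract _ t                    = t   -- unreachable for lambda-free input

-- Eliminate lambdas bottom-up, so abstract never meets a Lam.
compile :: Term -> Term
compile (Lam x b) = abstract x (compile b)
compile (App a b) = App (compile a) (compile b)
compile t         = t
-- e.g. compile (Lam "x" (Lam "y" (Var "x"))) yields S (K K) I
```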
Conference Paper
Starting from a continuation-based interpreter for a simple logic programming language, propositional Prolog with cut, we derive the corresponding logic engine in the form of an abstract machine. The derivation originates in previous work (our article at PPDP 2003) where it was applied to the lambda-calculus. The key transformation here is Reynolds's defunctionalization that transforms a tail-recursive, continuation-passing interpreter into a transition system, i.e., an abstract machine. Similar denotational and operational semantics were studied by de Bruin and de Vink (their article at TAPSOFT 1989), and we compare their study with our derivation. Additionally, we present a direct-style interpreter of propositional Prolog expressed with control operators for delimited continuations.
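The defunctionalization step can be shown in a few lines of Haskell (a generic textbook instance, not the paper's Prolog machine): each continuation lambda becomes a first-order constructor recording its free variables, and an apply function interprets them, turning the interpreter into a transition system.

```haskell
-- Higher-order: the continuation is a function value.
factCPS :: Int -> (Int -> Int) -> Int
factCPS 0 k = k 1
factCPS n k = factCPS (n - 1) (\r -> k (n * r))

-- Defunctionalized: one constructor per continuation shape.
data Kont = KId | KMul Int Kont

apply :: Kont -> Int -> Int
apply KId        r = r
apply (KMul n k) r = apply k (n * r)

factMach :: Int -> Kont -> Int
factMach 0 k = apply k 1
factMach n k = factMach (n - 1) (KMul n k)
-- factMach 5 KId == 120, matching factCPS 5 id
```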
Conference Paper
Lambda-lifting and lambda-dropping respectively transform a block-structured functional program into recursive equations and vice versa. Lambda-lifting was developed in the early 80’s, whereas lambda-dropping is more recent. Both are split into an analysis and a transformation. Published work, however, has only concentrated on the analysis parts. We focus here on the transformation parts and more precisely on their correctness, which appears never to have been proven. To this end, we define extensional versions of lambda-lifting and lambda-dropping and establish their correctness with respect to a least fixed-point semantics.
Conference Paper
This paper presents a logical framework for low-level machine code and code generation. We first define a calculus, called sequential sequent calculus, of intuitionistic propositional logic. A proof of the calculus only contains left rules and has a linear (non-branching) structure, which reflects the properties of sequential machine code. We then establish a Curry-Howard isomorphism between this proof system and machine code based on the following observation. An ordinary machine instruction corresponds to a polymorphic proof transformer that extends a given proof with one inference step. A return instruction, which turns a sequence of instructions into a program, corresponds to a logical axiom (an initial proof tree). Sequential execution of code corresponds to transforming a proof to a smaller one by successively eliminating the last inference step. This logical correspondence enables us to present and analyze various low-level implementation processes of a functional language within the logical framework. For example, a code generation algorithm for the lambda calculus is extracted from a proof of the equivalence theorem between the natural deduction and the sequential sequent calculus.
Conference Paper
One of the most attractive features of functional languages is first-class functions. To support first-class functions many functional languages create heap-allocated closures to store the bindings of free variables. This makes it difficult to predict how the heap is accessed and makes accesses to free variables slower than accesses to bound variables. This article presents the operational semantics of the MT evaluator virtual machine and proposes a new implementation strategy for first-class functions in a pure functional language that eliminates the use of heap-allocated closures by using partial evaluation and dynamic code generation. At runtime, functions containing references to free variables are specialized for the bindings of these variables. In these specialized functions, references to free variables become references to constant values. As a result, the need for heap allocated closures is eliminated, accesses to free variables become faster than accesses to bound variables, and expected heap access patterns remain unchanged.
Conference Paper
Graph reduction has emerged as a powerful implementation model for lazy functional languages, especially for parallel machines. Supercombinators and full laziness are two of the key techniques available for the efficient implementation of graph reduction, and the purpose of this paper is to provide an accessible introduction to these techniques.
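A small Haskell illustration of full laziness, the second of those techniques (example code invented here): an expression that is free in the inner lambda is floated out so it is computed once and shared, rather than once per application.

```haskell
expensive :: Int -> Int
expensive x = x * 1000          -- stand-in for costly work

-- Without full laziness, `expensive x` may be recomputed for every y.
f :: Int -> [Int] -> [Int]
f x ys = map (\y -> expensive x + y) ys

-- A fully lazy transformation floats the maximal free expression out:
fLazy :: Int -> [Int] -> [Int]
fLazy x ys = let e = expensive x in map (\y -> e + y) ys
```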
Conference Paper
LML is a strongly typed, statically scoped functional language with lazy evaluation. It is compiled through a number of program transformations which make the code generation easier. Code is generated in two steps: first, code for an abstract graph manipulation machine, the G-machine; from this code, machine code is generated. Some benchmark tests are also presented.
Conference Paper
There is a growing interest nowadays in functional programming languages and systems, and in special hardware for executing them. Many of these implementations are based on a system called graph reduction (GR), in which a program is represented as a graph which is transformed, or reduced, by the machine until it represents the desired answer. The various graph reduction implementations differ in the structure of the “machine code” (the program graph) and the compilation algorithms necessary to produce it from a source language. This paper describes a new implementation method using super-combinators which is apparently more efficient than its predecessors. Consideration of the new method also helps clarify the relationships between several other graph-reduction schemes. This paper is necessarily brief, but a fuller account can be found in [Hughes]. The simplest machine language we shall consider consists of constants combined by function application. This is the language of constant applicative forms (cafs). Some of the constants are basic functions.
Conference Paper
We implemented a continuation-passing style (CPS) code generator for ML. Our CPS language is represented as an ML datatype in which all functions are named and most kinds of ill-formed expressions are impossible. We separate the code generation into phases that rewrite this representation into ever-simpler forms. Closures are represented explicitly as records, so that closure strategies can be communicated from one phase to another. No stack is used. Our benchmark data shows that the new method is an improvement over our previous, abstract-machine based code generator.
Book
The control and data flow of a program can be represented using continuations, a concept from denotational semantics that has practical application in real compilers. This book shows how continuation-passing style is used as an intermediate representation on which to perform optimisations and program transformations. Continuations can be used to compile most programming languages. The method is illustrated in a compiler for the programming language Standard ML. However, prior knowledge of ML is not necessary, as the author carefully explains each concept as it arises. This is the first book to show how concepts from the theory of programming languages can be applied to the production of practical optimising compilers for modern languages like ML. This book will be essential reading for compiler writers in both industry and academe, as well as for students and researchers in programming language theory.
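As a tiny illustration of CPS as an intermediate representation (a generic example, not drawn from the book): every intermediate result is named and control transfer is made explicit, which is what the optimisation phases exploit.

```haskell
-- Direct style.
addMul :: Int -> Int -> Int -> Int
addMul a b c = (a + b) * c

-- Continuation-passing style: intermediates are named, control is explicit.
addMulCPS :: Int -> Int -> Int -> (Int -> r) -> r
addMulCPS a b c k =
  let s = a + b in     -- name the sum
  let p = s * c in     -- name the product
  k p                  -- hand the result to the continuation
-- addMulCPS 1 2 3 id == addMul 1 2 3
```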
Article
Local CPS conversion is a compiler transformation for improving the code generated for nested loops by a direct-style compiler that uses recursive functions to represent loops. The transformation selectively applies CPS conversion at non-tail call sites, which allows the compiler to use a single machine procedure and stack frame for both the caller and callee. In this paper, we describe LCPS conversion, as well as a supporting analysis. We have implemented Local CPS conversion in the MOBY compiler and describe our implementation. In addition to improving the performance of loops, Local CPS conversion is also used to aid the optimization of non-local control flow by the MOBY compiler. We present results from preliminary experiments with our compiler that show significant reductions in loop overhead as a result of Local CPS conversion.
Article
This paper describes a verified compiler for PreScheme, the implementation language for the vlisp run-time system. The compiler and proof were divided into three parts: a transformational front end that translates source text into a core language, a syntax-directed compiler that translates the core language into a combinator-based tree-manipulation language, and a linearizer that translates combinator code into code for an abstract stored-program machine with linear memory for both data and code. This factorization enabled different proof techniques to be used for the different phases of the compiler, and also allowed the generation of good code. Finally, the whole process was made possible by carefully defining the semantics of vlisp PreScheme rather than just adopting Scheme's. We believe that the architecture of the compiler and its correctness proof can easily be applied to compilers for languages other than PreScheme.