Conference Paper

A graph-based higher-order intermediate representation

... Nevertheless, there is significant inflation in the IR code, where new constructs correspond to function abstractions. For instance, the code written for the Low-Level Virtual Machine (LLVM) occupies 600 lines, despite the simplicity of the example [9,10]. Analyzing this code was time-consuming, as it would be for similar examples. ...
... Because each variable is assigned only once in the SSA representation and a φ function is added at each join point, the time required to analyze all instructions of the program increases. For instance, the LLVM code for the simple example comprises over 600 lines [9], which takes considerable time to analyze. Therefore, removing instructions that do not affect program execution and decreasing the number of φ functions in the SSA form would produce a minimal SSA form that can decrease the time taken to detect the feasible paths. ...
Article
Full-text available
Static analysis is one of the techniques used today to analyze source code and minimize the issue of software vulnerability. Static analysis has the ability to observe all possible software paths in an application through the scrutiny of a web application's source code. Among those paths, some may be considered feasible paths, which refer to any paths that the test cases can execute. The detection of feasible paths in the results of a static analysis helps to minimize the false positive rate. However, the detection of feasible paths can be challenging, especially for programs that have multiple conditions in the same branch. The aim is to ensure that each feasible path is detected only once (not duplicated). This paper proposes an approach based on minimal static single assignment (MSSA) form and symbolic execution to detect feasible paths. The proposed approach starts by converting the source code into an abstract syntax tree (AST), followed by converting the AST to minimal SSA representation, which helps to decrease the number of instructions in the SSA form. An algorithm was built to examine all of the instructions of the SSA form, identify whole paths in the source code, and extract constraints along each path. A path weight method (PWM) is proposed in this work to avoid detecting duplicated feasible paths. A satisfiability modulo theories (SMT) solver was used to check the satisfiability of each path condition. The proposed approach was tested on seven well-known test programs that have been used in related studies and 10 large-scale programs. The experimental results indicate that the proposed method (PWM) can avoid detecting duplicated feasible paths, and the proposed approach reduced the time required for generating the paths compared to that in related studies.
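To make the final satisfiability step concrete, here is a minimal sketch of checking one extracted path condition with an SMT solver. It assumes the `z3-solver` Python package; the variables and constraints are illustrative, not taken from the paper's benchmarks.

```python
# Minimal sketch (assumes the z3-solver package): a path is feasible iff
# the conjunction of branch constraints collected along it is satisfiable.
from z3 import Int, Solver, sat

x, y = Int("x"), Int("y")

# Hypothetical constraints gathered along one path,
# e.g. from `if x > 0: ... if y == x + 1: ...`
path_condition = [x > 0, y == x + 1]

s = Solver()
s.add(*path_condition)
if s.check() == sat:
    print("feasible path, witness:", s.model())   # e.g. x = 1, y = 2
else:
    print("infeasible path: no test case can execute it")
```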
... In this paper, we counter these problems by presenting a clean-slate design of a programming system called AnyDSL that is built for writing high-performance code using PE. Figure 1 overviews AnyDSL's components and its compilation flow. AnyDSL extends the continuation-passing style (CPS)-based IR Thorin [Leißa et al. 2015b] with a simple online partial evaluator. Thorin's design is minimalistic in the sense that its only constructs are continuations and so-called primops (primitive operations, such as arithmetic operations, loads and stores, etc.). ...
... (1) the parameter is first-order, (2) its continuation does not have other higher-order parameters (so it is a second-order continuation), and (3) its continuation does not have free variables (other than more continuations without free variables). This strategy transforms the program into control-flow form [Leißa et al. 2015b]: all residual continuations will either be global functions (i.e. continuations with a return continuation), or basic blocks (i.e. ...
Article
Full-text available
This paper advocates programming high-performance code using partial evaluation. We present a clean-slate programming system with a simple, annotation-based, online partial evaluator that operates on a CPS-style intermediate representation. Our system exposes code generation for accelerators (vectorization/parallelization for CPUs and GPUs) via compiler-known higher-order functions that can be subjected to partial evaluation. This way, generic implementations can be instantiated with target-specific code at compile time. In our experimental evaluation we present three extensive case studies from image processing, ray tracing, and genome sequence alignment. We demonstrate that using partial evaluation, we obtain high-performance implementations for CPUs and GPUs from one language and one code base in a generic way. The performance of our codes is mostly within 10%, often closer to the performance of multi man-year, industry-grade, manually-optimized expert codes that are considered to be among the top contenders in their fields.
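As a rough illustration of the idea behind partial evaluation (not AnyDSL's actual annotation machinery), the following Python sketch specializes a generic power function for a statically known exponent, so the recursion over the static argument is evaluated away at specialization time:

```python
# Sketch: the exponent n is "static", so the recursion over n is evaluated
# now, leaving a residual function in x only. Names are illustrative.
def specialize_pow(n):
    if n == 0:
        return lambda x: 1
    rest = specialize_pow(n - 1)
    return lambda x: x * rest(x)

cube = specialize_pow(3)   # residual computation: x * x * x * 1
print(cube(5))             # 125
```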
... Our work uses shallow embedding. But instead of extending the compiler of the host language, we use partial evaluation in order to eliminate higher-order functions at compile-time [23,24]. ...
... By default, the Impala compiler partially evaluates all calls to higher-order functions, which removes closures for non-recursive calls. Using a novel optimization called lambda mangling [24], the evaluator also eliminates closures for tail-recursive calls. ...
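The effect described for tail-recursive calls can be pictured with an ordinary example (this shows only the net effect; the lambda-mangling transformation itself is defined in the cited paper): a recursive call in return position can be executed as a loop, so no closure or stack frame per call is needed.

```python
def sum_to_rec(n, acc=0):              # tail call in return position
    return acc if n == 0 else sum_to_rec(n - 1, acc + n)

def sum_to_loop(n):                    # the equivalent control-flow form
    acc = 0
    while n != 0:
        n, acc = n - 1, acc + n
    return acc

assert sum_to_rec(10) == sum_to_loop(10) == 55
```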
Conference Paper
Full-text available
In order to achieve the highest possible performance, the ray traversal and intersection routines at the core of every high-performance ray tracer are usually hand-coded, heavily optimized, and implemented separately for each hardware platform—even though they share most of their algorithmic core. The results are implementations that heavily mix algorithmic aspects with hardware and implementation details, making the code non-portable and difficult to change and maintain. In this paper, we present a new approach that offers the ability to define in a functional language a set of conceptual, high-level language abstractions that are optimized away by a special compiler in order to maximize performance. Using this abstraction mechanism we separate a generic ray traversal and intersection algorithm from its low-level aspects that are specific to the target hardware. We demonstrate that our code is not only significantly more flexible, simpler to write, and more concise but also that the compiled results perform as well as state-of-the-art implementations on any of the tested CPU and GPU platforms.
... TR is interpreted and partially evaluated using Dynamic Staging. By calling a built-in $compile, any function can be compiled, through Thorin [LKH15] and LLVM [LA04], into highly efficient machine code. ...
... It can also modify the parser of ManyDSL, so that different DSLs can be read. Finally, TR code can be translated to Thorin [LKH15] and then compiled to LLVM [LA04]. ...
Article
Full-text available
Domain-specific languages are becoming increasingly important. Almost every application touches multiple domains. But how to define, use, and combine multiple DSLs within the same application? The most common approach is to split the project along the domain boundaries into multiple pieces and files. Each file is then compiled separately. Alternatively, multiple languages can be embedded in a flexible host language: within the same syntax a new domain semantic is provided. In this paper we follow a less explored route of metamorphic languages. These languages are able to modify their own syntax and semantics on the fly, thus becoming a more flexible host for DSLs. Our language allows for dynamic creation of grammars and switching languages where needed. We achieve this through a novel concept of Syntax-Directed Execution. A language grammar includes semantic actions that are pieces of functional code executed immediately during parsing. By avoiding additional intermediate representation, connecting actions from different languages and domains is greatly simplified. Still, actions can generate highly specialized code though lambda encapsulation and Dynamic Staging.
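The core idea of semantic actions executing immediately during parsing can be sketched in a few lines of Python. This is a toy `expr := NUM ('+' NUM)*` grammar with hypothetical names; ManyDSL's actual mechanism is considerably richer.

```python
import re

def parse_expr(src):
    # Semantic actions run during parsing: the sum is folded immediately,
    # and no intermediate representation is ever built.
    tokens = re.findall(r"\d+|\+", src)
    pos = 0

    def next_tok():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    value = int(next_tok())          # action: evaluate the literal now
    while pos < len(tokens):
        next_tok()                   # consume '+'
        value += int(next_tok())     # action: fold the addition now
    return value

print(parse_expr("1+2+3"))           # 6
```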
... Thorin [19] is a higher-order, functional IR based on continuation-passing style. Thorin chooses to not use explicit nesting, and uses a dependency graph instead. ...
Preprint
Static Single Assignment (SSA) is the workhorse of modern optimizing compilers for imperative programming languages. However, functional languages have been slow to adopt SSA and prefer to use intermediate representations based on minimal lambda calculi due to SSA's inability to express higher-order constructs. We exploit a new SSA construct -- regions -- in order to express functional optimizations via classical SSA-based reasoning. Region optimization currently relies on ad-hoc analyses and transformations on imperative programs. These ad-hoc transformations are sufficient for imperative languages as regions are used in a limited fashion. In contrast, we use regions pervasively to model sub-expressions in our functional IR. This motivates us to systematize region optimizations. We extend classical SSA reasoning to regions for functional-style analyses and transformations. We implement a new SSA+regions based backend for LEAN4, a theorem prover that implements a purely functional, dependently typed programming language. Our backend is feature-complete and handles all constructs of LEAN4's functional intermediate representation λrc within the SSA framework. We evaluate our proposed region optimizations by optimizing λrc within an SSA+regions based framework implemented in MLIR and demonstrating performance parity with the current LEAN4 backend. We believe our work will pave the way for a unified optimization framework capable of representing, analyzing, and optimizing both functional and imperative languages.
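As a rough data-structure sketch of what regions add to SSA (illustrative Python, not the paper's MLIR encoding): an operation may carry nested regions, so an `if` expression is a single op whose two sub-expressions are regions.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    operands: list = field(default_factory=list)
    regions: list = field(default_factory=list)   # nested lists of ops

# %r = if %c then { yield %a } else { yield %b }
cond = Op("if", operands=["%c"],
          regions=[[Op("yield", ["%a"])],
                   [Op("yield", ["%b"])]])
print(cond)
```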
... The MLIR [16] dialect linalg cites LIFT as a direct influence. Of course there exist other IRs that incorporate functional aspects, such as the functional graph-based Thorin [17] IR that provides explicit support for higher-order functions. Many machine learning compilers, such as TensorFlow [1] and PyTorch [20], follow a functional graph-based IR that is inspired by ideas from data-flow programming. ...
Preprint
Full-text available
The trend towards specialization of software and hardware - fuelled by the end of Moore's law and the still accelerating interest in domain-specific computing, such as machine learning - forces us to radically rethink our compiler designs. The era of a universal compiler framework built around a single one-size-fits-all intermediate representation (IR) is over. This realization has sparked the creation of the MLIR compiler framework that empowers compiler engineers to design and integrate IRs capturing specific abstractions. MLIR provides a generic framework for SSA-based IRs, but it doesn't help us to decide how we should design IRs that are easy to develop, to work with and to combine into working compilers. To address the challenge of IR design, we advocate for a language-oriented compiler design that understands IRs as formal programming languages and enforces their correct use via an accompanying type system. We argue that programming language techniques directly guide extensible IR designs and provide a formal framework to reason about transforming between multiple IRs. In this paper, we discuss the design of the Shine compiler that compiles the high-level functional pattern-based data-parallel language RISE via a hybrid functional-imperative intermediate language to C, OpenCL, and OpenMP. We compare our work directly with the closely related pattern-based Lift IR and compiler. We demonstrate that our language-oriented compiler design results in a more robust and predictable compiler that is extensible at various abstraction levels. Our experimental evaluation shows that this compiler design is able to generate high-performance GPU code.
... Hipacc [11], [12], HeteroHalide [7], PolyMage [8], the Merlin compiler [23], and AnyHLS [4] target C/C++ codes in their backend while applying optimization passes suitable for HLS tools such as Xilinx Vitis or Intel HLS compilers. AnyHLS makes use of the AnyDSL compiler [24] to partially evaluate [25], [5] higher-order functions in order to generate optimized OpenCL/C++ HLS codes. Dahlia [26] is similar to AnyHLS, in that it provides a general purpose language for generating specialized HLS C++ code with performance predictability. ...
Preprint
FPGAs have found their way into data centers as accelerator cards, making reconfigurable computing more accessible for high-performance applications. At the same time, new high-level synthesis compilers like Xilinx Vitis and runtime libraries such as XRT attract software programmers into the reconfigurable domain. While software programmers are familiar with task-level and data-parallel programming, FPGAs often require different types of parallelism. For example, data-driven parallelism is mandatory to obtain satisfactory hardware designs for pipelined dataflow architectures. However, software programmers are often not acquainted with dataflow architectures - resulting in poor hardware designs. In this work we present FLOWER, a comprehensive compiler infrastructure that provides automatic canonical transformations for high-level synthesis from a domain-specific library. This allows programmers to focus on algorithm implementations rather than low-level optimizations for dataflow architectures. We show that FLOWER allows to synthesize efficient implementations for high-performance streaming applications targeting System-on-Chip and FPGA accelerator cards, in the context of image processing and computer vision.
... Hipacc [11], [12], HeteroHalide [7], PolyMage [8], the Merlin compiler [23], and AnyHLS [4] target C/C++ codes in their backend while applying optimization passes suitable for HLS tools such as Xilinx Vitis or Intel HLS compilers. AnyHLS makes use of the AnyDSL compiler [24] to partially evaluate [25], [5] higher-order functions in order to generate optimized OpenCL/C++ HLS codes. Dahlia [26] is similar to AnyHLS, in that it provides a general purpose language for generating specialized HLS C++ code with performance predictability. ...
Conference Paper
Full-text available
FPGAs have found their way into data centers as accelerator cards, making reconfigurable computing more accessible for high-performance applications. At the same time, new high-level synthesis compilers like Xilinx Vitis and runtime libraries such as XRT attract software programmers into the reconfigurable domain. While software programmers are familiar with task-level and data-parallel programming, FPGAs often require different types of parallelism. For example, data-driven parallelism is mandatory to obtain satisfactory hardware designs for pipelined dataflow architectures. However, software programmers are often not acquainted with dataflow architectures-resulting in poor hardware designs. In this work we present FLOWER, a comprehensive compiler infrastructure that provides automatic canonical transformations for high-level synthesis from a domain-specific library. This allows programmers to focus on algorithm implementations rather than low-level optimizations for dataflow architectures. We show that FLOWER allows to synthesize efficient implementations for high-performance streaming applications targeting System-on-Chip and FPGA accelerator cards, in the context of image processing and computer vision.
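The data-driven pipeline style that dataflow architectures reward can be mimicked with Python generators, purely to show the structure (FLOWER itself generates HLS code): each stage consumes a stream and produces a stream, so stages can overlap.

```python
def source(n):
    yield from range(n)

def scale(xs, k):
    for x in xs:
        yield x * k

def offset(xs, b):
    for x in xs:
        yield x + b

# Three pipeline stages chained together, item by item.
pipeline = offset(scale(source(5), 2), 1)
print(list(pipeline))   # [1, 3, 5, 7, 9]
```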
... In the compiler setting of Rizkallah et al. [2018], strong reduction corresponds to partial evaluation and serves as the basis for many program optimisations, see e.g. [Leißa et al. 2015]. We show how CBPV corresponds to CBV/CBN via a slight modification of the known translations and transport several results, such as strong normalisation, to CBV/CBN. ...
Conference Paper
Call-by-push-value (CBPV) is an idealised calculus for functional and imperative programming, introduced as a subsuming paradigm for both call-by-value (CBV) and call-by-name (CBN). We formalise weak and strong operational semantics for (effect-free) CBPV, define its equational theory, and verify adequacy for the standard set/algebra denotational semantics. Furthermore, we prove normalisation of the standard reduction, confluence of strong reduction, strong normalisation using Kripke logical relations, and soundness of the equational theory using logical equivalence. We adapt and verify the known translations from CBV and CBN into CBPV for strong reduction. This yields, for instance, proofs of strong normalisation and confluence for the full λ-calculus with sums and products. Thanks to the automation provided by Coq and the Autosubst 2 framework, there is little formalisation overhead compared to detailed paper proofs.
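The CBV/CBN distinction that CBPV subsumes can be made concrete with thunks (an informal Python sketch, not the calculus itself): a call-by-name argument is a suspended computation that is only run if used.

```python
def diverge():
    raise RuntimeError("non-terminating computation")

def const_cbn(thunk):      # call-by-name: argument passed as a thunk
    return 42              # the thunk is never forced

print(const_cbn(lambda: diverge()))   # 42
# A call-by-value version would evaluate diverge() before the call.
```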
... Theano's graph representation in particular was based on the representations used by computer algebra systems (CAS), enabling aggressive algebraic simplification and pattern matching. An SSA-based graph representation [12,25], sometimes referred to as sea-of-nodes, is used by the HotSpot Java compiler and the V8 TurboFan JavaScript compiler, and a graph representation using continuation-passing style (CPS, an IR commonly used in functional languages) called Thorin also exists [24]. ...
Preprint
We review the current state of automatic differentiation (AD) for array programming in machine learning (ML), including the different approaches such as operator overloading (OO) and source transformation (ST) used for AD, graph-based intermediate representations for programs, and source languages. Based on these insights, we introduce a new graph-based intermediate representation (IR) which specifically aims to efficiently support fully-general AD for array programming. Unlike existing dataflow programming representations in ML frameworks, our IR naturally supports function calls, higher-order functions and recursion, making ML models easier to implement. The ability to represent closures allows us to perform AD using ST without a tape, making the resulting derivative (adjoint) program amenable to ahead-of-time optimization using tools from functional language compilers, and enabling higher-order derivatives. Lastly, we introduce a proof of concept compiler toolchain called Myia which uses a subset of Python as a front end.
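A minimal sketch of the closure-based, source-transformation style of reverse-mode AD that such an IR is meant to support (illustrative names, not Myia's implementation): each operation returns its value together with a backward closure that computes adjoints.

```python
def mul(x, y):
    def backward(dz):                # adjoint of z = x * y
        return dz * y, dz * x
    return x * y, backward

def f(x, y):                         # f(x, y) = x*x + y
    z, back_mul = mul(x, x)
    def backward(df):
        dx1, dx2 = back_mul(df)      # both operands of mul are x
        return dx1 + dx2, df         # d/dx = 2x, d/dy = 1
    return z + y, backward

val, back = f(3.0, 1.0)
print(val, back(1.0))                # 10.0 (6.0, 1.0)
```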
... The Thorin intermediate representation [44] is a higher-order functional intermediate representation, of particular interest for transformations involving higher-order functions and other functional concepts such as tail recursion. It takes up the ideas of the functional CPS (continuation-passing style) representation [41], which is in close correspondence with SSA. ...
Thesis
Optimizing compilers for programming languages have become complex pieces of software and hence a source of bugs. This can be dangerous in the context of critical systems such as avionics or medicine. This thesis is part of the field of verified optimizing compilation, whose objective is to ensure the absence of such bugs. More precisely, we study semantically a particular SSA (Static Single Assignment) intermediate representation, Sea of Nodes, used notably in the optimizing HotSpot compiler for Java. The SSA property has already been studied from a semantic point of view on simple representations in the form of control-flow graphs, but the subject of dependencies between instructions has only been touched upon from a formal perspective. This thesis provides a semantic study of program transformations in Sea of Nodes form, incorporating the flexibility in terms of data dependencies between instructions. In particular, the elimination of redundant zero-checks, constant propagation, conversion back to sequential basic blocks, and SSA destruction are studied. Some of the topics addressed, including the formalization of a semantics for Sea of Nodes, are accompanied by a verification using the Coq proof assistant.
... We don't have to choose between flat and higher-order, however. Thorin [23] is a graph-based representation aiming to support both imperative and functional code by combining a flat structure for ease of code transformation and first-class closures for implementing higher-order languages. However, Thorin is still intended for use in strict languages with pervasive side effects; it remains to be seen whether such a representation could be adapted for high-level optimizations in a non-strict regime such as Haskell. ...
Article
Full-text available
The λ-calculus is popular as an intermediate language for practical compilers. But in the world of logic it has a lesser-known twin, born at the same time, called the sequent calculus. Perhaps that would make for a good intermediate language, too? To explore this question we designed Sequent Core, a practically-oriented core calculus based on the sequent calculus, and used it to re-implement a substantial chunk of the Glasgow Haskell Compiler.
... We don't have to choose between flat and higher-order, however. Thorin [23] is a graph-based representation aiming to support both imperative and functional code by combining a flat structure for ease of code transformation and first-class closures for implementing higher-order languages. However, Thorin is still intended for use in strict languages with pervasive side effects; it remains to be seen whether such a representation could be adapted for high-level optimizations in a non-strict regime such as Haskell. ...
Conference Paper
Full-text available
The λ-calculus is popular as an intermediate language for practical compilers. But in the world of logic it has a lesser-known twin, born at the same time, called the sequent calculus. Perhaps that would make for a good intermediate language, too? To explore this question we designed Sequent Core, a practically-oriented core calculus based on the sequent calculus, and used it to re-implement a substantial chunk of the Glasgow Haskell Compiler.
Article
Transformation of programs into continuation-passing style (CPS) reveals the notion of continuations, enabling many applications such as control operators and intermediate representations in compilers. Although type preservation makes CPS transformation more beneficial, achieving type-preserving CPS transformation for implicit polymorphism with call-by-value (CBV) semantics is known to be challenging. We identify the difficulty in the problem that we call scope intrusion. To address this problem, we propose a new CPS target language Λopen that supports two additional constructs for polymorphism: one only binds and the other only generalizes type variables. Unfortunately, their unrestricted use makes Λopen unsafe due to undesired generalization of type variables. We thus equip Λopen with affine types to allow only the type-safe generalization. We then define a CPS transformation from Curry-style CBV System F to type-safe Λopen and prove that the transformation is meaning and type preserving. We also study parametricity of Λopen as it is a fundamental property of polymorphic languages and plays a key role in applications of CPS transformation. To establish parametricity, we construct a parametric, step-indexed Kripke logical relation for Λopen and prove that it satisfies the Fundamental Property as well as soundness with respect to contextual equivalence.
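For readers unfamiliar with CPS, a minimal (untyped) Python sketch shows the shape the transformation produces: every function takes an extra continuation `k` and "returns" by calling it.

```python
def fact_cps(n, k):
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda r: k(n * r))

# The identity continuation extracts the final result.
print(fact_cps(5, lambda r: r))   # 120
```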
Research
Full-text available
SANTOS, Roberto Carlos dos. RecPy: a precompiler for studying the conversion of recursive functions. 2021. 124 pp. Undergraduate thesis (Bachelor of Computer Science), Institute of Mathematics and Statistics, State University of Rio de Janeiro, Rio de Janeiro, 2021. This work covers the basic concepts related to recursive functions. It describes the situations in which converting between tail-recursive, non-tail-recursive, and iterative algorithms may be useful or necessary with respect to the readability or the efficiency of the code involved. It presents the application RecPy, a precompiler built to automate these conversions for the recognizable types of recursive functions. It includes practical examples of conversions performed in the application, along with the graphical results obtained. Keywords: Recursion. Tail recursion. Recursive function. Iterative. Elimination of recursion. Recursion optimization. Conversion from recursion. Precompiler. Efficient algorithms. Concise code. RecPy.
Article
In order to achieve the highest possible performance, the ray traversal and intersection routines at the core of every high-performance ray tracer are usually hand-coded, heavily optimized, and implemented separately for each hardware platform—even though they share most of their algorithmic core. The results are implementations that heavily mix algorithmic aspects with hardware and implementation details, making the code non-portable and difficult to change and maintain. In this paper, we present a new approach that offers the ability to define in a functional language a set of conceptual, high-level language abstractions that are optimized away by a special compiler in order to maximize performance. Using this abstraction mechanism we separate a generic ray traversal and intersection algorithm from its low-level aspects that are specific to the target hardware. We demonstrate that our code is not only significantly more flexible, simpler to write, and more concise but also that the compiled results perform as well as state-of-the-art implementations on any of the tested CPU and GPU platforms.
Article
This paper investigates shallow embedding of DSLs by means of online partial evaluation. To this end, we present a novel online partial evaluator for continuation-passing style languages. We argue that it has, in contrast to prior work, a predictable termination policy that works well in practice. We present our approach formally using a continuation-passing variant of PCF and prove its termination properties. We evaluate our technique experimentally in the field of visual and high-performance computing and show that our evaluator produces highly specialized and efficient code for CPUs as well as GPUs that matches the performance of hand-tuned expert code.
Conference Paper
Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left for ninja programmers. High-level programming languages coupled with optimizing compilers have been proposed to attempt to address this issue. However, they rely on device-specific heuristics or hard-coded library implementations to achieve good performance, resulting in non-portable solutions that need to be re-optimized for every new device. Achieving performance portability is the holy grail of high-performance computing and has so far remained an open problem even for well studied applications like matrix multiplication. We argue that what is needed is a way to describe applications at a high level without committing to particular implementations. To this end, we developed in a previous paper a functional data-parallel language which allows applications to be expressed in a device neutral way. We use a set of well-defined rewrite rules to automatically transform programs into semantically equivalent device-specific forms, from which OpenCL code is generated. In this paper, we demonstrate how this approach produces high-performance OpenCL code for GPUs with a well-studied, well-understood application: matrix multiplication. Starting from a single high-level program, our compiler automatically generates highly optimized and specialized implementations. We group simple rewrite rules into more complex macro-rules, each describing a well-known optimization like tiling and register blocking in a composable way. Using an exploration strategy our compiler automatically generates 50,000 OpenCL kernels, each providing a differently optimized -- but provably correct -- implementation of matrix multiplication. The automatically generated code offers competitive performance compared to the manually tuned MAGMA library implementations of matrix multiplication on Nvidia GPUs and even outperforms AMD's clBLAS library.
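A tiny sketch of what one algebraic rewrite rule looks like (an illustrative term encoding, not the paper's system): fusing `map f ∘ map g` into `map (f ∘ g)`.

```python
def fuse_maps(term):
    # compose(map(f), map(g))  ==>  map(compose(f, g))
    if (isinstance(term, tuple) and term[0] == "compose"
            and term[1][0] == "map" and term[2][0] == "map"):
        return ("map", ("compose", term[1][1], term[2][1]))
    return term

t = ("compose", ("map", "f"), ("map", "g"))
print(fuse_maps(t))   # ('map', ('compose', 'f', 'g'))
```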
Conference Paper
This paper investigates shallow embedding of DSLs by means of online partial evaluation. To this end, we present a novel online partial evaluator for continuation-passing style languages. We argue that it has, in contrast to prior work, a predictable termination policy that works well in practice. We present our approach formally using a continuation-passing variant of PCF and prove its termination properties. We evaluate our technique experimentally in the field of visual and high-performance computing and show that our evaluator produces highly specialized and efficient code for CPUs as well as GPUs that matches the performance of hand-tuned expert code.
Article
Full-text available
We present a survey of control-flow analysis of functional programs, which has been the subject of extensive investigation throughout the past 30 years. Analyses of the control flow of functional programs have been formulated in multiple settings and have led to many different approximations, starting with the seminal works of Jones, Shivers, and Sestoft. In this article, we survey control-flow analysis of functional programs by structuring the multitude of formulations and approximations and comparing them.
Article
Full-text available
We present our compiler intermediate representation Firm. Programs are always in SSA form, enabling a representation as graphs. We argue that this naturally encodes context information, simplifying many analyses and optimizations. Instructions are connected by dependency edges, relaxing the total order to a partial order inside a basic block. For example, alias analysis results can be directly encoded in the graph structure. The paper gives an overview of the representation and focuses on its construction. We present a simple construction algorithm which does not depend on dominance frontiers or a dominance tree. We prove that for reducible programs it produces a program in pruned and minimal SSA form. The algorithm works incrementally, so optimizations like copy propagation and constant folding can be performed on-the-fly during the construction.
Conference Paper
Full-text available
We present a simple SSA construction algorithm, which allows direct translation from an abstract syntax tree or bytecode into an SSA-based intermediate representation. The algorithm requires no prior analysis and ensures that even during construction the intermediate representation is in SSA form. This allows the application of SSA-based optimizations during construction. After completion, the intermediate representation is in minimal and pruned SSA form. In spite of its simplicity, the runtime of our algorithm is on par with Cytron et al.’s algorithm.
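The key local step that keeps the constructed SSA minimal is the removal of "trivial" φ functions. Below is a condensed Python sketch of that rule, modelled on the paper's tryRemoveTrivialPhi but without the recursive update of users; the data structures are illustrative.

```python
class Phi:
    def __init__(self, operands):
        self.operands = operands     # one incoming value per predecessor

def try_remove_trivial_phi(phi):
    same = None
    for op in phi.operands:
        if op is same or op is phi:  # ignore duplicates and self-references
            continue
        if same is not None:
            return phi               # two distinct operands: phi is needed
        same = op
    return same                      # trivial: forward the unique operand

x, y = object(), object()
assert try_remove_trivial_phi(Phi([x, x])) is x        # phi(x, x) -> x
assert isinstance(try_remove_trivial_phi(Phi([x, y])), Phi)
```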
Article
Full-text available
An abstract is not available.
Conference Paper
Full-text available
The divide-and-conquer pattern of parallelism is a powerful approach to organize parallelism on problems that are expressed naturally in a recursive way. In fact, recent tools such as Intel Threading Building Blocks (TBB), which has received much attention, go further and make extensive usage of this pattern to parallelize problems that other approaches parallelize following other strategies. In this paper we discuss the limitations to express divide-and-conquer parallelism with the algorithm templates provided by the TBB. Based on our observations, we propose a new algorithm template implemented on top of TBB that improves the programmability of many problems that fit this pattern, while providing a similar performance. This is demonstrated with a comparison both in terms of performance and programmability.
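The pattern under discussion, stripped to its skeleton (Python threads only illustrate the structure; unlike TBB tasks they give no CPU-bound speed-up):

```python
import threading

def parallel_sum(data, threshold=8):
    if len(data) <= threshold:
        return sum(data)                       # conquer: solve directly
    mid = len(data) // 2
    result = {}
    t = threading.Thread(
        target=lambda: result.update(left=parallel_sum(data[:mid], threshold)))
    t.start()                                  # divide: one half in parallel
    right = parallel_sum(data[mid:], threshold)
    t.join()
    return result["left"] + right              # merge

print(parallel_sum(list(range(100))))          # 4950
```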
Conference Paper
Full-text available
The Java HotSpot™ Server Compiler achieves improved asymptotic performance through a combination of object-oriented and classical-compiler optimizations. Aggressive inlining using class-hierarchy analysis reduces function call overhead and provides opportunities for many compiler optimizations.
Article
Full-text available
Higher-order languages such as Haskell encourage the programmer to build abstractions by composing functions. A good compiler must inline many of these calls to recover an efficiently executable program. In principle, inlining is dead simple: just replace the call of a function by an instance of its body. But any compiler-writer will tell you that inlining is a black art, full of delicate compromises that work together to give good performance without unnecessary code bloat. The purpose of this paper is, therefore, to articulate the key lessons we learned from a full-scale inliner, the one used in the Glasgow Haskell compiler. We focus mainly on the algorithmic aspects, but we also provide some indicative measurements to substantiate the importance of various aspects of the inliner.
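The "dead simple" core of inlining, before any of the heuristics the paper is actually about, can be written in a few lines over a toy expression form (an illustrative encoding, not GHC's):

```python
def subst(expr, env):
    if isinstance(expr, tuple):
        return tuple(subst(e, env) for e in expr)
    return env.get(expr, expr)

def inline(expr, name, params, body):
    # Replace every call (name, arg...) by body with params substituted.
    if isinstance(expr, tuple):
        if expr[0] == name:
            return subst(body, dict(zip(params, expr[1:])))
        return tuple(inline(e, name, params, body) for e in expr)
    return expr

# inline f(x) = x + x at the call site f(3)
print(inline(("f", 3), "f", ["x"], ("add", "x", "x")))   # ('add', 3, 3)
```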
Article
Full-text available
Orbit, an optimizing compiler for Scheme that builds on the earlier Scheme compilers TC and Steele's Rabbit, is discussed. The combination of lexical scoping, full closures, and first-class continuations creates a unique and challenging task for the compiler designer. It has been found that the general CPS approach induces a particular style of compiler writing that has many benefits. It has also been observed that this style was not identified by Steele and results in a compiler that not only has a simple and modular organization but also generates very efficient code.
Article
Full-text available
In optimizing compilers, data structure choices directly influence the power and efficiency of practical program optimization. A poor choice of data structure can inhibit optimization or slow compilation to the point that advanced optimization features become undesirable. Recently, static single assignment form and the control dependence graph have been proposed to represent data flow and control flow properties of programs. Each of these previously unrelated techniques lends efficiency and power to a useful class of program optimizations. Although both of these structures are attractive, the difficulty of their construction and their potential size have discouraged their use. We present new algorithms that efficiently compute these data structures for arbitrary control flow graphs. The algorithms use dominance frontiers, a new concept that may have other applications. We also give analytical and experimental evidence that all of these data structures are usually linear in the size of the original program. This paper thus presents strong evidence that these structures can be of practical use in optimization.
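Dominance frontiers are where φ functions must be placed: the frontier of a node X contains the first nodes that X does not strictly dominate along some path from X. A compact way to compute them, given immediate dominators (this is the later Cooper–Harvey–Kennedy formulation, not Cytron et al.'s original algorithm):

```python
def dominance_frontiers(preds, idom):
    df = {n: set() for n in preds}
    for n, ps in preds.items():
        if len(ps) >= 2:                 # only join points contribute
            for p in ps:
                runner = p
                while runner != idom[n]: # walk up to n's immediate dominator
                    df[runner].add(n)
                    runner = idom[runner]
    return df

# Diamond CFG: entry -> {a, b} -> join
preds = {"entry": [], "a": ["entry"], "b": ["entry"], "join": ["a", "b"]}
idom  = {"entry": None, "a": "entry", "b": "entry", "join": "entry"}
print(dominance_frontiers(preds, idom))  # df[a] = df[b] = {'join'}
```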
Article
Full-text available
In order to simplify the compilation process, many compilers for higher-order languages use the continuation-passing style (CPS) transformation in a first phase to generate an intermediate representation of the source program. The salient aspect of this intermediate form is that all procedures take an argument that represents the rest of the computation (the "continuation"). Since the naive CPS transformation considerably increases the size of programs, CPS compilers perform reductions to produce a more compact intermediate representation. Although often implemented as a part of the CPS transformation, this step is conceptually a second phase. Finally, code generators for typical CPS compilers treat continuations specially in order to optimize the interpretation of continuation parameters. A thorough analysis of the abstract machine for CPS terms shows that the actions of the code generator invert the naive CPS translation step. Put differently, the combined effect of the three phases is equivalent to a source-to-source transformation that simulates the compaction phase. Thus, fully developed CPS compilers do not need to employ the CPS transformation but can achieve the same results with a simple source-level transformation.
Article
A straightforward implementation of an algorithm in a general-purpose programming language usually does not deliver peak performance: compilers often fail to automatically tune the code for hardware peculiarities like the memory hierarchy or vector execution units. Manually tuning the code is firstly error-prone as well as time-consuming, and secondly taints the code by exposing those peculiarities to the implementation. A popular method to avoid these problems is to implement the algorithm in a Domain-Specific Language (DSL). A DSL compiler can then automatically tune the code for the target platform. In this article we show how to embed a DSL for stencil codes in another language. In contrast to prior approaches we only use a single language for this task, which offers explicit control over code refinement. This is used to specialize stencils for particular scenarios. Our results show that our specialized programs achieve competitive performance compared to hand-tuned CUDA programs while maintaining a convenient coding experience.
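For concreteness, this is the kind of computation such a stencil DSL specializes (a plain Python 3-point stencil with illustrative weights; the article's DSL would unroll and tune it per target):

```python
def apply_stencil(data, weights):
    r = len(weights) // 2
    out = data[:]                        # borders left unchanged
    for i in range(r, len(data) - r):
        out[i] = sum(w * data[i + k - r] for k, w in enumerate(weights))
    return out

print(apply_stencil([0, 0, 1, 0, 0], [0.25, 0.5, 0.25]))
# [0, 0.25, 0.5, 0.25, 0]
```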
Article
We have developed a compiler for the lexically-scoped dialect of LISP known as SCHEME. The compiler knows relatively little about specific data manipulation primitives such as arithmetic operators, but concentrates on general issues of environment and control. Rather than having specialized knowledge about a large variety of control and environment constructs, the compiler handles only a small basis set which reflects the semantics of lambda-calculus. All of the traditional imperative constructs, such as sequencing, assignment, looping, GO TO, as well as many standard LISP constructs such as AND, OR and COND, are expressed as macros in terms of the applicative basis set. A small number of optimization techniques, coupled with the treatment of function calls as GO TO statements, serves to produce code as good as that produced by more traditional compilers.
Article
From the Publisher: Introduction to Programming using SML provides a thorough introduction to the principles of programming and program design using the Standard ML programming language. The emphasis throughout is on putting the principles of programming into practice. The examples and exercises teach the student how to apply basic theoretical concepts to produce succinct and elegant programs and program designs. Coverage includes an introduction to fundamental data structures and their applications. The notions of binding, environment, store, closure and evaluation are introduced in order to explain the meaning of programs in an informal but precise way. Thus, the authors provide the reader with a set of durable programming concepts which will exist well into the next generation of programming languages. Features of the book include: an attractive and reader-friendly presentation; clear and careful explanations; a rich collection of programming problems and a wide variety of examples; coverage of modelling and abstraction using data structures and the SML module system; an overview and statement of objectives at the start of each chapter; an introduction to producing technical documentation based on the SML module system; extensive material in the appendices covering the SML language, the module system, and selected parts of the SML basis library; and an accompanying web site containing all the program code, further teaching material, and links to SML systems and other useful resources.
Article
We describe a new program-analysis framework, based on CPS and procedure-string abstractions, that can handle critical analyses which the k-CFA framework cannot. We present the main theorems concerning correctness, show an application analysis, and describe a running implementation.
Article
Lambda-lifting a block-structured program transforms it into a set of recursive equations. We present the symmetric transformation: lambda-dropping. Lambda-dropping a set of recursive equations restores block structure and lexical scope. For lack of block structure and lexical scope, recursive equations must carry around all the parameters that any of their callees might possibly need. Both lambda-lifting and lambda-dropping thus require one to compute Def/Use paths:
• for lambda-lifting: each of the functions occurring in the path of a free variable is passed this variable as a parameter;
• for lambda-dropping: parameters which are used in the same scope as their definition do not need to be passed along in their path.
A program whose blocks have no free variables is scope-insensitive. Its blocks are then free to float (for lambda-lifting) or to sink (for lambda-dropping) along the vertices of the scope tree. Our primary application is partial evaluation. Indeed, many partial evaluators for procedural programs operate on recursive equations. To this end, they lambda-lift source programs in a pre-processing phase. But often, partial evaluators automatically produce residual recursive equations with dozens of parameters, which most compilers do not handle efficiently. We solve this critical problem by lambda-dropping residual programs in a post-processing phase, which significantly improves both their compile time and their run time. Lambda-lifting has been presented as an intermediate transformation in compilers for functional languages. We study lambda-lifting and lambda-dropping per se, though lambda-dropping also has a use as an intermediate transformation in a compiler: we noticed that lambda-dropping a program corresponds to transforming it into the functional representation of its optimal SSA form. This observation actually led us to substantially improve our PEPM'97 presentation of lambda-dropping.
Conference Paper
We present a series of CPS-based intermediate languages suitable for functional language compilation, arguing that they have practical benefits over direct-style languages based on A-normal form (ANF) or monads. Inlining of functions demonstrates the benefits most clearly: in ANF-based languages, inlining involves a re-normalization step that rearranges let expressions and possibly introduces a new 'join point' function, and in monadic languages, commuting conversions must be applied; in contrast, inlining in our CPS language is a simple substitution of variables for variables. We present a contification transformation implemented by simple rewrites on the intermediate language. Exceptions are modelled using so-called 'double-barrelled' CPS. Subtyping on exception constructors then gives a very straightforward effect analysis for exceptions. We also show how a graph-based representation of CPS terms can be implemented extremely efficiently, with linear-time term simplification.
Article
This paper forms the substance of a course of lectures given at the International Summer School in Computer Programming at Copenhagen in August, 1967. The lectures were originally given from notes and the paper was written after the course was finished. In spite of this, and only partly because of the shortage of time, the paper still retains many of the shortcomings of a lecture course. The chief of these are an uncertainty of aim—it is never quite clear what sort of audience there will be for such lectures—and an associated switching from formal to informal modes of presentation which may well be less acceptable in print than it is natural in the lecture room. For these (and other) faults, I apologise to the reader. There are numerous references throughout the course to CPL [1–3]. This is a programming language which has been under development since 1962 at Cambridge and London and Oxford. It has served as a vehicle for research into both programming languages and the design of compilers. Partial implementations exist at Cambridge and London. The language is still evolving so that there is no definitive manual available yet. We hope to reach another resting point in its evolution quite soon and to produce a compiler and reference manuals for this version. The compiler will probably be written in such a way that it is relatively easy to transfer it to another machine, and in the first instance we hope to establish it on three or four machines more or less at the same time. The lack of a precise formulation for CPL should not cause much difficulty in this course, as we are primarily concerned with the ideas and concepts involved rather than with their precise representation in a programming language.
Article
An abstract is not available.
Article
Programs written in powerful, higher-order languages like Scheme, ML, and Common Lisp should run as fast as their FORTRAN and C counterparts. They should, but they don't. A major reason is the level of optimisation applied to these two classes of languages. Many FORTRAN and C compilers employ an arsenal of sophisticated global optimisations that depend upon data-flow analysis: common-subexpression elimination, loop-invariant detection, induction-variable elimination, and many, many more. Compilers for higherorder languages do not provide these optimisations. Without them, Scheme, LISP and ML compilers are doomed to produce code that runs slower than their FORTRAN and C counterparts.
Conference Paper
Lambda lifting is a technique for transforming a functional program with local function definitions, possibly with free variables in the function definitions, into a program consisting only of global function (combinator) definitions which will be used as rewrite rules. Different ways of doing lambda lifting are presented, as well as reasons for rejecting or selecting the method used in our Lazy ML compiler. An attribute grammar and a functional program implementing the chosen algorithm are given. Originally published in Proceedings 1985 Conference on Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science 201, Nancy, France, 1985, Springer Verlag. As part B of the author's thesis; the main addition is the attribute grammar formulation.
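The transformation in one picture (ordinary Python rather than Lazy ML, with illustrative names): a local function closing over a free variable becomes a global function taking that variable as an extra parameter.

```python
# Before lifting: inner closes over the free variable k.
def scale_all(xs, k):
    def inner(x):
        return x * k
    return [inner(x) for x in xs]

# After lifting: inner is global and k became a parameter.
def inner_lifted(k, x):
    return x * k

def scale_all_lifted(xs, k):
    return [inner_lifted(k, x) for x in xs]

assert scale_all([1, 2], 3) == scale_all_lifted([1, 2], 3) == [3, 6]
```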
Article
In this thesis we present and analyse a set of automatic source-to-source program transformations that are suitable for incorporation in optimising compilers for lazy functional languages. These transformations improve the quality of code in many different respects, such as execution time and memory usage. The transformations presented are divided in two sets: global transformations, which are performed once (or sometimes twice) during the compilation process; and a set of local transformations, which are performed before and after each of the global transformations, so that they can simplify the code before applying the global transformations and also take advantage of them afterwards. Many of the local transformations are simple, well known, and do not have major effects on their own. They become important as they interact with each other and with global transformations, sometimes in non-obvious ways. We present how and why they improve the code, and perform extensive experiments wit...
Article
We define syntactic transformations that convert continuation passing style (CPS) programs into static single assignment form (SSA) and vice versa. Some CPS programs cannot be converted to SSA, but these are not produced by the usual CPS transformation. The CPS→SSA transformation is especially helpful for compiling functional programs. Many optimizations that normally require flow analysis can be performed directly on functional CPS programs by viewing them as SSA programs. We also present a simple program transformation that merges CPS procedures together and by doing so greatly increases the scope of the SSA flow information. This transformation is useful for analyzing loops expressed as recursive procedures.
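The heart of the correspondence, sketched executably in Python: the φ function at a loop header plays the role of the parameter of a loop continuation, and the value flowing around the back edge is the argument of the recursive call.

```python
# SSA-ish:                      CPS, as runnable Python:
#   loop:
#     i1 = phi(0, i2)           # i below is the phi / parameter
#     if i1 >= 3: return i1
#     i2 = i1 + 1; goto loop
def loop(i, k):
    if i >= 3:
        return k(i)
    return loop(i + 1, k)       # the back edge: i + 1 feeds the phi

print(loop(0, lambda r: r))     # 3
```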
Article
The lambda-calculus is considered a useful mathematical tool in the study of programming languages, since programs can be identified with lambda-terms. However, if one goes further and uses βη-conversion to prove equivalence of programs, then a gross simplification is introduced (programs are identified with total functions from values to values) that may jeopardise the applicability of theoretical results. In this paper we introduce calculi, based on a categorical semantics for computations, that provide a correct basis for proving equivalence of programs for a wide range of notions of computation.
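One of those notions of computation, rendered concretely (an informal Python encoding of the option monad; the paper works categorically): programs map values to computations, and `bind` sequences them.

```python
def unit(x):                 # embed a value into a (possibly failing) computation
    return ("just", x)

def bind(m, f):              # sequence: propagate failure, else continue with f
    return f(m[1]) if m[0] == "just" else m

def safe_div(x, y):
    return ("nothing",) if y == 0 else unit(x / y)

print(bind(unit(10), lambda a: safe_div(a, 2)))  # ('just', 5.0)
print(bind(safe_div(1, 0), unit))                # ('nothing',)
```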
Article
This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iterative algorithm for solving these frameworks. A framework is shown that combines Constant Propagation, Unreachable Code Elimination, Global Congruence Finding and Global Value Numbering. For these optimizations, the iterative algorithm runs in O(n²) time.
Article
In recent years, the trend in program representations for imperative programs has been to make them more functional, or to make them more sparse. However, new sparse representations have been non-functional, and new functional representations have not been sparse in the presence of pointer operations. In this paper, we present a functional representation that is sparse even in the presence of pointer operations. Conventionally, a store is represented in a functional program representation by a single object—typically a mapping from locations to values. We show how such a store object may be fragmented into several objects, each representing part of the store. The result is a sparser representation, which has not only the usual benefit of directly linking producers to consumers, but which also for static program analysis often leads to smaller domains of abstract values for store objects. Store fragmentation corresponds to assignment factored SSA form (a factorization of SSA form introduced in this paper). We report on experiments with a thorough fragmentation based on a data flow points-to analysis and an intermediate level fragmentation based on an almost linear time complexity points-to analysis by type inference.
Lambda-dropping: Transforming recursive equations into programs with block structure (extended version of Danvy and Schultz's PEPM'97 paper of the same title)
  • O. Danvy
  • U. P. Schultz
Elements of Software Science (Operating and Programming Systems Series)
  • M. H. Halstead