# Xavier LeroyCollège de France

Xavier Leroy

Doctor of Philosophy

## About

147

Publications

13,960

Reads

**How we measure 'reads'**

A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more

8,107

Citations

Citations since 2017

Introduction

I'm a senior computer scientist interested in all scientific aspects of computer programming, with special emphasis on functional programming languages and formal software verification. I hold the chair of Software Sciences at Collège de France. I am the primary author of the OCaml functional programming language and the CompCert formally-verified C compiler.

## Publications

Publications (147)

The polyhedral model is a high-level intermediate representation for loop nests that supports elegantly a great many loop optimizations. In a compiler, after polyhedral loop optimizations have been performed, it is necessary and difficult to regenerate sequential or parallel loop nests before continuing compilation. This paper reports on the formal...

CompCert is the first commercially available optimizing compiler that is formally verified, using machine-assisted mathematical proofs, to be exempt from mis-compilation. The executable code it produces is proved to behave exactly as specified by the semantics of the source C program. This article gives an overview of the use of CompCert to gain ce...

The correct compilation of block diagram languages like Lustre, Scade, and a discrete subset of Simulink is important since they are used to program critical embedded control software. We describe the specification and verification in an Interactive Theorem Prover of a compilation chain that treats the key aspects of Lustre: sampling, nodes, and de...

The correct compilation of block diagram languages like Lustre, Scade, and a discrete subset of Simulink is important since they are used to program critical embedded control software. We describe the specification and verification in an Interactive Theorem Prover of a compilation chain that treats the key aspects of Lustre: sampling, nodes, and de...

CompCert is the first commercially available optimizing compiler that is for-mally verified, using machine-assisted mathematical proofs, to be free from miscompilation. The executable code it produces is proved to behave exactly as specified by the semantics of the source C program. CompCert's intended use is the compilation of safety-critical and...

CompCert is the first commercially available optimizing compiler that is formally verified, using machine-assisted mathematical proofs, to be exempt from mis-compilation. The executable code it produces is proved to behave exactly as specified by the semantics of the source C program. This article gives an overview of the design of CompCert and its...

Floating-point arithmetic is known to be tricky: roundings, formats, exceptional values. The IEEE-754 standard was a push towards straightening the field and made formal reasoning about floating-point computations easier and flourishing. Unfortunately, this is not sufficient to guarantee the final result of a program, as several other actors are in...

This paper reports on the design and soundness proof, using the Coq proof assistant, of Verasco, a static analyzer based on abstract interpretation for most of the ISO C 1999 language (excluding recursion and dynamic allocation). Verasco establishes the absence of run-time errors in the analyzed programs. It enjoys a modular architecture that suppo...

This manual documents the release 4.02 of the OCaml system. It is organized as follows. Part I, "An introduction to OCaml", gives an overview of the language. Part II, "The OCaml language", is the reference description of the language. Part III, "The OCaml tools", documents the compilers, toplevel system, and programming utilities. Part IV, "The OC...

This document is the user’s manual for the CompCert C verified compiler. It is organized as follows: Chapter 1 gives an overview of the CompCert C compiler and of the formal verification of compilers. Chapter 2 explains how to install CompCert C. Chapter 3 explains how to use the CompCert C compiler. Chapter 4 explains how to use the CompCert C ref...

Tool-assisted verification of critical software has great potential but is limited by two risks: unsoundness of the verification tools, and miscompilation when generating executable code from the sources that were verified. A radical solution to these two risks is the deductive verification of compilers and verification tools themselves. In this in...

We discuss the difference between a formal semantics of the C standard, and a formal semantics of an implementation of C that satisfies the C standard. In this context we extend the CompCert semantics with end-of-array pointers and the possibility to byte-wise copy objects. This is a first and necessary step towards proving that the CompCert semant...

Separation Logic is the twenty-first-century variant of Hoare Logic that permits verification of pointer-manipulating programs. This book covers practical and theoretical aspects of Separation Logic at a level accessible to beginning graduate students interested in software verification. On the practical side it offers an introduction to verificati...

Floating-point arithmetic is known to be tricky: roundings, formats, exceptional values. The IEEE-754 standard was a push towards straightening the field and made formal reasoning about floating-point computations easier and flourishing. Unfortunately, this is not sufficient to guarantee the final result of a program, as several other actors are in...

This paper reports on the formalization and proof of soundness, using the Coq proof assistant, of an alias analysis: a static analysis that approximates the flow of pointer values. The alias analysis considered is of the points-to kind and is intraprocedural, flow-sensitive, field-sensitive, and untyped. Its soundness proof follows the general styl...

The formal verification of compilers and related programming tools depends crucially on the availability of appropriate mechanized semantics for the source, intermediate and target languages. In this invited talk, I review various forms of operational semantics and their mechanization, based on my experience with the formal verification of the Comp...

We propose a benchmark to compare theorem-proving systems on their ability to express proofs of compiler correctness. In contrast
to the first POPLmark, we emphasize the connection of proofs to compiler implementations, and we point out that much can be
done without binders or alpha-conversion. We propose specific criteria for evaluating the utilit...

A memory model is an important component of the formal semantics of imperative programming languages: it specifies the behavior of operations over memory states, such as reads and writes. The formally verified CompCert C compiler uses a sophisticated memory model that is shared between the semantics of its source language (the CompCert subset of C)...

An LR(1) parser is a finite-state automaton, equipped with a stack, which uses a combination of its current state and one lookahead symbol in order to determine which action to perform next. We present a validator which, when applied to a context-free grammar G and an automaton A, checks that A and G agree. Validating the parser provides the correc...

We present a formal operational semantics and its Coq mechanization for the C++ object model, featuring object construction and destruction, shared and repeated multiple inheritance, and virtual function call dispatch. These are key C++ language features for high-level system programming, in particular for predictable and reliable resource manageme...

The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish selected posts or excerpts.twitterFollow us on Twitter ...

The 14th ACM SIGPLAN International Conference on Functional Programming (ICFP) took place on August 31-September 2, 2009 in Edinburgh, Scotland; Andrew Tolmach chaired the program committee. Following the conference, the authors of selected papers were invited to submit extended versions for this special issue of JFP. After review and revision, fou...

Given the complexity and sophistication of code generation and optimization algorithms, and the difficulty of systematically testing a compiler, it is unsurprising that bugs occur in compilers and cause miscompilation: incorrect executable code is silently generated from a correct source program. The formal verification of a compiler is a radical s...

This work presents an evaluation of the CompCert formally specified and verified optimizing compiler for the development of DO-178 level A flight control software. First, some fundamental characteristics of flight control software are presented and the case study program is described. Then, the use of CompCert is justified: its main point is to all...

The formal verification of programs have progressed tremendously in the last decade. Principled but once academic approaches such as Hoare logic and abstract interpretation finally gave birth to quality verification tools, operating over source code (and not just idealized models thereof) and able to verify complex real-world applications [6, 8, 15...

Object layout - the concrete in-memory representation of objects - raises many delicate issues in the case of the C++ language, owing in particular to multiple inheritance, C compatibility and separate compilation. This paper formalizes a family of C++ object layout schemes and mechanically proves their correctness against the operational semantics...

The goal of this lecture is to show how modern theorem provers---in this case, the Coq proof assistant---can be used to mechanize the specification of programming languages and their semantics, and to reason over individual programs and over generic program transformations, as typically found in compilers. The topics covered include: operational se...

Traduttore, tradittore (traducteur, traître), dit l'adage italien. Si l'on accepte le parallèle entre traduction et compilation de programme, celle-ci consistant à traduire un langage évolué en langage machine, mieux vaut faire mentir l'adage ! Portrait du premier compilateur en voie d'être garanti « zéro faute ».

Following the translation validation approach to high-assurance compilation, we describe a new algorithm for validating a posteriori the results of a run of register allocation. The algorithm is based on backward dataflow inference of equations between variables, registers and stack locations, and can cope with sophisticated forms of spilling and l...

Software pipelining is a loop optimization that overlaps the execution of several iterations of a loop to expose more instruction-level parallelism. It can result in first-class performances characteristics, but at the cost of significant obfuscation of the code, making this optimization difficult to test and debug. In this paper, we present a tran...

The goal of this lecture is to show how modern theorem provers – in this case, the Coq proof assistant – can be used to mechanize the specification of programming languages and their semantics, and to reason over individual programs and over generic program transformations, as typically found in compilers. The topics covered include: operational se...

Function uncurrying is an important optimization for the efficient execution of functional programming languages. This optimization
replaces curried functions by uncurried, multiple-argument functions, while preserving the ability to evaluate partial applications.
First-order uncurrying (where curried functions are optimized only in the static scop...

This paper reports on the development and formal verification (proof of semantic preservation) of CompCert, a compiler from Clight (a large subset of the C programming language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context...

Translation validation establishes a posteriori the correctness of a run of a compilation pass or other program transformation. In this paper, we develop an efficient translation validation algorithm for the Lazy Code Motion (LCM) optimization. LCM is an interesting challenge for validation because it is a global optimization that moves code across...

This article describes the development and formal verification (proof of semantic preservation) of a compiler back-end from Cminor (a simple imperative intermediate language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context of...

This paper formalizes and proves correct a compilation scheme for mutually-recursive definitions in call-by-value functional languages. This scheme supports a wider range of recursive definitions than previous methods. We formalize our technique as a translation scheme to a lambda-calculus featuring in-place update of memory blocks, and prove the t...

Using a call-by-value functional language as an example, this article illustrates the use of coinductive definitions and proofs in big-step operational semantics, enabling it to describe diverging evaluations in addition to terminating evaluations. We formalize the connections between the coinductive big-step semantics and the standard small-step s...

This article presents the formal semantics of a large subset of the C language called Clight. Clight includes pointer arithmetic, "struct" and "union" types, C loops and structured "switch" statements. Clight is the source language of the CompCert verified compiler. The formal semantics of Clight is a big-step operational semantics that observes bo...

This article presents the formal verification, using the Coq proof assistant, of a memory model for low-level imperative languages such as C and compiler intermediate languages. Beyond giving semantics to pointer-based programs, this model supports reasoning over transformations of such programs. We show how the properties of the memory model are u...

This article describes the formal verification of a compilation algorithm that transforms parallel moves (parallel assignments
between variables) into a semantically-equivalent sequence of elementary moves. Two different specifications of the algorithm
are given: an inductive specification and a functional one, each with its correctness proofs. A f...

Translation validation consists of transforming a program and a posteriori validating it in order to detect a modification of its semantics. This approach can be used in a verified compiler, provided that validation is formally proved to be correct. We present two such validators and their Coq proofs of correctness. The validators are designed for...

Transformation to continuation-passing style (CPS) is often performed by optimizing compilers for functional programming languages. As part of the development and proof of correctness of a compiler for the mini-ML functional language, we have mechanically verified the correctness of two CPS transformations for a call-by-value lambda-calculus with n...

The first part of this talk will show some examples of uses of Caml in industrial contexts, especially at companies that are part of the Caml consortium. The second part discusses my personal experience at the Trusted Logic start-up company, developing high-security software components for smart cards. While technological limitations prevent runnin...

Programmers naturally expect that compilers and other code generation tools produce executable code that behaves as prescribed by source programs. However, compilers are complex programs that perform many subtle transformations. Bugs in compilers do happen and can lead to silently producing incorrect executable code from a correct source program. T...

We propose a benchmark to compare theorem-proving systems on their ability to express proofs of compiler correctness. In contrast to the first POPLmark, we emphasize the connection of proofs to compiler implementations, and we point out that much can be done without binders or alpha-conversion. We propose specific criteria for evaluating the utilit...

Programmers naturally expect that compilers and other code generation tools produce executable code that behaves as prescribed by source programs. However, compilers are complex programs that perform many subtle transformations. Bugs in compilers do happen and can lead to silently producing incorrect executable code from a correct source program. T...

The POPLmark challenge is a collective experiment intended to assess the usability of theorem provers and proof assistants in the context of fundamental research on programming languages. In this report, we present a solution to the challenge, developed with the Coq proof assistant, and using the "locally nameless" presentation of terms with binder...

The widespread adoption of free and open source software (FOSS) in many strategic contexts of the information technology society has drawn the attention on the issues regarding how to handle the complexity of assembling and managing a huge number of (packaged) components in a consistent and effective way. FOSS distributions (and in particular GNU/L...

This paper presents the formal verification of a compiler front-end that translates a subset of the C language into the Cminor intermediate language. The semantics of the source and target languages as well as the translation between them have been written in the specification language of the Coq proof assistant. The proof of observational semantic...

In the mainstream adoption of free and open source software (FOSS), distribution editors play a crucial role: they package, integrate and distribute a wide variety of software, written in a variety of languages, for a variety of purposes of unprecedented breadth. Ensuring the quality of a FOSS distribution is a technical and engineering challenge,...

Objects with dynamic types allow the integration of operations that essentially require run-time type-checking into statically-typed languages. This paper presents two extensions of the ML language with dynamics, based on what has been done in the CAML implementation of ML, and discusses their usefulness. The main novelty of this work is the combin...

This paper reports on the development and formal certification (proof of semantic preservation) of a compiler from Cminor (a C-like imperative language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a certified compiler is useful in the context of formal methods appli...

This paper illustrates the use of co-inductive definitions and proofs in big-step operational semantics, enabling the latter to describe diverging evaluations in addition to terminating evaluations. We show applications to proofs of type soundness and to proofs of semantic preservation for compilers.

This paper presents a formal verification with the Coq proof assistant of a memory model for C-like imperative languages. This model defines the memory layout and the operations that manage the memory. The model has been specified at two levels of abstraction and implemented as part of an ongoing certification in Coq of a moderately-optimising C co...

The open-source software community is now comprised of a very large and growing number of contributors and users. The GNU/Linux operating system for instance has an estimated 18 million users worldwide and its contributing developers can be counted by thousands. The critical mass of contributors taking part in various open-source projects has helpe...

This paper reports on the correctness proof of compiler optimizations based on data-flow analysis. We formulate the optimizations and analyses as instances of a general framework for data-flow analyses and transformations, and prove that the optimizations preserve the behavior of the compiled programs. This development is a part of a larger effort...

Mixin modules are a framework for modular programming that supports code parameterization, incremental programming via late binding and rede nitions, and cross-module recursion. In this paper, we develop a language of mixin modules that supports call-by-value evaluation, and formalize a reduction semantics and a sound type system for this language.

Mixin modules are a framework for modular programming that supports code parameterization, incremental programming via late binding and redefinitions, and cross-module recursion. In this paper, we develop a language of mixin modules that supports call-by-value evaluation, and formalize a reduction semantics and a sound type system for this language...

The paper addresses theoretical and practical aspects of implementing multi-stage languages using abstract syntax trees (ASTs), gensym, and reflection. We present an operational account of the correctness of this approach, and report on our experience with a bytecode compiler called MetaOCaml that is based on this strategy. Current performance meas...

Bytecode verification is a crucial security component for Java applets, on the Web and on embedded devices such as smart cards. This paper reviews the various bytecode verification algorithms that have been proposed, recasts them in a common framework of dataflow analysis, and surveys the use of proof assistants to specify bytecode verification and...

Module systems are important for software engineering: they facilitate code reuse without compromising the correctness of programs. However, they still lack some flexibility: first, they do not allow mutually recursive definitions to span module boundaries ; second, definitions inside modules are bound early, and cannot be overridden later, as oppo...

Computer security is usually defined as ensuring integrity, confidentiality, and availability requirements even in the presence of a determined, malicious opponent. Sensitive data must be modified and consulted by authorized users only (integrity, confidentiality); moreover, the system should resist “denial of service” attacks that attempt to rende...

This paper presents a program transformation that allows languages with polymorphic typing (e.g. ML) to be implemented with unboxed, multi-word data representations, more efficient than the conventional boxed representations. The transformation introduces coercions between various representations, based on a typing derivation. A prototype ML compil...

We present a new approach to the polymorphic typing of data accepting in-place modi cation in ML-like languages. This approach is based on restrictions over type generalization, and a re ned typing of functions. The type system given here leads to a better integration of imperative programming style with the purely applicative kernel of ML. In part...

Functional languages encourage the extensive use of recursive fonctions and data structures. It is therefore important that they efficiently implement recursion. In this paper, we formalize and improve a known implementation technique for recursion. The original technique was introduced by Cousineau and Mauny as the «in-place updating trick». Consi...

Motivated by applications to proof assistants based on dependent types, we develop and prove correct a strong reducer and ß-equivalence checker for the λ-calculus with products, sums, and guarded fixpoints. Our approach is based on compilation to the bytecode of an abstract machine performing weak reductions on non-closed terms, derived with minima...

We present a variant of the Standard ML module system where parameterized abstract types (i.e. functors returning generarive types) map provably equal arguments to compatible abstract types, instead of generating distinct types at each application as in Standard ML. This extension solves the full transparency problem (how to give syntactic signatur...

This paper presents a variant of the SML module system that introduces a strict distinction between abstract types and manifest types (types whose definitions are part of the module specification), while retaining most of the expressive power of the SML module system. The resulting module system provides much better support for separate compila- ti...

This article presents a novel approach to the problem of bytecode verification for Java Card applets. By relying on prior off-card bytecode transformations, we simplify the bytecode verifier and reduce its memory requirements to the point where it can be embedded on a smart card, thus increasing significantly the security of post-issuance downloadi...

The ML module system provides powerful parameterization facilities, but lacks the ability to split mutually recursive definitions
across modules, and does not provide enough facilities for incremental programming. A promising approach to solve these issues
is Ancona and Zucca’s mixin modules calculus CMS. However, the straightforward way to adapt i...

Inspired by the successes of program generation, partial evaluation, and runtime code generation, multi-stage languages were developed as a uniform, high-level, and principled view of staging. Our current goal is to demonstrate the utility of these languages in a practical implementation. As a rst step this paper presents MetaOCaml, a type-safe, mu...

Bytecode verification is a crucial security component for Java applets, on the Web and on embedded devices such as smart cards. This paper describes the main bytecode verification algorithms and surveys the variety of formal methods that have been applied to bytecode verification in order to establish its correctness.

This paper presents a novel approach to the problem of bytecode verification for Java Card applets. Owing to its low memory requirements, our verification algorithm is the first that can be embedded on a smart card, thus increasing tremendously the security of post-issuance downloading of applets on Java Cards.

on between integers numbers and floating-point numbers at compile-time. The main motivation for this separation, according to Fortran's designers, was to avoid the difficulties of handling mixed arithmetic at run-time [2, chapter 6]. Thanks to the type system, the compiler "knows" when to generate integer arithmetic operations, floating-point arith...

In a programming language with procedures and assignments, it is often important to isolate uses of state to particular program fragments. The frameworks of type, region, and effect inference, and monadic state are technologies that have been used to state and enforce the property that an expression has no visible side-effects. This property has be...

We describe the methodology and current features for ML2000, a new-generation design of ML. ML2000 adds a number of features to Standard ML and Caml, providing better support for extensibility and code reuse, while also fixing latent problems. Although none of the features is particularly novel on its own, the combination of features and design met...

This report presents a program analysis to estimate uncaught exceptions in ML programs. This analysis relies on unification-based type inference in a non-standard type system, using rows to approximate both the flow of escaping exceptions (a la effect systems) and the flow of result values (a la control-flow analyses). The resulting analysis is bot...

This report presents a program analysis to estimate uncaught exceptions in ML programs. This analysis relies on unification-based type inference in a non-standard type system, using rows to approximate both the flow of escaping exceptions (a la effect systems) and the flow of result values (a la control-flow analyses). The resulting analysis is eff...

Writing parallel programs is not easy, and debugging them is usually a nightmare. To cope with these difficulties, a structured approach to parallel programs using skeletons and template based compiler techniques has been developed over the past years by several researchers, including the P3L group in Pisa. This approach is based on the use of a se...

Writing parallel programs is not easy, and debugging them is usually a nightmare. To cope with these difficulties, a structured approach to parallel programs using skeletons and template based compiler techniques has been developed over the past years by several researchers, including the P3L group in Pisa. This approach is based on the use of a se...