Article

A conceptual framework for safe object initialization: a principled and mechanized soundness proof of the Celsius model

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

An object under initialization does not fulfill its class specification yet and can be unsafe to use as it may have uninitialized fields. It can sometimes be useful to call methods on such a partially initialized object in order to compute a complex initial value, or to let the object escape its constructor in order to create mutually recursive objects. However, inadvertent usage of uninitialized fields can lead to run-time crashes. Those subtle programming errors are not statically detected by most modern compilers. While many other features of object-oriented programming languages have been thoroughly studied over the years, object initialization lacks a simple, systematic, and principled treatment. Building on the insights of previous work, we identify a set of four core principles for safe initialization: monotonicity, authority, stackability, and scopability. We capture the essence of the principles with a minimal calculus, Celsius, and show that the principles give rise to a practical initialization system that strikes a balance between expressiveness and simplicity. The meta-theory of the system is entirely mechanized using the Coq proof assistant. We believe that our approach based on well-identified core principles sheds new light on the underlying mechanisms ensuring safety and could serve as a basis for language design when faced with similar challenges.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

Chapter
Dynamic object-oriented languages, such as Python, Ruby, and Javascript are widely used nowadays. A distinguishing feature of dynamic object-oriented languages is that objects, the fundamental runtime data representation, are highly dynamic, meaning that a single constructor may create objects with different types and objects can evolve freely after their construction. While such dynamism facilitates fast prototyping, it brings many challenges to program understanding. Many type systems have been developed to aid programming understanding, and they adopt various types and techniques to represent and track dynamic objects. However, although many types and techniques have been proposed, it is unclear which one suits real dynamic object usages best. Motivated by this situation, we perform an empirical study on 50 mature Python programs with a focus on object dynamism and object type models. We found that (1) object dynamism is highly prevalent in Python programs, (2) class-based types are not precise to handle dynamic behaviors, as they introduce type errors for 52% of the evaluated polymorphic attributes, (3) typestate-based types, although mostly used in static languages, matches the behaviors of dynamic objects faithfully, and (4) some well-designed but still lightweight techniques for object-based types, such as argument type separation and recency abstraction can precisely characterize dynamic object behaviors. Those techniques are suitable for building precise but still concise object-based types.
Article
Object-oriented programming has been bothered by an awkward feature for a long time: static members . Static members not only compromise the conceptual integrity of object-oriented programming, but also give rise to subtle initialization errors, such as reading non-initialized fields and deadlocks. The Scala programming language eliminated static members from the language, replacing them with global objects that present a unified object-oriented programming model. However, the problem of global object initialization remains open, and programmers still suffer from initialization errors. We propose partial ordering and initialization-time irrelevance as two fundamental principles for initializing global objects. Based on these principles, we put forward an effective static analysis to ensure safe initialization of global objects, which eliminates initialization errors at compile time. The analysis also enables static scheduling of global object initialization to avoid runtime overhead. The analysis is modular at the granularity of objects and it avoids whole-program analysis. To make the analysis explainable and tunable, we introduce the concept of regions to make context-sensitivity understandable and customizable by programmers.
Conference Paper
Full-text available
X10 is an object oriented programming language with a sophisticated type system (constraints, class invariants, non-erased generics, closures) and concurrency constructs (asynchronous activities, multiple places). Object initialization is a cross-cutting concern that interacts with all of these features in delicate ways that may cause type, runtime, and security errors. This paper discusses possible designs for object initialization, and the "hardhat" design chosen and implemented in X10 version 2.2. Our implementation includes a fixed-point inter-procedural (intra-class) data-flow analysis that infers, for each method called during initialization, the set of fields that are read, and those that are asynchronously and synchronously assigned. Our codebase of more than 200K lines of code only had 104 annotations. Finally, we formalize the essence of initialization checking with an effect system intended to complement a standard FJ style formalization of the type system for X10. This system is substantially simpler than the masked types of [10], and it is more practical (for X10) than the free-committed types of [12]. This is the first formalization of a type and (flow-sensitive) effect system for safe initialization in the presence of concurrency constructs.
Conference Paper
Full-text available
The paper defines the class of heap monotonic typestates. The monotonicity of such typestates enables sound checking algorithms without the need for non-aliasing regimes of pointers. The basic idea is that data structures evolve over time in a manner that only makes their representation invariants grow stronger, never weaker. This assumption guarantees that existing object references with particular typestates remain valid in all program futures, while still allowing objects to attain new stronger typestates. The system is powerful enough to establish properties of circular data structures.
Article
Full-text available
We present a new approach to programming languages for parallel computers that uses an effect system to discover expression scheduling constraints. This effect system is part of a 'kinded' type system with three base kinds: types, which describe the value that an expression may return; effects, which describe the side-effects that an expression may have; and regions, which describe the area of the store in which side-effects may occur. Types, effects and regions are collectively called descriptions. Expressions can be abstracted over any kind of description variable -- this permits type, effect and region polymorphism. Unobservable side-effects can be masked by the effect system; an effect soundness property guarantees that the effects computed statically by the effect system are a conservative approximation of the actual side-effects that a given expression may have. The effect system we describe performs certain kinds of side-effect analysis that were not previously feasible. Experimental data from the programming language FX indicate that an effect system can be used effectively to compile programs for parallel computers.
Article
Full-text available
Mutual dependencies between objects arise frequently in programs, and programmers must typically solve this value recursion by manually filling “initialization holes” to help construct the corresponding object graphs, i.e. null values and/or explicitly mutable locations. This paper aims to augment ongoing theoretical work on value recursion with a description of a semi-safe mechanism for a generalized form of value recursion in an ML-like language, where initialization corresponds to a graph of lazy computations whose nodes are sequentially forced, requiring runtime checks for soundness during initialization in the style of Russo. Our primary contribution is to use the mechanism to develop compelling examples of how the absence of value recursion leads to real problems in the presence of abstraction boundaries, and give micro-examples that characterize how initialization graphs permit more programs to be expressed in the mutation-free fragment of ML. Finally we argue that in heterogeneous programming environments semi-safe variations on value-recursion may be appropriate for ML-like languages, because initialization effects from external libraries are difficult to characterize, document and control.
Conference Paper
Full-text available
Distinguishing non-null references from possibly-null references at the type level can detect null-related errors in object-oriented programs at compile-time. This paper gives a proposal for retrofitting a language such as C# or Java with non-null types. It addresses the central complications that arise in constructors, where declared non-null fields may not yet have been initialized, but the partially constructed object is already accessible. The paper reports experience with an implementation for annotating and checking null-related properties in C# programs.
Article
Full-text available
We present a generalization of Java's parametric polymorphism that enables param- eterization of classes and methods by type constructors, i.e., functions from types to types. Our extension is formalized as a calculus called FGJ!. It is implemented in a prototype compiler and its type system is proven safe and decidable. We describe our extension and motivate its introduction in an object-oriented context through two examples: the definition of generic data-types with binary methods and the definition of generalized algebraic data-types. The Coq proof assistant was used to formalize FGJ! and to mechanically check its proof of type safety.
Article
Full-text available
The authors introduce a new programming language concept, called typestate, which is a refinement of the concept of type. Whereas the type of a data object determines the set of operations over permitted on the object, typestate determines the subset of these operations which is permitted in a particular context. Typestate tracking is a program analysis technique which enhances program reliability by detecting at compile-time syntactically legal but semantically undefined execution sequences. These include reading a variable before it has been initialized and dereferencing a pointer after the dynamic object has been deallocated. The authors define typestate, give examples of its application, and show how typestate checking may be embedded into a compiler. They discuss the consequences of typestate checking for software reliability and software structure, and summarize their experience in using a high-level language incorporating typestate checking.
Article
Full-text available
Permission is hereby granted to make and distribute verbatim copies of this document without royalty or fee. Permission is granted to quote excerpts from this documented provided the original source is properly cited. ii When separately written programs are composed so that they may cooperate, they may instead destructively interfere in unanticipated ways. These hazards limit the scale and functionality of the software systems we can successfully compose. This dissertation presents a framework for enabling those interactions between components needed for the cooperation we intend, while minimizing the hazards of destructive interference. Great progress on the composition problem has been made within the object paradigm, chiefly in the context of sequential, single-machine programming among benign components. We show how to extend this success to support robust composition of concurrent and potentially malicious components distributed over potentially malicious machines. We present E, a distributed, persistent, secure programming language, and CapDesk, a virus-safe desktop built in E, as embodiments of the techniques we explain.
Article
Full-text available
This paper formalizes and proves correct a compilation scheme for mutually-recursive definitions in call-by-value functional languages. This scheme supports a wider range of recursive definitions than previous methods. We formalize our technique as a translation scheme to a lambda-calculus featuring in-place update of memory blocks, and prove the translation to be correct. Comment: 62 pages, uses pic
Article
In call-by-value languages, some mutually-recursive definitions can be safely evaluated to build recursive functions or cyclic data structures, but some definitions (let rec x = x + 1) contain vicious circles and their evaluation fails at runtime. We propose a new static analysis to check the absence of such runtime failures. We present a set of declarative inference rules, prove its soundness with respect to the reference source-level semantics of Nordlander, Carlsson, and Gill [2008], and show that it can be directed into an algorithmic backwards analysis check in a surprisingly simple way. Our implementation of this new check replaced the existing check used by the OCaml programming language, a fragile syntactic criterion which let several subtle bugs slip through as the language kept evolving. We document some issues that arise when advanced features of a real-world functional language (exceptions in first-class modules, GADTs, etc.) interact with safety checking for recursive definitions.
Article
The Dependent Object Types (DOT) calculus serves as a foundation of the Scala programming language, with a machine-verified soundness proof. However, Scala's type system has been shown to be unsound due to null references, which are used as default values of fields of objects before they have been initialized. This paper proposes ιDOT, an extension of DOT for ensuring safe initialization of objects. DOT was previously extended to κDOT with the addition of mutable fields and constructors. To κDOT, ιDOT adds an initialization effect system that statically prevents the possibility of reading a null reference from an uninitialized object. To design ιDOT, we have reformulated the Freedom Before Commitment object initialization scheme in terms of disjoint subheaps to make it easier to formalize in an effect system and prove sound. Soundness of ιDOT depends on the interplay of three systems of rules: a type system close to that of DOT, an effect system to ensure definite assignment of fields in each constructor, and an initialization system that tracks the initialization status of objects in a stack of subheaps. We have proven the overall system sound and verified the soundness proof using the Coq proof assistant.
Article
Every newly created object goes through several initialization states: starting from a state where all fields are uninitialized until all of them are assigned. Any operation on the object during its initialization process, which usually happens in the constructor via this , has to observe the initialization states of the object for correctness, i.e. only initialized fields may be used. Checking safe usage of this statically, without manual annotation of initialization states in the source code, is a challenge, due to aliasing and virtual method calls on this . Mainstream languages either do not check initialization errors, such as Java, C++, Scala, or they defend against them by not supporting useful initialization patterns, such as Swift. In parallel, past research has shown that safe initialization can be achieved for varying degrees of expressiveness but by sacrificing syntactic simplicity. We approach the problem by upholding local reasoning about initialization which avoids whole-program analysis, and we achieve typestate polymorphism via subtyping. On this basis, we put forward a novel type-and-effect system that can effectively ensure initialization safety while allowing flexible initialization patterns. We implement an initialization checker in the Scala 3 compiler and evaluate on several real-world projects.
Article
While type soundness proofs are taught in every graduate PL class, the gap between realistic languages and what is accessible to formal proofs is large. In the case of Scala, it has been shown that its formal model, the Dependent Object Types (DOT) calculus, cannot simultaneously support key metatheoretic properties such as environment narrowing and subtyping transitivity, which are usually required for a type soundness proof. Moreover, Scala and many other realistic languages lack a general substitution property. The first contribution of this paper is to demonstrate how type soundness proofs for advanced, polymorphic, type systems can be carried out with an operational semantics based on high-level, definitional interpreters, implemented in Coq. We present the first mechanized soundness proofs in this style for System F and several extensions, including mutable references. Our proofs use only straightforward induction, which is significant, as the combination of big-step semantics, mutable references, and polymorphism is commonly believed to require coinductive proof techniques. The second main contribution of this paper is to show how DOT-like calculi emerge from straightforward generalizations of the operational aspects of F, exposing a rich design space of calculi with path-dependent types inbetween System F and DOT, which we dub the System D Square. By working directly on the target language, definitional interpreters can focus the design space and expose the invariants that actually matter at runtime. Looking at such runtime invariants is an exciting new avenue for type system design.
Conference Paper
Scala’s type system unifies aspects of ML modules, object- oriented, and functional programming. The Dependent Object Types (DOT) family of calculi has been proposed as a new theoretic foundation for Scala and similar expressive languages. Unfortunately, type soundness has only been established for restricted subsets of DOT. In fact, it has been shown that important Scala features such as type refinement or a subtyping relation with lattice structure break at least one key metatheoretic property such as environment narrowing or invertible subtyping transitivity, which are usually required for a type soundness proof. The main contribution of this paper is to demonstrate how, perhaps surprisingly, even though these properties are lost in their full generality, a rich DOT calculus that includes recursive type refinement and a subtyping lattice with intersection types can still be proved sound. The key insight is that subtyping transitivity only needs to be invertible in code paths executed at runtime, with contexts consisting entirely of valid runtime objects, whereas inconsistent subtyping contexts can be permitted for code that is never executed.
Conference Paper
Programmers often need to initialise circular structures of objects. Initialisation should be safe (so that programs can never suffer null pointer exceptions or otherwise observe uninitialised values) and modular (so that each part of the circular structure can be written and compiled separately). Unfortunately, existing languages do not support modular circular initialisation: programmers in practical languages resort to Tony Hoare’s “Billion Dollar Mistake”: initialising variables with nulls, and then hoping to fix them up afterward. While recent research languages have offered some solutions, none fully support safe modular circular initialisation. We present placeholders, a straightforward extension to object-oriented languages that describes circular structures simply, directly, and modularly. In typed languages, placeholders can be described by placeholder types that ensure placeholders are used safely. We define an operational semantics for placeholders, a type system for placeholder types, and prove soundness. Incorporating placeholders into object-oriented languages should make programs simultaneously simpler to write, and easier to write correctly.
Conference Paper
One of the main purposes of object initialisation is to establish invariants such as a field being non-null or an immutable data structure containing specific values. These invariants are then implicitly assumed by the rest of the implementation, for instance, to ensure that a field may be safely dereferenced or that immutable data may be accessed concurrently. Consequently, letting an object escape from its constructor is dangerous; the escaping object might not yet satisfy its invariants, leading to errors in code that relies on them. Nevertheless, preventing objects entirely from escaping from their constructors is too restrictive; it is often useful to call auxiliary methods on the object under initialisation or to pass it to another constructor to set up mutually-recursive structures. We present a type system that tracks which objects are fully initialised and which are still under initialisation. The system can be used to prevent objects from escaping, but also to allow safe escaping by making explicit which objects might not yet satisfy their invariants. We designed, formalised and implemented our system as an extension to a non-null type system, but it is not limited to this application. Our system is conceptually simple and requires little annotation overhead; it is sound and sufficiently expressive for many common programming idioms. Therefore, we believe it to be the first such system suitable for mainstream use.
Article
We develop a mechanized proof of Featherweight Java with Assignment and Immutability in the Coq proof assistant. This is a step towards more machine-checked proofs of a non-trivial type system. We used object immutability close to that of IGJ [8]. We describe the challenges of the mechanization and the encoding we used inside of Coq. prover. Our Coq sources are publically available 1. Example. We define a parametrized class Cell, where the mutable instantiation can get and set the interned object, whereas the immutable instantiation can only get the interned object, provided initially in the constructor. We chose to use transitive mutability in this example. 1.
Conference Paper
Arrays are intensively used in many software programs, including those in the popular graphics and game programming domains. Al- though the problem of eliminating redundant array bound checks has been studied for a long time, there are few works that attempt to be both aggressively precise and practical. We propose an inference mechanism that achieves both aims by combining a forward rela- tional analysis with a backward precondition derivation. Our infer- ence algorithm works for a core imperative language with assign- ments, and analyses each method once through a summary-based approach. Our inference is precise as it is both path and context sen- sitive. Through a novel technique that can strengthen preconditions, we can selectively reduce the sizes of formulae to support a prac- tical inference algorithm. Moreover, we subject each inferred pro- gram to a flexivariant specialization that can achieve good tradeoff between elimination of array checks and code explosion concerns. We have proven the soundness of our approach and have also im- plemented a prototype inference and specialization system. Initial experiments suggest that such a desired system is viable.
Article
The authors introduce a new programming language concept, called typestate, which is a refinement of the concept of type. Whereas the type of a data object determines the set of operations ever permitted on the object, typestate determines the subset of these operations which is permitted in a particular context. Typestate tracking is a program analysis technique which enhances program reliability by detecting at compile-time syntactically legal but semantically undefined execution sequences. These include reading a variable before it has been initialized and dereferencing a pointer after the dynamic object has been deallocated. The authors define typestate, give examples of its application, and show how typestate checking may be embedded into a compiler. They discuss the consequences of typestate checking for software reliability and software structure, and summarize their experience in using a high-level language incorporating typestate checking.
Igor Peshansky, and Vijay Saraswat. 2012. Object initialization in X10
  • Yoav Zibin
  • David Cunningham
  • Igor Peshansky
  • Vijay Saraswat
  • Zibin Yoav