Article

In Search of Types


Abstract

The concept of "type" has been used without a consistent, precise definition in discussions about programming languages for 60 years. In this essay I explore various concepts lurking behind distinct uses of this word, highlighting two traditions in which the word came into use largely independently: engineering traditions on the one hand, and those of symbolic logic on the other. These traditions are founded on differing attitudes to the nature and purpose of abstraction, but their distinct uses of "type" have never been explicitly unified. One result is that discourse across these traditions often finds itself at cross purposes, such as overapplying one sense of "type" where another is appropriate, and occasionally proceeding to draw wrong conclusions. I illustrate this with examples from well-known and justly well-regarded literature, and argue that ongoing developments in both the theory and practice of programming make now a good time to resolve these problems.


... 142]. We tend to use the term "type" instead of "tag", as most programmers seem comfortable with this usage [33,48]. ...
Preprint
Full-text available
Technical computing is a challenging application area for programming languages to address. This is evinced by the unusually large number of specialized languages in the area (e.g. MATLAB, R), and the complexity of common software stacks, often involving multiple languages and custom code generators. We believe this is ultimately due to key characteristics of the domain: highly complex operators, a need for extensive code specialization for performance, and a desire for permissive high-level programming styles allowing productive experimentation. The Julia language attempts to provide a more effective structure for this kind of programming by allowing programmers to express complex polymorphic behaviors using dynamic multiple dispatch over parametric types. The forms of extension and reuse permitted by this paradigm have proven valuable for technical computing. We report on how this approach has allowed domain experts to express useful abstractions while simultaneously providing a natural path to better performance for high-level technical code.
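A minimal sketch of the core idea described above, dispatching on the runtime types of all arguments rather than just the first. It is written in OCaml for consistency with the other examples on this page, so a closed variant type stands in for Julia's open, extensible method tables, and every name is illustrative.

    (* Approximating multiple dispatch: the match examines the runtime
       tags of BOTH arguments before selecting a method body. *)
    type num = Int of int | Float of float

    let add a b =
      match a, b with
      | Int x, Int y -> Int (x + y)
      | Float x, Float y -> Float (x +. y)
      | Int x, Float y | Float y, Int x -> Float (float_of_int x +. y)

    (* add (Int 1) (Float 2.5) selects the mixed method: Float 3.5 *)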
... Static type systems are getting close to achieving the desired level of flexibility (as in Fortress [4] or Polyduce [28], for instance), but here we will use a dynamically typed approach. We are mostly interested in types in the sense of data types and partial information (for an excellent discussion of the many meanings of the word "type" see [68]). Additionally, some idioms in this domain appear to be inherently, or at least naturally, dynamically typed (as we will explore in chapter 2). ...
Thesis
Array-based programming environments are popular for scientific and technical computing. These systems consist of built-in function libraries paired with high-level languages for interaction. Although the libraries perform well, it is widely believed that scripting in these languages is necessarily slow, and that only heroic feats of engineering can at best partially ameliorate this problem. This thesis argues that what is really needed is a more coherent structure for this functionality. To find one, we must ask what technical computing is really about. This thesis suggests that this kind of programming is characterized by an emphasis on operator complexity and code specialization, and that a language can be designed to better fit these requirements. The key idea is to integrate code selection with code specialization, using generic functions and data-flow type inference. Systems like these can suffer from inefficient compilation, or from uncertainty about what to specialize on. We show that sufficiently powerful type-based dispatch addresses these problems. The resulting language, Julia, achieves a Quine-style "explication by elimination" of many of the productive features technical computing users expect.
Chapter
Complex systems are developed by teams of experts from multiple domains, who can be liberated from becoming programming experts through domain-specific languages (DSLs). The implementation of the different concerns of DSLs (including syntaxes and semantics) is now well established and supported by various language workbenches. However, the various services associated with a DSL (e.g., editors, model checkers, debuggers, or composition operators) are still directly based on its implementation. Moreover, while most of the services crosscut the different DSL concerns, they only require specific information on each. Consequently, this prevents the reuse of services among related DSLs and increases the complexity of service implementation. Leveraging the time-honored concept of interface in software engineering, we discuss the benefits of language interfaces in the context of software language engineering. In particular, we elaborate on particular usages that address current challenges in language development.
Conference Paper
Programming language research is becoming method conscious. Rigorous mathematical or empirical evaluation is often demanded, which is a good thing. However, I argue in this essay that concept analysis is a legitimate research approach in programming languages, with important limitations. It can be used to sharpen vague concepts, and to expose distinctions that have previously been overlooked, but it does not demonstrate the superiority of one language design over another. Arguments and counter-arguments are essential to successful concept analysis, and such thoughtful conversations should be published more.
Conference Paper
What is the definition of 'type'? Having a clear and precise answer to this question would avoid many misunderstandings and prevent meaningless discussions that arise from them. But having such a clear and precise answer would also hurt science, "hamper the growth of knowledge" and "deflect the course of investigation into narrow channels of things already understood". In this essay, I argue that not everything we work with needs to be precisely defined. There are many definitions used by different communities, but none of them applies universally. A brief excursion into the philosophy of science shows that this is not just tolerable, but necessary for progress. Philosophy also suggests how we can think about this imprecise notion of type.
Conference Paper
Full-text available
In 1985 Luca Cardelli and Peter Wegner, my advisor, published an ACM Computing Surveys paper called "On understanding types, data abstraction, and polymorphism". Their work kicked off a flood of research on semantics and type theory for object-oriented programming, which continues to this day. Despite 25 years of research, there is still widespread confusion about the two forms of data abstraction, abstract data types and objects. This essay attempts to explain the differences and also why the differences matter.
Conference Paper
Full-text available
The functional programming language ML has been undergoing a thorough redesign during the past year, and the module facility described here has been proposed as part of the revised language, now called Standard ML. The design has three main goals: (1) to facilitate the structuring of large ML programs; (2) to support separate compilation and generic library units; and (3) to employ new ideas in the semantics of data types to extend the power of ML's polymorphic type system. It is based on concepts inherent in the structure of ML, primarily the notions of a declaration, its type signature, and the environment that it denotes.
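A hedged sketch in OCaml, a close relative of Standard ML, of the ingredients described above: a signature (the type signature of a declaration), a structure matching it, and a functor acting as a generic library unit. All names are invented for illustration.

    module type ORDERED = sig
      type t
      val compare : t -> t -> int
    end

    module IntOrd : ORDERED with type t = int = struct
      type t = int
      let compare = Stdlib.compare
    end

    (* A functor: a separately compilable, generic library unit. *)
    module MakeSet (O : ORDERED) = struct
      type t = O.t list
      let empty = []
      let mem x s = List.exists (fun y -> O.compare x y = 0) s
      let add x s = if mem x s then s else x :: s
    end

    module IntSet = MakeSet (IntOrd)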
Conference Paper
Full-text available
Complete formal verification is the only known way to guarantee that a system is free of programming errors. We present our experience in performing the formal, machine-checked verification of the seL4 microkernel from an abstract specification down to its C implementation. We assume correctness of compiler, assembly code, and hardware, and we used a unique design approach that fuses formal and operating systems techniques. To our knowledge, this is the first formal proof of functional correctness of a complete, general-purpose operating-system kernel. Functional correctness means here that the implementation always strictly follows our high-level abstract specification of kernel behaviour. This encompasses traditional design and implementation safety properties such as that the kernel will never crash and will never perform an unsafe operation. It also proves much more: we can predict precisely how the kernel will behave in every possible situation. seL4, a third-generation microkernel of L4 provenance, comprises 8,700 lines of C code and 600 lines of assembler. Its performance is comparable to other high-performance L4 kernels.
Conference Paper
Full-text available
What is modularity? Which kind of modularity should developers strive for? Despite decades of research on modularity, these basic questions have no definite answer. We submit that the common understanding of modularity, and in particular its notion of information hiding, is deeply rooted in classical logic. We analyze how classical modularity, based on classical logic, fails to address the needs of developers of large software systems, and encourage researchers to explore alternative visions of modularity, based on nonclassical logics, and henceforth called nonclassical modularity.
Conference Paper
Full-text available
Traditionally, statically typed programming languages incorporate a built-in static type system. This system is typically mandatory: every legal program must successfully typecheck according to the rules of the type system. In contrast, optional type systems are neither syntactically nor semantically required, and have no effect on the dynamic semantics of the language. This in turn enables pluggable type systems (Bra03, Bra04), allowing multiple type systems to be used simultaneously and/or sequentially for various semantic analyses. We argue that pluggable types can provide most of the advantages of mandatory type systems without most of the drawbacks.
Article
Full-text available
The authors introduce a new programming language concept, called typestate, which is a refinement of the concept of type. Whereas the type of a data object determines the set of operations ever permitted on the object, typestate determines the subset of these operations which is permitted in a particular context. Typestate tracking is a program analysis technique which enhances program reliability by detecting at compile-time syntactically legal but semantically undefined execution sequences. These include reading a variable before it has been initialized and dereferencing a pointer after the dynamic object has been deallocated. The authors define typestate, give examples of its application, and show how typestate checking may be embedded into a compiler. They discuss the consequences of typestate checking for software reliability and software structure, and summarize their experience in using a high-level language incorporating typestate checking.
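Full typestate tracking is a flow-sensitive compile-time analysis, but a fragment of the idea can be approximated in OCaml with phantom type parameters. In this hypothetical sketch (all names invented), reading a file that is not in the opened state is a type error.

    type opened   (* phantom tags naming the two states *)
    type closed

    module File : sig
      type 'state t
      val create : string -> closed t
      val open_ : closed t -> opened t
      val read : opened t -> string
      val close : opened t -> closed t
    end = struct
      type 'state t = string        (* the state lives only in the type *)
      let create name = name
      let open_ name = name
      let read name = "contents of " ^ name
      let close name = name
    end

    let f = File.create "data.txt"
    (* File.read f is rejected here: f has type closed File.t *)
    let contents = File.read (File.open_ f)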
Article
Full-text available
Logic has long been interested in whether answers to certain questions are computable in principle, since the outcome puts bounds on the possibilities of formalization. More recently, precise comparisons in the efficiency of decision methods have become available through the developments in complexity theory. These, however, are applications to logic, and a big question is whether methods of logic have significance in the other direction for the more applied parts of computability theory. Programming languages offer an obvious opportunity as their syntactic formalization is well advanced; however, the semantical theory can hardly be said to be complete. Though we have many examples, we have still to give wide-ranging mathematical answers to these queries: What is a machine? What is a computable process? How (or how well) does a machine simulate a process? Programs naturally enter in giving descriptions of processes. The definition of the precise meaning of a program then requires us to explain what are the objects of computation (in a way, the statics of the problem) and how they are to be transformed (the dynamics). So far the theories of automata and of nets, though most interesting for dynamics, have formalized only a portion of the field, and there has been perhaps too much concentration on the finite-state and algebraic aspects. It would seem that the understanding of higher-level program features involves us with infinite objects and forces us to pass through several levels of explanation to go from the conceptual ideas to the final simulation on a real machine. These levels can be made mathematically exact if we can find the right abstractions to represent the necessary structures. The experience of many independent workers with the method of data types as lattices (or partial orderings) under an information content ordering, and with their continuous mappings, has demonstrated the flexibility of this approach in providing definitions and proofs, which are clean and without undue dependence on implementations. Nevertheless much remains to be done in showing how abstract conceptualizations can (or cannot) be actualized before we can say we have a unified theory.
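A tiny hedged illustration, far short of Scott's actual construction, of ordering data by information content: lift any type with a bottom element meaning "no information yet", and order values by definedness.

    type 'a lifted = Bottom | Value of 'a

    (* approx x y holds when y carries at least the information of x. *)
    let approx x y =
      match x, y with
      | Bottom, _ -> true
      | Value a, Value b -> a = b
      | Value _, Bottom -> false

    (* Maps preserving the ordering (continuous on this flat order). *)
    let lift_map f = function Bottom -> Bottom | Value a -> Value (f a)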
Article
Full-text available
This paper discusses modularization as a mechanism for improving the flexibility and comprehensibility of a system while allowing the shortening of its development time. The effectiveness of a "modularization" is dependent upon the criteria used in dividing the system into modules. A system design problem is presented and both a conventional and unconventional decomposition are described. It is shown that the unconventional decompositions have distinct advantages for the goals outlined. The criteria used in arriving at the decompositions are discussed. The unconventional decomposition, if implemented with the conventional assumption that a module consists of one or more subroutines, will be less efficient in most cases. An alternative approach to implementation which does not have this effect is sketched.
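A minimal OCaml sketch of the decomposition criterion the paper advocates: each module hides one design decision behind an interface, so revising the decision leaves client code untouched. The module and its operations are made up for illustration.

    module LineStorage : sig
      type t
      val empty : t
      val add : string -> t -> t
      val line : int -> t -> string
    end = struct
      (* The hidden design decision: lines stored as a list. Replacing
         this with an array or a file changes no client code. *)
      type t = string list
      let empty = []
      let add s t = t @ [s]
      let line i t = List.nth t i
    end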
Article
Full-text available
Designing software to be extensible and easily contracted is discussed as a special case of design for change. A number of ways that extension and contraction problems manifest themselves in current software are explained. Four steps in the design of software that is more flexible are then discussed. The most critical step is the design of a software structure called the "uses" relation. Some criteria for design decisions are given and illustrated using a small example. It is shown that the identification of minimal subsets and minimal extensions can lead to software that can be tailored to the needs of a broad variety of users.
Book
This fast-moving tutorial introduces you to OCaml, an industrial-strength programming language designed for expressiveness, safety, and speed. Through the book's many examples, you'll quickly learn how OCaml stands out as a tool for writing fast, succinct, and readable systems code using functional programming. Real World OCaml takes you through the concepts of the language at a brisk pace, and then helps you explore the tools and techniques that make OCaml an effective and practical tool. You'll also delve deep into the details of the compiler toolchain and OCaml's simple and efficient runtime system. This second edition brings the book up to date with almost a decade of improvements in the OCaml language and ecosystem, with new chapters covering testing, GADTs, and platform tooling. This title is also available as open access on Cambridge Core, thanks to the support of Tarides. Their generous contribution will bring more people to OCaml.
Conference Paper
Most ideas come from previous ideas. The sixties, particularly in the ARPA community, gave rise to a host of notions about "human-computer symbiosis" through interactive time-shared computers, graphics screens, and pointing devices. Advanced computer languages were invented to simulate complex systems such as oil refineries and semi-intelligent behavior. The soon to follow paradigm shift of modern personal computing, overlapping window interfaces, and object-oriented design came from seeing the work of the sixties as something more than a "better old thing." That is, more than a better way: to do mainframe computing; for end-users to invoke functionality; to make data structures more abstract. Instead the promise of exponential growth in computing/$/volume demanded that the sixties be regarded as "almost a new thing" and to find out what the actual "new things" might be. For example, one would compute with a handheld "Dynabook" in a way that would not be possible on a shared mainframe; millions of potential users meant that the user interface would have to become a learning environment along the lines of Montessori and Bruner; and needs for large scope, reduction in complexity, and end-user literacy would require that data and control structures be done away with in favor of a more biological scheme of protected universal cells interacting only through messages that could mimic any desired behavior. Early Smalltalk was the first complete realization of these new points of view as parented by its many predecessors in hardware, language, and user interface design. It became the exemplar of the new computing, in part, because we were actually trying for a qualitative shift in belief structures---a new Kuhnian paradigm in the same spirit as the invention of the printing press---and thus took highly extreme positions that almost forced these new styles to be invented.
Article
The motivation behind the work in very-high-level languages is to ease the programming task by providing the programmer with a language containing primitives or abstractions suitable to his problem area. The programmer is then able to spend his effort in the right place; he concentrates on solving his problem, and the resulting program will be more reliable as a result. Clearly, this is a worthwhile goal. Unfortunately, it is very difficult for a designer to select in advance all the abstractions which the users of his language might need. If a language is to be used at all, it is likely to be used to solve problems which its designer did not envision, and for which the abstractions embedded in the language are not sufficient. This paper presents an approach which allows the set of built-in abstractions to be augmented when the need for a new data abstraction is discovered. This approach to the handling of abstraction is an outgrowth of work on designing a language for structured programming. Relevant aspects of this language are described, and examples of the use and definitions of abstractions are given.
Chapter
This chapter presents the details of COBOL. It is one of the oldest major programming languages. The environment division of a COBOL program consists of two sections: (1) the configuration section and (2) the input–output section. The configuration section includes only two standard paragraphs, one naming the computer on which the program is compiled, the source computer, and the second naming the computer on which the program will be run, the object computer. The input–output section relates the data files used by the program to the equipment available at a specific installation, and it defines the characteristics of those files to the system. There are five COBOL statements that perform arithmetic: add, subtract, multiply, divide, and compute. The procedure division is the division in which the programmer provides the executable statements that carry out the functions the program is to perform. It has no pre-assigned section or paragraph names, requiring only that the division name be given at the beginning of the coding, in the form: PROCEDURE DIVISION.
Chapter
I hope that some people see some connection between the two topics in the title. If not, anyway, such connections will be developed in the course of these talks. Furthermore, because of the use of tools involving reference and necessity in analytic philosophy today, our views on these topics really have wide-ranging implications for other problems in philosophy that traditionally might be thought far-removed, like arguments over the mind-body problem or the so-called ‘identity thesis’. Materialism, in this form, often now gets involved in very intricate ways in questions about what is necessary or contingent in identity of properties — questions like that. So, it is really very important to philosophers who may want to work in many domains to get clear about these concepts. Maybe I will say something about the mind-body problem in the course of these talks. I want to talk also at some point (I don’t know if I can get it in) about substances and natural kinds.
Article
This chapter discusses how relating constructive mathematics to computer programming can be beneficial. Among the benefits to be derived by constructive mathematics from its association with computer programming, one is that you see immediately why you cannot rely upon the law of excluded middle: its uninhibited use would lead to programs that one did not know how to execute. By choosing to program in a formal language for constructive mathematics, like the theory of types, one gets access to the conceptual apparatus of pure mathematics, neglecting those parts that depend critically on the law of excluded middle, whereas even the best high-level programming languages so far designed are wholly inadequate as mathematical languages. The virtue of a machine code is that a program written in it can be directly read and executed by the machine. The distinction between low and high level programming languages is of course relative to the available hardware. It may well be possible to turn what is now regarded as a high level programming language into machine code by the invention of new hardware.
Conference Paper
Engineering often precedes science. Incommensurability is real.
Conference Paper
Boolean satisfiability (SAT) solvers have improved enormously in performance over the last 10-15 years and are today an indispensable tool for solving a wide range of computational problems. However, our understanding of what makes SAT instances hard or easy in practice is still quite limited. A recent line of research in proof complexity has studied theoretical complexity measures such as length, width, and space in resolution, which is a proof system closely related to state-of-the-art conflict-driven clause learning (CDCL) SAT solvers. Although it seems like a natural question whether these complexity measures could be relevant for understanding the practical hardness of SAT instances, to date there has been very limited research on such possible connections. This paper sets out on a systematic study of the interconnections between theoretical complexity and practical SAT solver performance. Our main focus is on space complexity in resolution, and we report results from extensive experiments aimed at understanding to what extent this measure is correlated with hardness in practice. Our conclusion from the empirical data is that the resolution space complexity of a formula would seem to be a more fine-grained indicator of whether the formula is hard or easy than the length or width needed in a resolution proof. On the theory side, we prove a separation of general and tree-like resolution space, where the latter has been proposed before as a measure of practical hardness, and also show connections between resolution space and backdoor sets.
Article
The importance of determining an analytical function to fit a set of experimental data has long since been recognized. Efforts have been made to find reliable methods of relating an analytical function to a set of coordinates, and at the same time, to ...
Article
We present a complete unification algorithm for logic with the language of all finite types. In addition, we show how this procedure can be used with a generalized resolution principle which does not require expressions to be pre-Skolemized. Both of these steps are essential for an effective and complete mechanization of higher order logic.
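The paper's procedure works at all finite types; as a much simpler hedged analogue, here is textbook first-order unification in OCaml (not the higher-order algorithm itself), with the usual occurs check.

    type term = Var of string | App of string * term list

    let rec occurs v = function
      | Var x -> x = v
      | App (_, ts) -> List.exists (occurs v) ts

    (* Fully apply a substitution, given as an association list. *)
    let rec resolve s = function
      | Var x ->
        (match List.assoc_opt x s with
         | Some t -> resolve s t
         | None -> Var x)
      | App (f, ts) -> App (f, List.map (resolve s) ts)

    (* Unify a list of equations, threading the substitution through. *)
    let rec unify s = function
      | [] -> Some s
      | (a, b) :: rest ->
        (match resolve s a, resolve s b with
         | Var x, Var y when x = y -> unify s rest
         | Var x, t | t, Var x ->
           if occurs x t then None else unify ((x, t) :: s) rest
         | App (f, ts), App (g, us) ->
           if f = g && List.length ts = List.length us
           then unify s (List.combine ts us @ rest)
           else None)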
Article
The UNIX shell is a command programming language that provides an interface to the UNIX operating system. It contains several mechanisms found in algorithmic languages such as control-flow primitives, variables, and parameter passing. Constructs such as while, if, for, and case are available. Two-way communication is possible between the shell and commands. String-valued parameters, typically file names or flags, may be passed to a command. A return code is set by commands and may be used to determine the flow of control, and the standard output from a command may be used as input to the shell. The shell can modify the environment in which commands run. Input and output can be redirected and processes that communicate through "pipes" can be invoked. Commands are found by searching directories in the file system in a sequence that can be defined by the user.
Article
The language BCPL (Basic CPL) was originally developed as a compiler writing tool and, as its name suggests, it is closely related to CPL (Combined Programming Language), which was jointly developed at Cambridge and London Universities. BCPL adopted much of the syntactic richness of CPL and strived for the same high standard of linguistic elegance; however, in order to achieve the efficiency necessary for system programming its scale and complexity is far less than that of CPL. The most significant simplification is that BCPL has only one data type---the binary bit pattern---and this feature alone gives BCPL a characteristic flavour which is very different from that of CPL and most other current programming languages.
Article
The aim of this work is largely a practical one. A widely employed style of programming, particularly in structure-processing languages which impose no discipline of types, entails defining procedures which work well on objects of a wide variety. We present a formal type discipline for such polymorphic procedures in the context of a simple programming language, and a compile-time type-checking algorithm which enforces the discipline. A Semantic Soundness Theorem (based on a formal semantics for the language) states that well-typed programs cannot "go wrong", and a Syntactic Soundness Theorem states that if the algorithm accepts a program then it is well-typed. We also discuss extending these results to richer languages; a type-checking algorithm based on this discipline is in fact already implemented and working, for the metalanguage ML in the Edinburgh LCF system.
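A two-line OCaml illustration of the discipline: id is inferred the polymorphic type scheme 'a -> 'a with no annotations, and the checker then permits its use at several types while guaranteeing the program cannot "go wrong".

    let id x = x    (* inferred: 'a -> 'a *)

    let () =
      Printf.printf "%d %s\n" (id 3) (id "three")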
Conference Paper
This paper presents a survey of issues which arise in contemporary High Order Languages in conjunction with the implementation of data types and type checking. Attention is paid to alternatives and tradeoffs in language features which realize various desirable goals regarding data types. Interactions between features are pointed out, and implementation techniques are discussed.
Conference Paper
The title is not a statement of fact, of course, but an opinion about how language designers should think about types. There has been a natural tendency to look to mathematics for a consistent, precise notion of what types are. The point of view there is extensional: a type is a subset of the universe of values. While this approach may have served its purpose quite adequately in mathematics, defining programming language types in this way ignores some vital ideas. Some interesting developments following the extensional approach are the ALGOL-68 type system [vW], Scott's theory [S], and Reynolds' system [R]. While each of these lends valuable insight to programming languages, I feel they miss an important aspect of types. Rather than worry about what types are, I shall focus on the role of type checking. Type checking seems to serve two distinct purposes: authentication and secrecy. Both are useful when a programmer undertakes to implement a class of abstract objects to be used by many other programmers. He usually proceeds by choosing a representation for the objects in terms of other objects and then writes the required operations to manipulate them.
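Morris's two purposes map naturally onto a sealed OCaml module, sketched here with invented names: secrecy, because clients cannot inspect the chosen representation, and authentication, because only this module can mint values of the abstract type, so possessing one certifies its origin.

    module Token : sig
      type t                          (* representation kept secret *)
      val issue : id:int -> t         (* the only way to create a t *)
      val redeem : t -> int
    end = struct
      type t = int
      let issue ~id = id
      let redeem t = t
    end

    (* Clients may pass Token.t values around, but (57 : Token.t) is
       rejected by the type checker: values cannot be forged. *)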
Article
In contrast to popular belief, proving termination is not always impossible.
Article
Abstract data type declarations appear in typed programming languages like Ada, Alphard, CLU and ML. This form of declaration binds a list of identifiers to a type with associated operations, a composite “value” we call a data algebra. We use a second-order typed lambda calculus SOL to show how data algebras may be given types, passed as parameters, and returned as results of function calls. In the process, we discuss the semantics of abstract data type declarations and review a connection between typed programming languages and constructive logic.
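The paper gives abstract data type declarations existential types in SOL; OCaml's first-class modules offer a rough, hedged analogue (all names invented): the package hides a witness type t that clients may use but never name concretely.

    module type COUNTER = sig
      type t
      val zero : t
      val incr : t -> t
      val value : t -> int
    end

    (* A "data algebra" packed as a first-class value. *)
    let counter : (module COUNTER) =
      (module struct
        type t = int
        let zero = 0
        let incr n = n + 1
        let value n = n
      end)

    let three =
      let module C = (val counter : COUNTER) in   (* unpack; t stays abstract *)
      C.value (C.incr (C.incr (C.incr C.zero)))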
Article
Our objective is to understand the notion of type in programming languages, present a model of typed, polymorphic programming languages that reflects recent research in type theory, and examine the relevance of recent research to the design of practical programming languages. Object-oriented languages provide both a framework and a motivation for exploring the interaction among the concepts of type, data abstraction, and polymorphism, since they extend the notion of type to data abstraction and since type inheritance is an important form of polymorphism. We develop a λ-calculus-based model for type systems that allows us to explore these interactions in a simple setting, unencumbered by complexities of production programming languages. The evolution of languages from untyped universes to monomorphic and then polymorphic type systems is reviewed. Mechanisms for polymorphism such as overloading, coercion, subtyping, and parameterization are examined. A unifying framework for polymorphic type systems is developed in terms of the typed λ-calculus augmented to include binding of types by quantification as well as binding of values by abstraction. The typed λ-calculus is augmented by universal quantification to model generic functions with type parameters, existential quantification and packaging (information hiding) to model abstract data types, and bounded quantification to model subtypes and type inheritance. In this way we obtain a simple and precise characterization of a powerful type system that includes abstract data types, parametric polymorphism, and multiple inheritance in a single consistent framework. The mechanisms for type checking for the augmented λ-calculus are discussed. The augmented typed λ-calculus is used as a programming language for a variety of illustrative examples. We christen this language Fun because fun instead of λ is the functional abstraction keyword and because it is pleasant to deal with. Fun is mathematically simple and can serve as a basis for the design and implementation of real programming languages with type facilities that are more powerful and expressive than those of existing programming languages. In particular, it provides a basis for the design of strongly typed object-oriented languages.
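A hedged OCaml glimpse of two of the mechanisms the survey unifies, with invented names throughout: universal quantification as ordinary parametric polymorphism, and a flavour of bounded quantification via row-polymorphic objects ("anything with at least an area method").

    (* Universal quantification: twice works at every type 'a. *)
    let twice f x = f (f x)

    (* Row polymorphism standing in for bounded quantification. *)
    let describe (shape : < area : float; .. >) =
      Printf.sprintf "area = %.1f" shape#area

    let () =
      let square = object
        method area = 4.0
        method side = 2.0
      end in
      print_endline (describe square);
      print_endline (string_of_int (twice succ 0))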
Notes on data structuring. Chapter II of Structured Programming
  • C. A. R. Hoare
Abstract types defined as classes of variables. Proceedings of the 1976 Conference on Data: Abstraction, Definition and Structure
  • D. L. Parnas
  • John E. Shore
  • David Weiss
Towards a theory of type structure. Programming Symposium
  • John C. Reynolds
Preliminary report: Specifications for the IBM Mathematical FORmula TRANslating System, FORTRAN
  • J. Backus
On a language proposal for the Department of Defense
  • E. W. Dijkstra